Big Data with Microsoft?
How HDInsight (Hadoop on Windows Azure), SQL Server 2014, and Excel work together

Big data is one of the big buzzwords of the IT world, yet it is still uncharted territory for many. In this talk we discuss what big data actually means and describe the role of Microsoft technologies in a field that is far more than open source and Hadoop. Using concrete scenarios, we show how HDInsight, SQL Server 2014, and Excel work together to solve typical big data tasks.

Usage Rights

© All Rights Reserved


Big Data with Microsoft? Presentation Transcript

  • Big Data with Microsoft? How HDInsight, SQL Server 2014, and Excel work together. Olivia Klose, Technical Evangelist; Georg Urban, Sr. Technology Solution Professional; Microsoft Deutschland GmbH
  • The Large Hadron Collider produces 15 PB/year. Source: http://public.web.cern.ch/public/en/lhc/Computing-en.html
  • But what if I don't own a Large Hadron Collider …
  • Large-scale plants; vehicle fleets; smart grids; green energy; stock exchanges; host protocols; computer centers; web farms; Twitter; Facebook; Google Analytics; …
  • XML, but… polystructured; varying; no explicit schema; lots of hex BLOBs; 40,000 attributes and growing. "Here is my data": </meldungText><antwort>False</antwort><wert>na</wert></meldung><steuergeraet sgbdVariante="SMG_60"><steuergeraeteFunktion zeitstempel="2013-04-30T09:00:37.9926171-04:00" endDate="2013-04-30T09:00:38.1158609-04:00" jobName="STATUS_FAHRZEUGTESTER"><datensatz satzNr="1"><result name="JOB_STATUS">OKAY</result><result name="_TEL_ANTWORT">80 F1 18 70 02 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 82 6B 00 6D 6B 39 CD 14 00 14 00 00 0E 00 15 00 00 19 00 0C 00 12 00 15 85 57 71 88 81 C0 7D 73 C2 08 01 05 02 F7 00 FF FF 01 73 00 00 02 A8 00 C2 01 E0 00 00 00 00 00 00 3D 01 00 00 00 01 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 01 E1 02 05 01 F8 03 4F FF AD 04</result><result name="_TEL_AUFTRAG">83 18 F1 30 02 01</result><result name="STAT_KL15_ROH">0</result><result name="STAT_KLR_EIN_ROH">0</result><result name="STAT_WAKE_UP_ROH">1</result><result name="STAT_ISTGANG_TEXT">Neutral</result> <sgFunktion zeitstempel="2013-04-30T10:33:37.0834084+02:00" endDate="2013-04-30T10:33:37.9310504+02:00" jobName="_FLM_LESEN_BOSCH"><datensatz satzNr="1"><result name="FLM_DATEN_1">00 00 00 03 02 08 C6 56 46 4C 4D 39 00 16 4B B2 00 00 00 32 00 00 06 99 00 00 65 00 00 18 6E 00 00 00 73 00 00 00 20 00 00 00 73 00 00 00 00 00 00 10 69 00 00 0F 53 00 00 00 00 00 00 0A 00 00 79 6D 00 00 B7 34 00 00 D3 9E 4A 4C 41 52 00 00 00 00 00 00 00 00 00 00 00 00 0 00 00 00 00 00 2C 00 00 00 00 00 00 1A 5C 00 15 4B CA 00 00 44 08 00 00 2D 39 00 00 1E 45 00 00 2 00 00 1E EB 00 00 0C 65 00 00 04 47 00 00 00 00 00 00 00 00 00 00 00 04 00 00 00 27 00 00 01 1E 00 02 AB 00 00 07 71 00 00 13 D7 00 00 36 48 00 15 91 AD 00 00 3F 97 00 00 19 C1 00 00 07 F9 00 00 02 00 00 00 BD 00 00 00 20 00 16 1C 42 00 00 18 B1 00 00 09 40 00 00 08 9F 00 00 04 3A 00 00 01 3E 00 8C D7 00 00 61 A3 00 00 37 9D 00 00 1E 78 00 00 14 96 00 00 0A 71 00 00 05 49 00 00 02 B1 00 00 0 00 00 00 1D 00 00 
00 09 00 00 00 05 00 00 00 00 00 00 00 00 00 00 23 BB 00 00 2F 84 00 00 14 EF 00 09 40 00 00 04 71 00 00 03 34 00 00 02 12 00 00 01 AC 00 00 01 59 00 00 0B C4 00 00 00 06 00 00 00 00 00 00 19 00 00 00 01 00 00 00 00 00 00 00 04 00 00 00 01 00 00 00 00 00 00 00 01 00 00 00 00 00 00 03 00 00 00 00 00 00 00 00 52 4F 54 48 00 00 00 00 00 00 00 07 00 00 00 00 00 00 00 01 00 00 00 00 00 00 01 00 00 00 00 00 00 00 04 00 00 00 00 00 00 00 00 56 30 00 00 00 03 00 11 00 01 01 06 00 00 00 00 00 00 00 00 01 00 00 00 0E 00 05 00 1A 00 12 00 00 00 26 00 00 00 00 00 0B 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 44 44 00 43 00 16 00 08 00 00 04 00 02 00 00 00 02 00 11 00 20 00 1A 00 0A 00 15 00 0F 00 1B 00 13 00 08 00 08 00 00 00 00 00 00 0E 00 08 00 04 00 02 00 01 00 00 00 6D 00 03 00 02 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0A 00 21 00 15 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0B 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 18 05 1F 00 00 00 00 00 00 00 00 00 1F 00 03 00 02 00 00 00 00 00 00 00 00 05 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 62 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 2E 00 00 1B 00 19 00 18 00 0D 00 00 00 00 00 00 00 01 00 01 00 02 00 00 06 00 01 E6 00 12 00 03 00 02 00 07 00 00 00 00 00 00 00 00 00 00 00 00 00 04 00 02 01 BA 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 24 00</result><result name="FLM_DATEN_2">08 00 0 00 00 00 00 00 00 00 0C 00 80 1B 00 45 10 00 A6 0D 00 51 16 00 59 44 00 00 EB 00 00 CA 00 00 49 00 17 00 10 00 0C 00 05 00 04 00 06 00 02 00 01 00 00 00 00 12 00 00 3A 00 00 26 00 00 13 00 0D 00 08 09 00 04 00 0A 00 00 00 00 00 00 00 00 03 00 00 0D 00 00 0B 00 00 07 00 02 00 05 00 01 00 01 00 04 00 00 00 00 00 00 04 00 07 00 08 00 06 00 04 00 …
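A minimal sketch of how files like the one on this slide could be flattened, assuming the fragment is well-formed XML. The `SAMPLE` string is a hypothetical, shortened stand-in modelled on the slide's element names, and `decode_value` and `extract_results` are illustrative helpers, not part of the talk. Values that look like space-separated two-digit hex dumps are decoded to bytes; everything else stays a string.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed fragment modelled on the slide's sample;
# real files are far larger and are not guaranteed to be well-formed.
SAMPLE = """
<steuergeraet sgbdVariante="SMG_60">
  <steuergeraeteFunktion jobName="STATUS_FAHRZEUGTESTER">
    <datensatz satzNr="1">
      <result name="JOB_STATUS">OKAY</result>
      <result name="_TEL_AUFTRAG">83 18 F1 30 02 01</result>
      <result name="STAT_ISTGANG_TEXT">Neutral</result>
    </datensatz>
  </steuergeraeteFunktion>
</steuergeraet>
"""

HEX_DIGITS = set("0123456789ABCDEFabcdef")


def decode_value(text):
    """Return bytes for a space-separated hex dump, else the original string."""
    tokens = text.split()
    is_hex = len(tokens) > 1 and all(
        len(t) == 2 and set(t) <= HEX_DIGITS for t in tokens
    )
    return bytes.fromhex("".join(tokens)) if is_hex else text


def extract_results(xml_text):
    """Flatten every <result> into (jobName, result name, decoded value)."""
    root = ET.fromstring(xml_text)
    rows = []
    for func in root.iter("steuergeraeteFunktion"):
        job = func.get("jobName")
        for result in func.iter("result"):
            rows.append((job, result.get("name"), decode_value(result.text or "")))
    return rows


rows = extract_results(SAMPLE)
for row in rows:
    print(row)
```

Because there is no explicit schema, the sketch makes no assumption about which result names exist; it simply emits whatever it finds. A dump containing damaged tokens (a single digit where a byte is expected) is deliberately left as text rather than guessed at.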
  • Small data subsets are stored; most data stays in the file system (original XML files); only about 3 years of history is stored at the moment; very much denormalized data (e.g. Entity-Attribute-Value tables). TCO & performance limits (queries are slow, pivoting is expensive). Cover the whole life cycle of 15 years (incl. production data); more data sources: social media (motortalk); lower TCO for storage & flexible analysis. …impossible with a "classical" RDBMS
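To illustrate why pivoting Entity-Attribute-Value data is costly, here is a small sketch with hypothetical rows (the entity ids and the `pivot` helper are assumptions for illustration, not taken from the talk). Every triple has to be visited to rebuild one wide record per entity, and with tens of thousands of possible attributes most cells of the resulting wide row stay empty.

```python
from collections import defaultdict

# Hypothetical EAV rows: (entity id, attribute name, value).
eav_rows = [
    ("vehicle-1", "JOB_STATUS", "OKAY"),
    ("vehicle-1", "STAT_ISTGANG_TEXT", "Neutral"),
    ("vehicle-2", "JOB_STATUS", "OKAY"),
    ("vehicle-2", "STAT_KL15_ROH", "0"),
]


def pivot(rows):
    """Turn EAV triples into one dict per entity plus the sorted column list."""
    records = defaultdict(dict)
    columns = set()
    for entity, attribute, value in rows:
        records[entity][attribute] = value
        columns.add(attribute)
    return dict(records), sorted(columns)


records, columns = pivot(eav_rows)
for entity, attrs in records.items():
    # Attributes an entity never reported appear as None (a sparse wide row).
    print(entity, [attrs.get(c) for c in columns])
```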
  • "Big data" is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making. Source: The Importance of 'Big Data': A Definition, Mark Beyer, Douglas Laney, G00235055
  • ... Modular hardware architecture ... ColumnStore v2 storage; Hadoop regions; tight integration of "non-structured" data; FDR InfiniBand; ultra-high compression; direct-attached SAS; scale unit
  • Parallel Data Warehouse Screenshots
  • PDW in SQL Server Data Tools: a familiar development environment
  • Scanning 10 billion rows… (…just counting rows) …does not take… …that long!
  • And even complex queries… (…a reporting query) …won't take… …much longer!
  • Data Distribution Data is distributed evenly over all data nodes…
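One common way to get an even spread is to hash a key column and take the result modulo the number of nodes. The sketch below assumes that approach; the slide does not say which scheme the demo used. The node count, the integer keys, and the `node_for` helper are hypothetical.

```python
import hashlib
from collections import Counter


def node_for(key, node_count):
    """Map a distribution key to a node index using a stable hash."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


NODES = 8  # hypothetical number of data nodes
placement = Counter(node_for(order_id, NODES) for order_id in range(100_000))
for node in sorted(placement):
    print(node, placement[node])
```

With a key that has many distinct values, each node ends up with roughly the same number of rows. A key with few distinct or heavily skewed values would not spread evenly, which is why the choice of distribution column matters.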
  • Azure UX Azure SDK HDInsight * Hive Templeton RDP * Pig HCatalog Ambari Map Reduce * Azure Blobs * = good to know! HDFS Sqoop Oozie
  • Demo environment: Analyze, Extract, Azure Blob Storage … Twitter, Hive Tables, StreamInsight, SQL Azure, Real-Time Dashboard, Mash Up & Visualise
  • Solution Components HDInsight Virtual Machine Twitter Excel
  • Big Data Twitter Demo Azure Management Portal
  • Analyse Manage Extract Azure Blob Storage … Twitter Hive Tables StreamInsight SQL Azure Real-Time Dashboard Mash Up & Visualise
  • Big Data Twitter Demo – Dashboard
  • Analyse Manage Extract Azure Blob Storage … Twitter Hive Tables StreamInsight SQL Azure Real-Time Dashboard Mash Up & Visualise
  • Big Data Twitter Demo – SQL Azure
  • Analyse Manage Extract Azure Blob Storage … Twitter Hive Tables StreamInsight SQL Azure Real-Time Dashboard Mash Up & Visualise
  • Big Data Twitter Demo Azure Blob Storage
  • Analyse Analyse Extract Azure Blob Storage … Twitter Hive Tables StreamInsight SQL Azure Real-Time Dashboard Mash Up & Visualise
  • Big Data Twitter Demo – Hive
  • Analyse Insight Extract Azure Blob Storage … Twitter Hive Tables StreamInsight SQL Azure Real-Time Dashboard Mash Up & Visualise
  • Big Data Twitter Demo Mash Up in Excel
  • Polybase: T-SQL query engine for RDBMS & Hadoop. Cost-based optimizer decides on: rendering operators in Map/Reduce jobs, or moving HDFS data into RDBMS storage. HDFS bridge for parallelized data transport. Diagram labels: Regular T-SQL, Results, PDW, HDFS Data Nodes
  • T-SQL for Polybase: definition of an external table; a distributed query.
  • Modern Data Warehousing: Parallel Data Warehouse, HDInsight, Polybase
  • Big Data Enterprise Architecture
  • What's next… Twitter Big Data source code: http://twitterbigdata.codeplex.com/; Twitter Big Data setup: http://aka.ms/bigdatatwitter; Azure trial: http://aka.ms/azurenow; HDInsight: www.windowsazure.com/en-us/documentation/services/hdinsight/; Hortonworks for Windows: http://hortonworks.com/products/hdp-windows/; PDW and Polybase: http://microsoft.com/pdw; Microsoft Big Data: http://microsoft.com/bigdata; Deutsche SQL Server Konferenz 2014: http://www.sqlkonferenz.de
  • “Big data is like teen sex. Everybody is talking about it, everyone thinks everyone else is doing it, so everyone claims they are doing it.” Dan Ariely, professor and director of Center for Advanced Hindsight at Duke University