
dotnet Cologne 2013 - Microsoft HDInsight for .NET Developers

Slides from my Microsoft HDInsight session at the dotnet Cologne 2013

Published in: Technology
Transcript of "dotnet Cologne 2013 - Microsoft HDInsight for .NET Developers"

  1. Sascha Dittmann (blog: http://www.sascha-dittmann.de, Twitter: @SaschaDittmann). Microsoft HDInsight for .NET developers: big data analysis with JavaScript and C#
  2. Large Hadron Collider (CERN, Switzerland): the LHC particle accelerator produces 15 PB of measurement data per year.* http://public.web.cern.ch/public/en/lhc/Computing-en.html
  3. Where big data comes from (infographic labels: Big Data, Social, Mobility, Cloud). 70% of U.S. smartphone owners regularly shop online via their devices. 44% of users (350M people) access Facebook via mobile devices. 50% of millennials use mobile devices to research products. 60% of U.S. mobile data will be audio and video streaming by 2014. 2/3 of the world's mobile data traffic will be video by 2016. 33% of BI will be consumed via handheld devices by 2013. Gaming consoles are now used an average of 1.5 hrs/wk to connect to the Internet. 80% growth of unstructured data is predicted over the next five years. 1.8 zettabytes of digital data were in use worldwide in 2011, up 30% from 2010. 1 in 4 Facebook users add their location to posts (2B/month). 500M Tweets are hosted on Twitter each day. 38% of people recommend a brand they "like" or follow on a social network. Brands get 100M Facebook "likes" per day.
  4. Big data scenarios: web app optimization, smart meter monitoring, equipment monitoring, advertising analysis, life sciences research, fraud detection, healthcare outcomes, weather forecasting, natural resource exploration, social network analysis, churn analysis, traffic flow optimization, IT infrastructure optimization, legal discovery
  5. Big data is sexy (http://hbr.org/)
  6. Apache Hadoop ecosystem (Hadoop = MapReduce + HDFS): MapReduce (job scheduling/execution system), HDFS (Hadoop Distributed File System), HBase (column DB), Pig (data flow), Hive (warehouse and data access), Oozie (workflow), Sqoop, Flume, traditional BI tools, HBase / Cassandra (columnar NoSQL databases), Avro (serialization), Zookeeper (coordination), Apache Mahout, Cascading (programming model)
  7. Microsoft HDInsight (Hadoop = MapReduce + HDFS): the same components as on the previous slide, namely MapReduce (job scheduling/execution system), HDFS (Hadoop Distributed File System), HBase (column DB), Pig (data flow), Hive (warehouse and data access), Oozie (workflow), Sqoop, Flume, traditional BI tools, HBase / Cassandra (columnar NoSQL databases), Avro (serialization), Zookeeper (coordination), Apache Mahout, Cascading (programming model), plus Windows, System Center, Active Directory, and Visual Studio
  8. Hadoop Distributed File System (HDFS): boot process, fault tolerance, user request
  9. Hadoop Distributed File System (HDFS): boot process, fault tolerance, user request
  10. Hadoop Distributed File System (HDFS): boot process, fault tolerance, user request
  11. Hadoop Distributed File System (HDFS): Portable Operating System Interface (POSIX); replication across multiple data nodes. Console listing:
      js> #ls /user/Sascha/input/ncdc
      Found 9 items
      drwxr-xr-x - Sascha supergroup 0 2013-04-24 13:09 /user/Sascha/input/ncdc/all
      drwxr-xr-x - Sascha supergroup 0 2013-04-24 13:01 /user/Sascha/input/ncdc/all2
      drwxr-xr-x - Sascha supergroup 0 2013-04-23 13:06 /user/Sascha/input/ncdc/metadata
      drwxr-xr-x - Sascha supergroup 0 2013-04-23 13:06 /user/Sascha/input/ncdc/micro
      drwxr-xr-x - Sascha supergroup 0 2013-04-23 13:06 /user/Sascha/input/ncdc/micro-tab
      -rw-r--r-- 3 Sascha supergroup 529 2013-04-23 13:06 /user/Sascha/input/ncdc/sample.txt
      -rw-r--r-- 3 Sascha supergroup 168 2013-04-23 13:06 /user/Sascha/input/ncdc/sample.txt.gz
  12. HDInsight Dashboard demo
  13. Map/Reduce using measurement data as an example (highlighted fields: year, air temperature):
      0067011990999991950051507004+68750+023550FM-12+038299999V0203301N00671220001CN9999999N9+00001+99999999999
      0043011990999991950051512004+68750+023550FM-12+038299999V0203201N00671220001CN9999999N9+00221+99999999999
      0043011990999991950051518004+68750+023550FM-12+038299999V0203201N00261220001CN9999999N9-00111+99999999999
      0043012650999991949032412004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+01111+99999999999
      0043012650999991949032418004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+00781+99999999999
  14. Map/Reduce using measurement data as an example (highlighted field: measurement quality): the same five records as on slide 13
  15. Map/Reduce (diagram: three DataNodes, each running Map, Sort, and Shuffle, feeding one Reduce). Input: the five sample records, shown truncated. Map output: (1949, 0), (1950, 22), (1950, 55), (1952, -11), (1950, 33). After shuffle: (1949, 0), (1950, [22, 33, 55]), (1952, -11). Reduce output: (1949, 0), (1950, 55), (1952, -11)
  16. Map/Reduce with a combine method (diagram: three DataNodes, each running Map, Combine, Sort, and Shuffle, feeding one Reduce). Input: the five sample records, shown truncated. Map output: (1949, 0), (1950, 22), (1950, 55), (1952, -11), (1950, 33). After combine: (1949, 0), (1950, 55), (1952, -11), (1950, 33). After shuffle: (1949, 0), (1950, [33, 55]), (1952, -11). Reduce output: (1949, 0), (1950, 55), (1952, -11)
  17. Map/Reduce using measurement data as an example
  18. Counting words with JavaScript (Map)
  19. Counting words with JavaScript (Reduce)
  20. Map/Reduce with JavaScript
  21. Refining with Pig Latin: pig.from("/user/Sascha/input/texte").mapReduce("/user/…/WordCount.js", "Woerter, Anzahl:long").orderBy("Anzahl DESC").take(15).to("/user/Sascha/output/Top15Woerter")
  22. Pig Latin
  23. Counting words with C# (Map - Classic)
  24. Counting words with C# (Reduce - Classic)
  25. Map/Reduce with C#
  26. .NET Job Submission Framework (Map)
  27. .NET Job Submission Framework (Reduce)
  28. Creating an external Hive table: CREATE EXTERNAL TABLE twitter_raw (tweet_json STRING) COMMENT 'Twitter Sample Data' ROW FORMAT DELIMITED LINES TERMINATED BY '10' STORED AS TEXTFILE LOCATION '/example/twitterdata';
  29. Twitter JSON: {"possibly_sensitive_editable":true,"place":null,"text":"Pre - #ConvCloud chat insights. " #Cloud Security, are we missing the point?" from @christianve http://t.co/Smo0CPvb #HP #cloudsource","id_str":"223418953114984448","favorited":false,"possibly_sensitive":false,"created_at":"Thu Jul 12 14:10:04 +0000 2012","retweeted":false,"retweet_count":0,"user":{"is_translator":false,"profile_use_background_image":true,"profile_image_url_https":"https://si0.twimg.com/profile_images/640456324/Paul_Calento_normal.jpg","id_str":"103006513","profile_text_color":"333333","statuses_count":5984,"following":null,"followers_count":744,"default_profile_image":false,"profile_link_color":"FF3300",}, …..}
  30. Interpreting JSON in Hive:
      FROM twitter_raw
      INSERT OVERWRITE TABLE twitter_temp
      SELECT get_json_object(tweet_json, '$.created_at'),
        substr(get_json_object(tweet_json, '$.created_at'), 9, 2),
        substr(get_json_object(tweet_json, '$.created_at'), 12, 8),
        get_json_object(tweet_json, '$.in_reply_to_user_id_str'),
        get_json_object(tweet_json, '$.text'),
        get_json_object(tweet_json, '$.contributors'),
        get_json_object(tweet_json, '$.retweeted'),
        get_json_object(tweet_json, '$.truncated'),
        get_json_object(tweet_json, '$.favorited'),
        cast(get_json_object(tweet_json, '$.retweet_count') as int),
        /* … */
        get_json_object(tweet_json, '$.user.profile_image_url_https'),
        cast(get_json_object(tweet_json, '$.user.followers_count') as int),
        get_json_object(tweet_json, '$.user.location'),
        get_json_object(tweet_json, '$.user.time_zone'),
        get_json_object(tweet_json, '$.user.created_at');
  31. Hive
  32. RDBMS vs. Hadoop:
      Criterion          RDBMS                   Hadoop
      Volume             Gigabytes               Petabytes
      Processing         Ad hoc and batch        Batch
      Updates            Many reads and writes   Write once, read many times
      Schema             Static schema           Dynamic schema
      Data integrity     High                    Low
      Scaling behavior   Non-linear              Linear
  33. Polybase / SQL Server PDW
  34. Questions?
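
Slides 13 to 15 walk through Map/Reduce on fixed-width weather records, highlighting the year, air temperature, and measurement quality fields. The following is a minimal in-memory sketch of that flow in JavaScript, not the code shown in the talk. It assumes the usual NCDC layout (year at offsets 15 to 18, signed temperature in tenths of a degree Celsius at 87 to 91, quality code at 92, with 9999 meaning "missing"); the function names `parseRecord`, `map`, `shuffle`, and `reduce` are illustrative only.

```javascript
// Five sample records as shown on slide 13 (fixed-width text, one reading each).
const records = [
  "0067011990999991950051507004+68750+023550FM-12+038299999V0203301N00671220001CN9999999N9+00001+99999999999",
  "0043011990999991950051512004+68750+023550FM-12+038299999V0203201N00671220001CN9999999N9+00221+99999999999",
  "0043011990999991950051518004+68750+023550FM-12+038299999V0203201N00261220001CN9999999N9-00111+99999999999",
  "0043012650999991949032412004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+01111+99999999999",
  "0043012650999991949032418004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+00781+99999999999",
];

// ASSUMPTION: field offsets follow the usual NCDC layout; the slides only name the fields.
function parseRecord(line) {
  return {
    year: line.substring(15, 19),
    temperature: parseInt(line.substring(87, 92), 10), // signed, tenths of a degree Celsius
    quality: line.charAt(92),
  };
}

// Map: emit one (year, temperature) pair per usable reading.
function map(line) {
  const { year, temperature, quality } = parseRecord(line);
  const usable = temperature !== 9999 && /^[01459]$/.test(quality);
  return usable ? [[year, temperature]] : [];
}

// Sort/shuffle: group all values by key and order the keys.
function shuffle(pairs) {
  const groups = new Map();
  for (const [key, value] of pairs) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  return [...groups.entries()].sort(([a], [b]) => a.localeCompare(b));
}

// Reduce: keep the maximum temperature per year.
function reduce(year, temperatures) {
  return [year, Math.max(...temperatures)];
}

const mapped = records.flatMap(map);
const result = shuffle(mapped).map(([year, temps]) => reduce(year, temps));
console.log(result);
```

On a real cluster the map calls run on the data nodes that hold the input blocks and only the grouped pairs travel to the reducer; here everything runs in one process purely to show the data flow.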

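Slide 16 adds a combine step that pre-aggregates map output on each data node before the shuffle. A small sketch of the idea, using the pair values from slides 15 and 16: the assignment of pairs to the three nodes is an assumption made for this example (the transcript only implies that 22 and 55 sit on the same node), and `groupByKey` and `maxPerKey` are hypothetical helper names.

```javascript
// Map output per data node. Values come from slide 16; which node holds
// which pair is an ASSUMPTION made for this sketch.
const mapOutputPerNode = [
  [["1949", 0], ["1950", 22], ["1950", 55]],
  [["1952", -11]],
  [["1950", 33]],
];

// Group (key, value) pairs by key and return the groups in key order.
function groupByKey(pairs) {
  const groups = new Map();
  for (const [key, value] of pairs) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  return [...groups.entries()].sort(([a], [b]) => a.localeCompare(b));
}

// Maximum per key. Serves as combiner (per node) and as reducer (globally),
// which works because max is associative and commutative.
function maxPerKey(pairs) {
  return groupByKey(pairs).map(([key, values]) => [key, Math.max(...values)]);
}

// Without a combiner, every mapped pair is shuffled to the reducer.
const shuffledWithout = mapOutputPerNode.flat();
const withoutCombiner = maxPerKey(shuffledWithout);

// With a combiner, each node first reduces its own pairs locally.
const shuffledWith = mapOutputPerNode.map(maxPerKey).flat();
const withCombiner = maxPerKey(shuffledWith);

console.log("pairs shuffled without combiner:", shuffledWithout.length);
console.log("pairs shuffled with combiner:", shuffledWith.length);
console.log(withCombiner);
```

Both paths produce the same reduce output; the combiner only lowers the number of pairs sent across the network. That is safe for a maximum, but not for every aggregate (an average, for instance, cannot be combined this way without carrying counts along).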
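
Slides 18 to 21 cover counting words with JavaScript and then taking the top 15 with a Pig chain, but the code on those slides is not part of the transcript. The sketch below shows what such a word count could look like. It assumes map and reduce functions of the shape `map(key, value, context)` and `reduce(key, values, context)`, with `context.write` and a `hasNext`/`next` iterator; `runLocally` is a hypothetical stand-in for the job runner so the example can run outside a cluster.

```javascript
// Map: split one line of text into words and emit (word, 1) for each.
var map = function (key, value, context) {
  var words = value.split(/[^\p{L}]+/u); // split on anything that is not a letter
  for (var i = 0; i < words.length; i++) {
    if (words[i] !== "") {
      context.write(words[i].toLowerCase(), 1);
    }
  }
};

// Reduce: sum all counts that were emitted for one word.
var reduce = function (key, values, context) {
  var sum = 0;
  while (values.hasNext()) {
    sum += parseInt(values.next(), 10);
  }
  context.write(key, sum);
};

// HYPOTHETICAL local stand-in for the job runner: collects map output,
// groups it by key, and passes each group to reduce through an iterator.
function runLocally(lines) {
  const groups = new Map();
  const mapContext = {
    write: (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    },
  };
  lines.forEach((line, lineNo) => map(lineNo, line, mapContext));

  const output = [];
  const reduceContext = { write: (key, value) => output.push([key, value]) };
  for (const [key, values] of groups) {
    let pos = 0;
    const iterator = { hasNext: () => pos < values.length, next: () => values[pos++] };
    reduce(key, iterator, reduceContext);
  }
  return output;
}

// Local equivalent of orderBy("Anzahl DESC").take(15) from slide 21.
const counts = runLocally(["Big Data ist sexy", "Big Data ist überall"]);
const top15 = counts.sort((a, b) => b[1] - a[1]).slice(0, 15);
console.log(top15);
```

Keeping map and reduce free of any I/O beyond `context.write` is what makes them easy to exercise locally like this before submitting a job.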