dotnet Cologne 2013 - Microsoft HDInsight for .NET Developers

Slides from my Microsoft HDInsight session at dotnet Cologne 2013

Transcript

  • 1. Sascha Dittmann (blog: http://www.sascha-dittmann.de, Twitter: @SaschaDittmann): Microsoft HDInsight for .NET Developers. Big Data analysis with JavaScript and C#
  • 2. Large Hadron Collider (CERN, Switzerland), http://public.web.cern.ch/public/en/lhc/Computing-en.html: the LHC particle accelerator produces 15 PB of measurement data per year*
  • 3. Where does Big Data come from? (slide themes: Big Data, Social, Mobility, Cloud)
      • 70% of U.S. smartphone owners regularly shop online via their devices.
      • 44% of users (350M people) access Facebook via mobile devices.
      • 50% of millennials use mobile devices to research products.
      • 60% of U.S. mobile data will be audio and video streaming by 2014.
      • 2/3 of the world's mobile data traffic will be video by 2016.
      • 33% of BI will be consumed via handheld devices by 2013.
      • Gaming consoles are now used an average of 1.5 hrs/wk to connect to the Internet.
      • 80% growth of unstructured data is predicted over the next five years.
      • 1.8 zettabytes of digital data were in use worldwide in 2011, up 30% from 2010.
      • 1 in 4 Facebook users add their location to posts (2B/month).
      • 500M Tweets are hosted on Twitter each day.
      • 38% of people recommend a brand they "like" or follow on a social network.
      • Brands get 100M Facebook "likes" per day.
  • 4. Big Data scenarios: web app optimization, smart meter monitoring, equipment monitoring, advertising analysis, life sciences research, fraud detection, healthcare outcomes, weather forecasting, natural resource exploration, social network analysis, churn analysis, traffic flow optimization, IT infrastructure optimization, legal discovery
  • 5. Big Data is sexy (http://hbr.org/)
  • 6. Apache Hadoop ecosystem (Hadoop = MapReduce + HDFS): MapReduce (job scheduling/execution system), HDFS (Hadoop Distributed File System), HBase (column DB), Pig (data flow), Hive (warehouse and data access), Oozie (workflow), Sqoop, Flume, Avro (serialization), Zookeeper (coordination), Apache Mahout, Cascading (programming model), HBase / Cassandra (columnar NoSQL databases), traditional BI tools
  • 7. Microsoft HDInsight: the Hadoop ecosystem from slide 6 (MapReduce, HDFS, HBase, Pig, Hive, Oozie, Sqoop, Flume, Avro, Zookeeper, Apache Mahout, Cascading, HBase / Cassandra, traditional BI tools) plus Windows, System Center, Active Directory and Visual Studio
  • 8-10. Hadoop Distributed File System (HDFS): boot process, fault tolerance, user request (the same diagram across three slides)
  • 11. Hadoop Distributed File System (HDFS): Portable Operating System Interface (POSIX), replication across multiple data nodes
    js> #ls /user/Sascha/input/ncdc
    Found 9 items
    drwxr-xr-x   - Sascha supergroup          0 2013-04-24 13:09 /user/Sascha/input/ncdc/all
    drwxr-xr-x   - Sascha supergroup          0 2013-04-24 13:01 /user/Sascha/input/ncdc/all2
    drwxr-xr-x   - Sascha supergroup          0 2013-04-23 13:06 /user/Sascha/input/ncdc/metadata
    drwxr-xr-x   - Sascha supergroup          0 2013-04-23 13:06 /user/Sascha/input/ncdc/micro
    drwxr-xr-x   - Sascha supergroup          0 2013-04-23 13:06 /user/Sascha/input/ncdc/micro-tab
    -rw-r--r--   3 Sascha supergroup        529 2013-04-23 13:06 /user/Sascha/input/ncdc/sample.txt
    -rw-r--r--   3 Sascha supergroup        168 2013-04-23 13:06 /user/Sascha/input/ncdc/sample.txt.gz
  • 12. HDInsight Dashboard Demo
  • 13. Map/Reduce using weather measurement data as an example (highlighted fields: year, air temperature)
    0067011990999991950051507004+68750+023550FM-12+038299999V0203301N00671220001CN9999999N9+00001+99999999999
    0043011990999991950051512004+68750+023550FM-12+038299999V0203201N00671220001CN9999999N9+00221+99999999999
    0043011990999991950051518004+68750+023550FM-12+038299999V0203201N00261220001CN9999999N9-00111+99999999999
    0043012650999991949032412004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+01111+99999999999
    0043012650999991949032418004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+00781+99999999999
  • 14. Map/Reduce using weather measurement data as an example: the same five records as slide 13, now highlighting the measurement quality field
  • 15. Map/Reduce: on each of three DataNodes, Map, Sort and Shuffle run before a single Reduce
    Input (record prefixes): 0067011990999991950051507004+68750, 0043011990999991950051512004+68750, 0043011990999991950051518004+68750, 0043012650999991949032412004+62300, 0043012650999991949032418004+62300
    Map output: (1949, 0), (1950, 22), (1950, 55), (1952, -11), (1950, 33)
    After shuffle: (1949, 0), (1950, [22, 33, 55]), (1952, -11)
    Reduce output: (1949, 0), (1950, 55), (1952, -11)
  • 16. Map/Reduce with a Combine method: on each of three DataNodes, Map, Combine, Sort and Shuffle run before a single Reduce
    Input (record prefixes): same as slide 15
    Map output: (1949, 0), (1950, 22), (1950, 55), (1952, -11), (1950, 33)
    After combine: (1949, 0), (1950, 55), (1952, -11), (1950, 33)
    After shuffle: (1949, 0), (1950, [33, 55]), (1952, -11)
    Reduce output: (1949, 0), (1950, 55), (1952, -11)
  • 17. Map/Reduce using weather measurement data as an example
  • 18. Counting words with JavaScript (Map)
  • 19. Counting words with JavaScript (Reduce)
  • 20. Map/Reduce with JavaScript
  • 21. Refining with Pig Latin (column names: Woerter = words, Anzahl = count)
    pig.from("/user/Sascha/input/texte")
       .mapReduce("/user/…/WordCount.js", "Woerter, Anzahl:long")
       .orderBy("Anzahl DESC")
       .take(15)
       .to("/user/Sascha/output/Top15Woerter")
  • 22. Pig Latin
  • 23. Counting words with C# (Map, classic)
  • 24. Counting words with C# (Reduce, classic)
  • 25. Map/Reduce with C#
  • 26. .NET Job Submission Framework (Map)
  • 27. .NET Job Submission Framework (Reduce)
  • 28. Creating an external Hive table
    CREATE EXTERNAL TABLE twitter_raw (tweet_json STRING)
    COMMENT 'Twitter Sample Data'
    ROW FORMAT DELIMITED LINES TERMINATED BY '10'
    STORED AS TEXTFILE
    LOCATION '/example/twitterdata';
  • 29. Twitter JSON
    {"possibly_sensitive_editable":true,"place":null,
     "text":"Pre - #ConvCloud chat insights. \" #Cloud Security, are we missing the point?\" from @christianve http://t.co/Smo0CPvb #HP #cloudsource",
     "id_str":"223418953114984448","favorited":false,"possibly_sensitive":false,
     "created_at":"Thu Jul 12 14:10:04 +0000 2012","retweeted":false,"retweet_count":0,
     "user":{"is_translator":false,"profile_use_background_image":true,
       "profile_image_url_https":"https://si0.twimg.com/profile_images/640456324/Paul_Calento_normal.jpg",
       "id_str":"103006513","profile_text_color":"333333","statuses_count":5984,"following":null,
       "followers_count":744,"default_profile_image":false,"profile_link_color":"FF3300", …},
     …}
  • 30. Interpreting JSON in Hive
    FROM twitter_raw
    INSERT OVERWRITE TABLE twitter_temp
    SELECT
      get_json_object(tweet_json, '$.created_at'),
      substr(get_json_object(tweet_json, '$.created_at'), 9, 2),
      substr(get_json_object(tweet_json, '$.created_at'), 12, 8),
      get_json_object(tweet_json, '$.in_reply_to_user_id_str'),
      get_json_object(tweet_json, '$.text'),
      get_json_object(tweet_json, '$.contributors'),
      get_json_object(tweet_json, '$.retweeted'),
      get_json_object(tweet_json, '$.truncated'),
      get_json_object(tweet_json, '$.favorited'),
      cast(get_json_object(tweet_json, '$.retweet_count') as int),
      /* … */
      get_json_object(tweet_json, '$.user.profile_image_url_https'),
      cast(get_json_object(tweet_json, '$.user.followers_count') as int),
      get_json_object(tweet_json, '$.user.location'),
      get_json_object(tweet_json, '$.user.time_zone'),
      get_json_object(tweet_json, '$.user.created_at');
  • 31. Hive
  • 32. RDBMS vs. Hadoop
    Criterion         RDBMS                     Hadoop
    Volume            Gigabytes                 Petabytes
    Processing        Ad hoc and batch          Batch
    Updates           Many reads and writes     Write once, read many times
    Schema            Static schema             Dynamic schema
    Data integrity    High                      Low
    Scaling           Non-linear                Linear
  • 33. Polybase / SQL Server PDW
  • 34. Questions?
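
Slide 11's `ls` listing also shows the replication mentioned on that slide: in `hadoop fs -ls`-style output the second column is the replication factor, so the two sample files are stored three times, while directories show `-`. The JavaScript sketch below parses such lines. It assumes the whitespace-separated column layout from the slide; `parseLsLine` is a hypothetical helper, not part of HDInsight.

```javascript
// Parse one line of `hadoop fs -ls`-style output (column layout assumed from slide 11):
// permissions, replication, owner, group, size, date, time, path.
function parseLsLine(line) {
  const parts = line.trim().split(/\s+/);
  if (parts.length < 8) return null; // e.g. the "Found 9 items" header
  const [permissions, replication, owner, group, size, date, time] = parts;
  return {
    isDirectory: permissions.startsWith("d"),
    // Directories carry no replication factor and show "-" instead.
    replication: replication === "-" ? null : Number(replication),
    owner,
    group,
    size: Number(size),
    modified: `${date} ${time}`,
    path: parts.slice(7).join(" "),
  };
}

const listing = [
  "Found 9 items",
  "drwxr-xr-x   - Sascha supergroup          0 2013-04-24 13:09 /user/Sascha/input/ncdc/all",
  "-rw-r--r--   3 Sascha supergroup        529 2013-04-23 13:06 /user/Sascha/input/ncdc/sample.txt",
];

for (const entry of listing.map(parseLsLine).filter(Boolean)) {
  console.log(entry.path, entry.isDirectory ? "(directory)" : `replicated ${entry.replication}x`);
}
```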
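
Slides 13 and 14 highlight the year, the air temperature and the measurement quality inside each raw NCDC record; these fields sit at fixed character positions. The JavaScript sketch below extracts them as a map step would. The offsets, the tenths-of-a-degree unit, the missing-value marker `9999` and the accepted quality codes 0, 1, 4, 5 and 9 are assumptions taken from the widely used NCDC max-temperature example, not from the slides.

```javascript
// Fixed-width NCDC fields (0-based offsets, assumed from the NCDC format):
// year at 15-18, signed air temperature in tenths of a degree Celsius at 87-91,
// quality code at 92.
const MISSING_TEMPERATURE = 9999;

function parseNcdcRecord(line) {
  return {
    year: line.substring(15, 19),
    // parseInt handles the explicit sign, e.g. "+0022" -> 22, "-0011" -> -11.
    temperature: parseInt(line.substring(87, 92), 10),
    quality: line.charAt(92),
  };
}

// Keep readings that are present and carry an accepted quality code (assumed set).
function isValidReading(reading) {
  return reading.temperature !== MISSING_TEMPERATURE && /^[01459]$/.test(reading.quality);
}

// The five sample records from slides 13 and 14.
const records = [
  "0067011990999991950051507004+68750+023550FM-12+038299999V0203301N00671220001CN9999999N9+00001+99999999999",
  "0043011990999991950051512004+68750+023550FM-12+038299999V0203201N00671220001CN9999999N9+00221+99999999999",
  "0043011990999991950051518004+68750+023550FM-12+038299999V0203201N00261220001CN9999999N9-00111+99999999999",
  "0043012650999991949032412004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+01111+99999999999",
  "0043012650999991949032418004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+00781+99999999999",
];

// Map step: emit (year, temperature) for every valid reading.
const mapOutput = records
  .map(parseNcdcRecord)
  .filter(isValidReading)
  .map((reading) => [reading.year, reading.temperature]);
console.log(mapOutput);
```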
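
Slide 15 shows how the map output pairs are sorted, shuffled to the reducer and reduced to one maximum per year. The following single-process JavaScript simulation replays exactly the pairs printed on the slide; it illustrates the data flow only and is not Hadoop code.

```javascript
// Map output as printed on slide 15: [year, temperature] pairs from all DataNodes.
const mapOutput = [[1949, 0], [1950, 22], [1950, 55], [1952, -11], [1950, 33]];

// Sort + shuffle: collect all values per key and order the groups by key.
function shuffle(pairs) {
  const groups = new Map();
  for (const [key, value] of pairs) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  return [...groups.entries()].sort(([a], [b]) => a - b);
}

// Reduce: called once per key, emits the maximum temperature for that year.
function reduceMax(key, values) {
  return [key, Math.max(...values)];
}

const grouped = shuffle(mapOutput);
const result = grouped.map(([key, values]) => reduceMax(key, values));
console.log(result); // [ [ 1949, 0 ], [ 1950, 55 ], [ 1952, -11 ] ]
```

Hadoop does not sort the values inside a group by default; the slide lists them as [22, 33, 55], while this simulation keeps them in arrival order. The maximum is the same either way.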
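
Slide 16 adds a Combine step that pre-aggregates map output on each DataNode before the shuffle. Because taking a maximum is associative and commutative, the reduce logic can double as the combiner. The sketch below assumes a hypothetical split of the slide's five pairs across three nodes (consistent with the slide, where 22 and 55 are combined locally) and shows that the final result is unchanged while fewer pairs cross the network.

```javascript
// Hypothetical distribution of slide 16's map output over three DataNodes.
const mapOutputPerNode = [
  [[1949, 0], [1950, 22], [1950, 55]],
  [[1952, -11]],
  [[1950, 33]],
];

function groupByKey(pairs) {
  const groups = new Map();
  for (const [key, value] of pairs) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  return groups;
}

// Max per key; usable as combiner and reducer because max is associative and commutative.
function maxPerKey(pairs) {
  return [...groupByKey(pairs)].map(([key, values]) => [key, Math.max(...values)]);
}

// Combine runs locally on each node, then only the combined pairs are shuffled.
const combined = mapOutputPerNode.map(maxPerKey);
const shuffledPairs = combined.flat();
const finalResult = maxPerKey(shuffledPairs).sort(([a], [b]) => a - b);

console.log(shuffledPairs.length); // 4
console.log(finalResult);          // [ [ 1949, 0 ], [ 1950, 55 ], [ 1952, -11 ] ]
```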
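
The JavaScript map function from slide 18 is not contained in this transcript. As a hedged sketch, the code below assumes the `map(key, value, context)` shape with `context.write(key, value)` used by the JavaScript samples for HDInsight's interactive console; `createContext` is a hypothetical stand-in for that runtime so the function can be tried in Node.js.

```javascript
// Map: split an input line into words and emit (word, 1) for each word.
// The (key, value, context) signature and context.write are assumptions about the runtime.
function map(key, value, context) {
  const words = value.split(/[^a-zA-Z]+/);
  for (const word of words) {
    if (word !== "") {
      context.write(word.toLowerCase(), 1);
    }
  }
}

// Hypothetical stand-in for the runtime context: it only collects emitted pairs.
function createContext() {
  const emitted = [];
  return { emitted, write: (k, v) => emitted.push([k, v]) };
}

const mapContext = createContext();
map(0, "Big data is big, data is everywhere", mapContext);
console.log(mapContext.emitted);
```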
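
Likewise for the reduce function of slide 19: the sketch assumes a `reduce(key, values, context)` shape in which `values` is an iterator with `hasNext()` and `next()`, and it parses each value because intermediate values may arrive as text. `iteratorOf` and the inline context object are hypothetical test doubles.

```javascript
// Reduce: sum the counts emitted for one word.
// The values iterator with hasNext()/next() is an assumption about the runtime.
function reduce(key, values, context) {
  let sum = 0;
  while (values.hasNext()) {
    sum += parseInt(values.next(), 10);
  }
  context.write(key, sum);
}

// Hypothetical helper standing in for the runtime's value iterator.
function iteratorOf(array) {
  let index = 0;
  return { hasNext: () => index < array.length, next: () => array[index++] };
}

const output = [];
reduce("data", iteratorOf(["1", "1", "1"]), { write: (k, v) => output.push([k, v]) });
console.log(output); // [ [ 'data', 3 ] ]
```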
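
Slide 21 chains the JavaScript word count with Pig steps: `orderBy("Anzahl DESC")` sorts the `(Woerter, Anzahl)` rows, i.e. (words, count), by count in descending order, and `take(15)` keeps the first 15 rows. The plain JavaScript sketch below mimics only those two steps on an in-memory array; it does not use the HDInsight `pig` object, and the counts are made-up sample values.

```javascript
// (Woerter, Anzahl) rows, i.e. (word, count); illustrative values only.
const wordCounts = [
  { Woerter: "hadoop", Anzahl: 12 },
  { Woerter: "data", Anzahl: 40 },
  { Woerter: "hive", Anzahl: 7 },
  { Woerter: "pig", Anzahl: 9 },
];

// Equivalent of orderBy("Anzahl DESC") followed by take(n).
function topWords(rows, n) {
  return [...rows].sort((a, b) => b.Anzahl - a.Anzahl).slice(0, n);
}

const top15 = topWords(wordCounts, 15);
console.log(top15.map((row) => `${row.Woerter}: ${row.Anzahl}`).join("\n"));
```

With fewer than 15 input rows, `take(15)` simply returns all of them, which the `slice` call reproduces.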
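
We assume "classic" on slides 23 and 24 refers to Hadoop Streaming, where mapper and reducer are console programs that exchange tab-separated `key<TAB>value` lines via standard input and output. The C# code itself is not in this transcript. The sketch below shows the mapper side of that line protocol in JavaScript, with the stdin/stdout wiring left out so the logic runs on an array of lines.

```javascript
// Streaming-style mapper: raw text lines in, one "word\t1" line per word out.
function streamingMap(inputLines) {
  const output = [];
  for (const line of inputLines) {
    for (const word of line.toLowerCase().split(/[^a-z]+/)) {
      if (word !== "") output.push(`${word}\t1`);
    }
  }
  return output;
}

console.log(streamingMap(["Hadoop on Windows", "Hadoop on Azure"]).join("\n"));
```

In a real streaming job the function would be fed line by line from standard input and its result written to standard output.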
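
On the reduce side of the same (assumed) streaming protocol, Hadoop sorts the mapper output by key before it reaches the reducer, so all lines for one word arrive consecutively and the reducer only has to sum until the key changes. Again a JavaScript sketch, not the slide's C# code.

```javascript
// Streaming-style reducer: expects "key\tvalue" lines already sorted by key.
function streamingReduce(sortedLines) {
  const output = [];
  let currentKey = null;
  let sum = 0;
  for (const line of sortedLines) {
    const tab = line.indexOf("\t");
    if (tab === -1) continue; // skip malformed lines (an assumption for this sketch)
    const key = line.slice(0, tab);
    const value = parseInt(line.slice(tab + 1), 10);
    if (key !== currentKey) {
      // Key changed: flush the total of the previous key.
      if (currentKey !== null) output.push(`${currentKey}\t${sum}`);
      currentKey = key;
      sum = 0;
    }
    sum += value;
  }
  if (currentKey !== null) output.push(`${currentKey}\t${sum}`); // flush the last key
  return output;
}

console.log(streamingReduce(["azure\t1", "hadoop\t1", "hadoop\t1", "on\t1", "on\t1"]).join("\n"));
```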
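
The .NET Job Submission Framework on slides 26 and 27 replaces the stdin/stdout plumbing with typed classes: mapper and reducer derive from SDK base classes and emit pairs through a context object (in the C# SDK, to our understanding, `MapperBase`, `ReducerCombinerBase` and `EmitKeyValue`). The slide code is not part of this transcript. The JavaScript sketch below only mirrors that structure; `WordCountMapper`, `WordCountReducer` and `runLocally` are hypothetical names, and `runLocally` is a toy driver, not the SDK's job submission.

```javascript
// Mapper in "class + context" style: called once per input line.
class WordCountMapper {
  map(inputLine, context) {
    for (const word of inputLine.toLowerCase().split(/[^a-z]+/)) {
      if (word !== "") context.emitKeyValue(word, "1");
    }
  }
}

// Reducer: called once per key with all of its values (as strings).
class WordCountReducer {
  reduce(key, values, context) {
    const total = values.reduce((sum, v) => sum + parseInt(v, 10), 0);
    context.emitKeyValue(key, String(total));
  }
}

// Toy local driver: map every line, group by key, reduce keys in sorted order.
function runLocally(MapperClass, ReducerClass, lines) {
  const grouped = new Map();
  const mapContext = {
    emitKeyValue: (k, v) => {
      if (!grouped.has(k)) grouped.set(k, []);
      grouped.get(k).push(v);
    },
  };
  const mapper = new MapperClass();
  for (const line of lines) mapper.map(line, mapContext);

  const results = [];
  const reduceContext = { emitKeyValue: (k, v) => results.push([k, v]) };
  const reducer = new ReducerClass();
  for (const key of [...grouped.keys()].sort()) reducer.reduce(key, grouped.get(key), reduceContext);
  return results;
}

console.log(runLocally(WordCountMapper, WordCountReducer, ["Big Data", "big data with Hadoop"]));
```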
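
On slide 28, `LINES TERMINATED BY '10'` names the line feed character by its decimal code (ASCII 10, `\n`), and the table has a single `STRING` column. Assuming the files under `/example/twitterdata` hold one JSON tweet per line, Hive therefore exposes each tweet as one row whose `tweet_json` value is the raw JSON text. The JavaScript sketch below illustrates that row model on an in-memory string; `readTwitterRaw` and the two-line sample are hypothetical.

```javascript
// Line feed has character code 10, which LINES TERMINATED BY '10' refers to.
const LINE_TERMINATOR = String.fromCharCode(10);

// Each line of the text file becomes one row with the single STRING column tweet_json.
function readTwitterRaw(fileContent) {
  const lines = fileContent.split(LINE_TERMINATOR);
  // A final line feed ends the last row; it does not start another one.
  if (lines[lines.length - 1] === "") lines.pop();
  return lines.map((line) => ({ tweet_json: line }));
}

const sampleFile = '{"id_str":"1","text":"first"}\n{"id_str":"2","text":"second"}\n';
const twitterRaw = readTwitterRaw(sampleFile);
console.log(twitterRaw.length); // 2
```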
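
On slide 30, `get_json_object` pulls single values out of the JSON string via paths such as `'$.user.followers_count'`, and Hive's `substr` counts from 1. Applied to a `created_at` value like `Thu Jul 12 14:10:04 +0000 2012` (from slide 29), `substr(..., 9, 2)` yields the day `12` and `substr(..., 12, 8)` the time `14:10:04`. The JavaScript sketch below reproduces a few of these column expressions for illustration; `getJsonObject` and `hiveSubstr` are hypothetical helpers that support only simple dotted paths, and the output property names are made up because the slide does not show the columns of `twitter_temp`.

```javascript
// Minimal stand-in for get_json_object: supports only "$.a.b.c" style paths.
function getJsonObject(json, path) {
  let node = JSON.parse(json);
  for (const field of path.replace(/^\$\./, "").split(".")) {
    if (node === null || typeof node !== "object" || !(field in node)) return null;
    node = node[field];
  }
  if (node === null) return null;
  // Like Hive, return values as strings (nested objects as JSON text).
  return typeof node === "object" ? JSON.stringify(node) : String(node);
}

// Hive's substr(s, start, len) is 1-based; JavaScript's slice is 0-based.
function hiveSubstr(s, start, len) {
  return s.slice(start - 1, start - 1 + len);
}

// Values taken from the sample tweet on slide 29 (abbreviated).
const tweetJson = '{"created_at":"Thu Jul 12 14:10:04 +0000 2012","retweet_count":0,"user":{"followers_count":744}}';
const createdAt = getJsonObject(tweetJson, "$.created_at");
const row = {
  created_at: createdAt,
  created_day: hiveSubstr(createdAt, 9, 2),   // "12"
  created_time: hiveSubstr(createdAt, 12, 8), // "14:10:04"
  // Counterpart of cast(... as int).
  followers_count: parseInt(getJsonObject(tweetJson, "$.user.followers_count"), 10),
};
console.log(row);
```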
