Big Data Hackathon Rheinland - Introduction to Hadoop 1.x / HDInsight
The Big Data Hackathon Rheinland is your opportunity to assemble a team, learn and share, make new friends, and hack for fun and a good cause.

The goal of this hackathon is to educate and drive awareness of Microsoft’s big data technologies including:
HDInsight (Apache Hadoop as a service)
Power BI (Power Query, Power View, Power Map, and more!)

The purpose of the hackathon is to show emergent trends or relationships within a single dataset or across distinct datasets. Teams can choose to create a solution in one of three categories:
Modeling – Prove or disprove a relationship with the data
Visualization – Create a visualization to represent the data
Mobility – Create a solution showing mobile access to your data

The following software should be installed on your local machine or VM:
Visual Studio 2012/2013
Azure PowerShell cmdlets
Microsoft Power BI (Power Query, Power Map, Power Pivot, Power View)
Cerebrata Azure Explorer
(http://www.cerebrata.com/products/azure-explorer/)

Agenda
08:00 – 08:30 Registration
08:30 – 10:30 BIG DATA Intro (Scott Klein, Sascha Dittmann, Oliver Engels, Tillmann Eitelberg)
10:30 – 17:30 Build your Big Data Solution
12:00 – 13:30 Lunch (No fixed time. Eat when you're hungry or just keep hacking…)
17:30 – 18:30 Dinner (Pizza time)
18:30 – 20:00 Presentation and discussion of the solutions

This event is co-hosted by PASS Deutschland e.V. and Microsoft.


Transcript

  • 1. Big Data Hackathon Rheinland: Introduction to Hadoop 1.x / HDInsight. Sascha Dittmann. Blog: http://www.sascha-dittmann.de Twitter: @SaschaDittmann
  • 2. WHAT IS BIG DATA? What is the problem? 28.06.2014 SQLSaturday Rheinland 2014
  • 3. What is the problem?
  • 4. The 3 V's
      • Variety: relational, XML, video, text, ...
      • Velocity: batch, interval, real time, ...
      • Volume: KB, MB, GB, TB, PB, EB, ZB, YB, ...
  • 5. Scaling: vertical scaling (scale up) vs. horizontal scaling (scale out)
  • 6. WHAT IS HADOOP / HDINSIGHT? In search of solutions
  • 7. Apache Hadoop + Microsoft HDInsight
  • 8. Apache Hadoop 1.x Ecosystem (Hadoop = MapReduce + HDFS)
      • MapReduce (job scheduling/execution system)
      • HDFS (Hadoop Distributed File System)
      • HBase (column DB)
      • Pig (data flow)
      • Hive (warehouse and data access)
      • Oozie (workflow)
      • Sqoop
      • Flume
      • Avro (serialization)
      • Zookeeper (coordination)
      • Apache Mahout
      • Cascading (programming model)
      • HBase / Cassandra (columnar NoSQL databases)
      • Traditional BI tools
  • 9. Microsoft HDInsight: the same components as on slide 8, plus Windows, System Center, Active Directory, and Visual Studio
  • 10. THE HADOOP CORE: HDFS + MapReduce
  • 11-13. Hadoop Distributed File System (HDFS): boot process, fault tolerance, user request
  • 14. Hadoop Distributed File System (HDFS)
      • Portable Operating System Interface (POSIX)
      • Replication across multiple data nodes

      js> #ls /user/Sascha/input/ncdc
      Found 9 items
      drwxr-xr-x   - Sascha supergroup     0 2013-04-24 13:09 /user/Sascha/input/ncdc/all
      drwxr-xr-x   - Sascha supergroup     0 2013-04-24 13:01 /user/Sascha/input/ncdc/all2
      drwxr-xr-x   - Sascha supergroup     0 2013-04-23 13:06 /user/Sascha/input/ncdc/metadata
      drwxr-xr-x   - Sascha supergroup     0 2013-04-23 13:06 /user/Sascha/input/ncdc/micro
      drwxr-xr-x   - Sascha supergroup     0 2013-04-23 13:06 /user/Sascha/input/ncdc/micro-tab
      -rw-r--r--   3 Sascha supergroup   529 2013-04-23 13:06 /user/Sascha/input/ncdc/sample.txt
      -rw-r--r--   3 Sascha supergroup   168 2013-04-23 13:06 /user/Sascha/input/ncdc/sample.txt.gz
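
The listing on slide 14 shows replication in its second column: directories carry a "-", files carry the number of replicas. The following Python sketch reads a few of those lines. The column meanings are an assumption based on the usual `hadoop fs -ls` layout, and `parse_ls_line` is a hypothetical helper, not part of Hadoop or HDInsight.

```python
# Three lines copied from the listing on slide 14.
LISTING = [
    "drwxr-xr-x - Sascha supergroup 0 2013-04-24 13:09 /user/Sascha/input/ncdc/all",
    "-rw-r--r-- 3 Sascha supergroup 529 2013-04-23 13:06 /user/Sascha/input/ncdc/sample.txt",
    "-rw-r--r-- 3 Sascha supergroup 168 2013-04-23 13:06 /user/Sascha/input/ncdc/sample.txt.gz",
]


def parse_ls_line(line):
    """Split one listing line into named fields (assumed column order)."""
    perms, repl, owner, group, size, date, time, path = line.split(None, 7)
    return {
        "is_dir": perms.startswith("d"),
        # Directories have no replication factor and show "-" instead.
        "replication": None if repl == "-" else int(repl),
        "owner": owner,
        "group": group,
        "size": int(size),
        "modified": f"{date} {time}",
        "path": path,
    }


entries = [parse_ls_line(line) for line in LISTING]
for entry in entries:
    print(entry["path"], entry["replication"])
```

Under these assumptions, both sample files report three replicas, which fits the "replication across multiple data nodes" point on the slide.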
  • 15. Creating a cluster
  • 16. Map/Reduce using measurement data as an example. Highlighted fields: year, air temperature
      0067011990999991950051507004+68750+023550FM-12+038299999V0203301N00671220001CN9999999N9+00001+99999999999
      0043011990999991950051512004+68750+023550FM-12+038299999V0203201N00671220001CN9999999N9+00221+99999999999
      0043011990999991950051518004+68750+023550FM-12+038299999V0203201N00261220001CN9999999N9-00111+99999999999
      0043012650999991949032412004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+01111+99999999999
      0043012650999991949032418004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+00781+99999999999
  • 17. Map/Reduce using measurement data as an example. Same five records as on slide 16; highlighted field: measurement quality
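
Slides 16 and 17 highlight three fields in each fixed-width record: year, air temperature, and measurement quality. A map step that extracts them might look like the Python sketch below. The column offsets, the `9999` missing-value marker, the accepted quality codes, and the unit (tenths of a degree Celsius) are assumptions about the NCDC format. The offsets were checked only against the five sample records from the slides.

```python
# Assumed field positions in the fixed-width NCDC record.
YEAR = slice(15, 19)
TEMPERATURE = slice(87, 92)   # signed value, assumed tenths of a degree Celsius
QUALITY = slice(92, 93)
MISSING = 9999                # assumed marker for "no reading"
GOOD_QUALITY = set("01459")   # assumed acceptable quality codes

RECORDS = [
    "0067011990999991950051507004+68750+023550FM-12+038299999V0203301N00671220001CN9999999N9+00001+99999999999",
    "0043011990999991950051512004+68750+023550FM-12+038299999V0203201N00671220001CN9999999N9+00221+99999999999",
    "0043011990999991950051518004+68750+023550FM-12+038299999V0203201N00261220001CN9999999N9-00111+99999999999",
    "0043012650999991949032412004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+01111+99999999999",
    "0043012650999991949032418004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+00781+99999999999",
]


def map_record(line):
    """Return (year, temperature) for a usable record, otherwise None."""
    year = int(line[YEAR])
    temperature = int(line[TEMPERATURE])  # int() accepts the leading sign
    quality = line[QUALITY]
    if temperature == MISSING or quality not in GOOD_QUALITY:
        return None
    return year, temperature


pairs = [pair for pair in map(map_record, RECORDS) if pair is not None]
print(pairs)
# [(1950, 0), (1950, 22), (1950, -11), (1949, 111), (1949, 78)]
```

The key-value pairs on the following slides are illustrative and do not match these parsed values one to one.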
  • 18. Map/Reduce
      Pipeline: three DataNodes each run Map → Sort → Shuffle; their output feeds a single Reduce step.
      Input records (truncated on the slide):
        0067011990999991950051507004+68750
        0043011990999991950051512004+68750
        0043011990999991950051518004+68750
        0043012650999991949032412004+62300
        0043012650999991949032418004+62300
      Map output: 1949,0  1950,22  1950,55  1952,-11  1950,33
      After sort/shuffle: 1949,0  1950,[22,33,55]  1952,-11
      Reduce output: 1949,0  1950,55  1952,-11
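
The data flow on slide 18 can be replayed in plain Python. This is a single-process sketch of the idea, not Hadoop code: `shuffle` and `reduce_max` are hypothetical helper names, and the input pairs are the illustrative map output from the slide.

```python
from collections import defaultdict

# Illustrative (year, temperature) pairs emitted by the map step on the slide.
map_output = [(1949, 0), (1950, 22), (1950, 55), (1952, -11), (1950, 33)]


def shuffle(pairs):
    """Group values by key, with keys in sorted order.

    The values are sorted too, only so the result reads like the slide;
    a reducer should not rely on value order.
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: sorted(values) for key, values in sorted(groups.items())}


def reduce_max(groups):
    """Reduce each group of temperatures to its maximum."""
    return {key: max(values) for key, values in groups.items()}


grouped = shuffle(map_output)
result = reduce_max(grouped)
print(grouped)  # {1949: [0], 1950: [22, 33, 55], 1952: [-11]}
print(result)   # {1949: 0, 1950: 55, 1952: -11}
```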
  • 19. Map/Reduce with a combiner function
      Pipeline: three DataNodes each run Map → Combine → Sort → Shuffle; their output feeds a single Reduce step.
      Input records: the same truncated records as on slide 18.
      Map output: 1949,0  1950,22  1950,55  1952,-11  1950,33
      After combine: 1949,0  1950,55  1952,-11  1950,33
      After sort/shuffle: 1949,0  1950,[33,55]  1952,-11
      Reduce output: 1949,0  1950,55  1952,-11
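
The effect of the combiner on slide 19 can be sketched the same way in Python. The slide does not say which pairs come from which node, so the split in `node_outputs` is a hypothetical assignment chosen to reproduce the slide's numbers. `combine` and `shuffle` are illustrative helpers, not Hadoop APIs.

```python
from collections import defaultdict

# Hypothetical distribution of the slide's map output across three nodes.
node_outputs = [
    [(1950, 22), (1950, 55)],
    [(1949, 0), (1952, -11)],
    [(1950, 33)],
]


def combine(pairs):
    """Local pre-aggregation: keep only the maximum per key on one node."""
    best = {}
    for key, value in pairs:
        best[key] = max(value, best.get(key, value))
    return sorted(best.items())


def shuffle(pairs):
    """Group values by key across all nodes, with keys in sorted order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: sorted(values) for key, values in sorted(groups.items())}


combined = [combine(pairs) for pairs in node_outputs]
shuffled_pairs = [pair for node in combined for pair in node]
grouped = shuffle(shuffled_pairs)
result = {key: max(values) for key, values in grouped.items()}

print(len(shuffled_pairs))  # 4 pairs cross the network instead of 5
print(grouped)              # {1949: [0], 1950: [33, 55], 1952: [-11]}
print(result)               # {1949: 0, 1950: 55, 1952: -11}
```

A combiner is safe here because taking a maximum is associative and commutative, so pre-aggregating on each node does not change the final result.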
  • 20. “Hello World”, HDInsight style
  • 21. RDBMS vs. Hadoop

                          RDBMS                   Hadoop
      Data volume         Gigabytes               Petabytes
      Processing          Ad hoc and batch        Batch
      Updates             Many reads and writes   Write once, read many times
      Data schema         Static                  Dynamic
      Data integrity      High                    Low
      Scaling behavior    Non-linear              Linear

  • 22. Thank you for sponsorship, for volunteering, for participation, and for a great SQLSaturday #313!