Big Data Hackathon Rheinland - Introduction to Hadoop 1.x / HDInsight

The Big Data Hackathon Rheinland is your opportunity to assemble a team, learn and share, make new friends, and hack for fun and a good cause.

The goal of this hackathon is to educate and drive awareness of Microsoft’s big data technologies including:
HDInsight (Apache Hadoop as a service)
Power BI (Power Query, Power View, Power Map, and more!)

The purpose of the hackathon is to show emergent trends or relationships within a single dataset or across distinct datasets. Teams can choose to create a solution in one of three categories:
Modeling – Prove or disprove a relationship with the data
Visualization – Create a visualization to represent the data
Mobility – Create a solution showing mobile access to your data

The following software should be installed on your local machine or VM:
Visual Studio 2012/2013
Azure PowerShell cmdlets
Microsoft Power BI (Power Query, Power Map, Power Pivot, Power View)
Cerebrata Azure Explorer
(http://www.cerebrata.com/products/azure-explorer/)

Agenda
08:00 – 08:30 Registration
08:30 – 10:30 BIG DATA Intro (Scott Klein, Sascha Dittmann, Oliver Engels, Tillmann Eitelberg)
10:30 – 17:30 Build your Big Data Solution
12:00 – 13:30 Lunch (No fixed time. Eat when you're hungry or just keep hacking…)
17:30 – 18:30 Dinner (Pizza time)
18:30 – 20:00 Presentation and discussion of the solutions

This event is co-hosted by PASS Deutschland e.V. and Microsoft.

Big Data Hackathon Rheinland - Introduction to Hadoop 1.x / HDInsight

  1. BigData Hackathon Rheinland: Introduction to Hadoop 1.x / HDInsight. Sascha Dittmann. Blog: http://www.sascha-dittmann.de Twitter: @SaschaDittmann
  2. WHAT IS BIG DATA? Where is the problem? (28.06.2014, SQLSaturday Rheinland 2014)
  3. Where is the problem?
  4. The 3 V's
      • Variety: relational, XML, video, text, ...
      • Velocity: batch, interval, real time, ...
      • Volume: KB, MB, GB, TB, PB, EB, ZB, YB, ...
  5. Scaling: vertical scaling vs. horizontal scaling
  6. WHAT IS HADOOP / HDINSIGHT? In search of solutions
  7. Apache Hadoop / Microsoft HDInsight
  8. Apache Hadoop 1.x Ecosystem (Hadoop = MapReduce + HDFS)
      • MapReduce (Job Scheduling/Execution System)
      • HDFS (Hadoop Distributed File System)
      • HBase (Column DB)
      • Pig (Data Flow)
      • Hive (Warehouse and Data Access)
      • Oozie (Workflow)
      • Sqoop
      • Flume
      • Traditional BI Tools
      • HBase / Cassandra (Columnar NoSQL Databases)
      • Avro (Serialization)
      • Zookeeper (Coordination)
      • Apache Mahout
      • Cascading (programming model)
  9. Microsoft HDInsight (Hadoop = MapReduce + HDFS)
      • Hadoop components: MapReduce (Job Scheduling/Execution System), HDFS (Hadoop Distributed File System), HBase (Column DB), Pig (Data Flow), Hive (Warehouse and Data Access), Oozie (Workflow), Sqoop, Flume, Traditional BI Tools, HBase / Cassandra (Columnar NoSQL Databases), Avro (Serialization), Zookeeper (Coordination), Apache Mahout, Cascading (programming model)
      • Microsoft components: Windows, System Center, Active Directory, Visual Studio
  10. THE HADOOP CORE: HDFS + MapReduce
  11. Hadoop Distributed File System (HDFS): boot process, fault tolerance, user request
  12. Hadoop Distributed File System (HDFS): boot process, fault tolerance, user request
  13. Hadoop Distributed File System (HDFS): boot process, fault tolerance, user request
  14. Hadoop Distributed File System (HDFS)
      • Portable Operating System Interface (POSIX)
      • Replication across multiple data nodes

      js> #ls /user/Sascha/input/ncdc
      Found 9 items
      drwxr-xr-x   - Sascha supergroup     0 2013-04-24 13:09 /user/Sascha/input/ncdc/all
      drwxr-xr-x   - Sascha supergroup     0 2013-04-24 13:01 /user/Sascha/input/ncdc/all2
      drwxr-xr-x   - Sascha supergroup     0 2013-04-23 13:06 /user/Sascha/input/ncdc/metadata
      drwxr-xr-x   - Sascha supergroup     0 2013-04-23 13:06 /user/Sascha/input/ncdc/micro
      drwxr-xr-x   - Sascha supergroup     0 2013-04-23 13:06 /user/Sascha/input/ncdc/micro-tab
      -rw-r--r--   3 Sascha supergroup   529 2013-04-23 13:06 /user/Sascha/input/ncdc/sample.txt
      -rw-r--r--   3 Sascha supergroup   168 2013-04-23 13:06 /user/Sascha/input/ncdc/sample.txt.gz
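
The listing on slide 14 can be read column by column. The following is a minimal Python sketch, not part of the original slides, that splits one listing line into fields. It assumes the usual column order of a Hadoop file listing (permissions, replication factor, owner, group, size, date, time, path); the names `HdfsEntry` and `parse_ls_line` are hypothetical helpers introduced only for this illustration.

```python
from typing import NamedTuple, Optional


class HdfsEntry(NamedTuple):
    """One row of an HDFS directory listing (hypothetical helper type)."""
    is_dir: bool
    replication: Optional[int]  # None for directories, which show "-"
    owner: str
    group: str
    size: int
    modified: str
    path: str


def parse_ls_line(line: str) -> HdfsEntry:
    """Split a listing line into its eight whitespace-separated columns."""
    perms, repl, owner, group, size, date, time, path = line.split(None, 7)
    return HdfsEntry(
        is_dir=perms.startswith("d"),
        replication=None if repl == "-" else int(repl),
        owner=owner,
        group=group,
        size=int(size),
        modified=f"{date} {time}",
        path=path,
    )


# Two lines copied from the listing on the slide.
listing = [
    "drwxr-xr-x - Sascha supergroup 0 2013-04-23 13:06 /user/Sascha/input/ncdc/micro",
    "-rw-r--r-- 3 Sascha supergroup 529 2013-04-23 13:06 /user/Sascha/input/ncdc/sample.txt",
]
entries = [parse_ls_line(line) for line in listing]
for entry in entries:
    kind = "dir " if entry.is_dir else "file"
    print(kind, entry.replication, entry.size, entry.path)
```

The second column is the point of interest here: the two files report a replication factor of 3, which is how the "replication across multiple data nodes" bullet shows up in practice, while directories carry no replication factor of their own.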
  15. Creating a cluster
  16. Map/Reduce using measurement data as an example
      Highlighted fields: year, air temperature

      0067011990999991950051507004+68750+023550FM-12+038299999V0203301N00671220001CN9999999N9+00001+99999999999
      0043011990999991950051512004+68750+023550FM-12+038299999V0203201N00671220001CN9999999N9+00221+99999999999
      0043011990999991950051518004+68750+023550FM-12+038299999V0203201N00261220001CN9999999N9-00111+99999999999
      0043012650999991949032412004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+01111+99999999999
      0043012650999991949032418004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+00781+99999999999
  17. Map/Reduce using measurement data as an example (same five records)
      Highlighted field: measurement quality
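
The three fields highlighted on slides 16 and 17 can be pulled out of each fixed-width record with plain string slicing. The sketch below is an illustration rather than the presenter's code. The slides only highlight the fields, so the character offsets (year at 15 to 18, signed temperature at 87 to 91, quality code at 92), the unit (tenths of a degree Celsius), the missing-value sentinel 9999, and the set of accepted quality codes are assumptions taken from the commonly used NCDC weather record layout. `parse_record` is a hypothetical name.

```python
import re
from typing import Optional, Tuple

MISSING = 9999                          # assumed sentinel for "no reading"
GOOD_QUALITY = re.compile(r"[01459]")   # assumed set of acceptable quality codes


def parse_record(line: str) -> Optional[Tuple[str, int]]:
    """Return (year, temperature) or None if the reading should be skipped."""
    year = line[15:19]
    temperature = int(line[87:92])      # signed; int() accepts "+0022" and "-0011"
    quality = line[92]
    if temperature == MISSING or not GOOD_QUALITY.fullmatch(quality):
        return None
    return year, temperature


# The five records shown on the slides.
records = [
    "0067011990999991950051507004+68750+023550FM-12+038299999V0203301N00671220001CN9999999N9+00001+99999999999",
    "0043011990999991950051512004+68750+023550FM-12+038299999V0203201N00671220001CN9999999N9+00221+99999999999",
    "0043011990999991950051518004+68750+023550FM-12+038299999V0203201N00261220001CN9999999N9-00111+99999999999",
    "0043012650999991949032412004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+01111+99999999999",
    "0043012650999991949032418004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+00781+99999999999",
]
pairs = [parse_record(record) for record in records]
for pair in pairs:
    print(pair)
```

This is the work a map function would do per input line: turn one raw record into a (year, temperature) key/value pair and drop readings whose quality code is not trusted.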
  18. Map/Reduce
      Each of three DataNodes runs Map, Sort, and Shuffle; the results feed one Reduce step.

      Input records (as truncated on the slide):
      0067011990999991950051507004+68750
      0043011990999991950051512004+68750
      0043011990999991950051518004+68750
      0043012650999991949032412004+62300
      0043012650999991949032418004+62300

      Map output:      1949,0  1950,22  1950,55  1952,-11  1950,33
      Shuffle output:  1949,0  1950,[22,33,55]  1952,-11
      Reduce output:   1949,0  1950,55  1952,-11
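
The data flow on slide 18 can be replayed in a few lines of Python. This is a single-process sketch under simplifying assumptions, not how Hadoop executes a job: there is no distribution, and `shuffle` and `reduce_max` are hypothetical helper names. The key/value pairs are the ones printed on the slide, and the reduce step is assumed to take the maximum temperature per year, which is what the slide's output values suggest.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, int]


def shuffle(pairs: Iterable[Pair]) -> Dict[str, List[int]]:
    """Group values by key and order the keys, as sort and shuffle do."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for year, temperature in pairs:
        groups[year].append(temperature)
    # Values are sorted only to match the slide; Hadoop guarantees key order,
    # not value order.
    return {year: sorted(groups[year]) for year in sorted(groups)}


def reduce_max(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce each group of temperatures to its maximum."""
    return {year: max(temperatures) for year, temperatures in groups.items()}


# Map output as shown on the slide (collected from all three DataNodes).
map_output = [("1949", 0), ("1950", 22), ("1950", 55), ("1952", -11), ("1950", 33)]

grouped = shuffle(map_output)
result = reduce_max(grouped)
print(grouped)
print(result)
```

The intermediate `grouped` dictionary corresponds to the line `1949,0 1950,[22,33,55] 1952,-11` on the slide, and `result` to the final line `1949,0 1950,55 1952,-11`.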
  19. Map/Reduce with a combiner function
      Each of three DataNodes runs Map, Combine, Sort, and Shuffle; the results feed one Reduce step.

      Input records (as truncated on the slide):
      0067011990999991950051507004+68750
      0043011990999991950051512004+68750
      0043011990999991950051518004+68750
      0043012650999991949032412004+62300
      0043012650999991949032418004+62300

      Map output:      1949,0  1950,22  1950,55  1952,-11  1950,33
      Combine output:  1949,0  1950,55  1952,-11  1950,33
      Shuffle output:  1949,0  1950,[33,55]  1952,-11
      Reduce output:   1949,0  1950,55  1952,-11
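
The effect of the combiner on slide 19 can be sketched the same way. The slide does not say which DataNode produced which pair, so the split of the map output across three nodes below is an assumption chosen to reproduce the slide's numbers. `combine` is a hypothetical helper, and the whole block is a single-process illustration rather than real Hadoop code.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[str, int]


def combine(pairs: List[Pair]) -> List[Pair]:
    """Keep only the local maximum per year on one node."""
    best: Dict[str, int] = {}
    for year, temperature in pairs:
        best[year] = max(temperature, best.get(year, temperature))
    return sorted(best.items())


# Assumed distribution of the slide's map output over three DataNodes.
node_outputs = [
    [("1949", 0), ("1950", 22), ("1950", 55)],
    [("1952", -11)],
    [("1950", 33)],
]

combined = [combine(pairs) for pairs in node_outputs]

# Shuffle: group the combined pairs by year across all nodes.
shuffled: Dict[str, List[int]] = defaultdict(list)
for node_pairs in combined:
    for year, temperature in node_pairs:
        shuffled[year].append(temperature)
grouped_combined = {year: sorted(temps) for year, temps in sorted(shuffled.items())}

# Reduce: the global maximum per year.
final_result = {year: max(temps) for year, temps in grouped_combined.items()}

pairs_before = sum(len(pairs) for pairs in node_outputs)
pairs_after = sum(len(pairs) for pairs in combined)
print(grouped_combined, final_result, pairs_before, pairs_after)
```

With this split, five pairs leave the mappers but only four cross the network, and the final result is unchanged. That works because taking a maximum is associative and commutative; an operation such as a mean could not be reused as its own combiner without carrying extra state such as a count.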
  20. "Hello World" à la HDInsight
  21. RDBMS vs. Hadoop
                          RDBMS                    Hadoop
      Data volume         Gigabytes                Petabytes
      Processing          Ad hoc and batch         Batch
      Updates             Many reads and writes    Write once, read many
      Data schema         Static                   Dynamic
      Data integrity      High                     Low
      Scaling behavior    Non-linear               Linear
  22. Thank you for sponsorship, for volunteering, for participation, for a great SQLSaturday #313. SQLSaturday Rheinland 2014, 28.06.2014
