Big Data Hackathon Rheinland - Introduction to Hadoop 1.x / HDInsight

The Big Data Hackathon Rheinland is your opportunity to assemble a team, learn and share, make new friends, and hack for fun and a good cause.

The goal of this hackathon is to educate participants about, and raise awareness of, Microsoft’s big data technologies, including:
HDInsight (Apache Hadoop as a service)
Power BI (Power Query, Power View, Power Map, and more!)

The purpose of the hackathon is to show emergent trends or relationships within a single dataset or across distinct datasets. Teams can choose to create a solution in one of three categories:
Modeling – Prove or disprove a relationship with the data
Visualization – Create a visualization to represent the data
Mobility – Create a solution showing mobile access to your data

The following software should be installed on your local machine or VM:
Visual Studio 2012/2013
Azure PowerShell Cmdlets
Microsoft Power BI (Power Query, Power Map, Power Pivot, Power View)
Cerebrata Azure Explorer
(http://www.cerebrata.com/products/azure-explorer/)

Agenda
08:00 – 08:30 Registration
08:30 – 10:30 Big Data Intro (Scott Klein, Sascha Dittmann, Oliver Engels, Tillmann Eitelberg)
10:30 – 17:30 Build your Big Data Solution
12:00 – 13:30 Lunch (No fixed time. Eat when you're hungry or just keep hacking…)
17:30 – 18:30 Dinner (Pizza time)
18:30 – 20:00 Presentation and discussion of the solutions

This event is co-hosted by PASS Deutschland e.V. and Microsoft.

Presentation Transcript

  • Big Data Hackathon Rheinland: Introduction to Hadoop 1.x / HDInsight. Sascha Dittmann. Blog: http://www.sascha-dittmann.de Twitter: @SaschaDittmann
  • WHAT IS BIG DATA? Where is the problem? (28.06.2014, SQLSaturday Rheinland 2014)
  • Where is the problem?
  • The 3 V’s
    • Variety: relational, XML, video, text, ...
    • Velocity: batch, interval, real time, ...
    • Volume: KB, MB, GB, TB, PB, EB, ZB, YB, ...
  • Scaling: vertical scaling (scale up) vs. horizontal scaling (scale out)
  • WHAT IS HADOOP / HDINSIGHT? In search of solutions
  • Apache Hadoop / Microsoft HDInsight
  • Apache Hadoop 1.x Ecosystem (Hadoop = MapReduce + HDFS)
    • MapReduce (job scheduling/execution system)
    • HDFS (Hadoop Distributed File System)
    • HBase (column DB)
    • HBase / Cassandra (columnar NoSQL databases)
    • Pig (data flow)
    • Hive (warehouse and data access)
    • Oozie (workflow)
    • Sqoop
    • Flume
    • Avro (serialization)
    • ZooKeeper (coordination)
    • Apache Mahout
    • Cascading (programming model)
    • Traditional BI tools
  • Microsoft HDInsight (Hadoop = MapReduce + HDFS)
    • MapReduce (job scheduling/execution system)
    • HDFS (Hadoop Distributed File System)
    • HBase (column DB)
    • HBase / Cassandra (columnar NoSQL databases)
    • Pig (data flow)
    • Hive (warehouse and data access)
    • Oozie (workflow)
    • Sqoop
    • Flume
    • Avro (serialization)
    • ZooKeeper (coordination)
    • Apache Mahout
    • Cascading (programming model)
    • Traditional BI tools
    • Windows
    • System Center
    • Active Directory
    • Visual Studio
  • THE HADOOP CORE: HDFS + MapReduce
  • Hadoop Distributed File System (HDFS): boot process, fault tolerance, user request
  • Hadoop Distributed File System (HDFS)
    • Portable Operating System Interface (POSIX)
    • Replication across multiple data nodes
    js> #ls /user/Sascha/input/ncdc
    Found 9 items
    drwxr-xr-x   - Sascha supergroup     0 2013-04-24 13:09 /user/Sascha/input/ncdc/all
    drwxr-xr-x   - Sascha supergroup     0 2013-04-24 13:01 /user/Sascha/input/ncdc/all2
    drwxr-xr-x   - Sascha supergroup     0 2013-04-23 13:06 /user/Sascha/input/ncdc/metadata
    drwxr-xr-x   - Sascha supergroup     0 2013-04-23 13:06 /user/Sascha/input/ncdc/micro
    drwxr-xr-x   - Sascha supergroup     0 2013-04-23 13:06 /user/Sascha/input/ncdc/micro-tab
    -rw-r--r--   3 Sascha supergroup   529 2013-04-23 13:06 /user/Sascha/input/ncdc/sample.txt
    -rw-r--r--   3 Sascha supergroup   168 2013-04-23 13:06 /user/Sascha/input/ncdc/sample.txt.gz
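The replication idea on this slide can be illustrated with a toy Python sketch. It places every block on three distinct data nodes (three matches the replication column shown for the two files in the listing) and checks what survives when one node fails. The round-robin placement, the node names, and the block names are assumptions made for illustration only; the real HDFS placement policy is more involved and is not reproduced here.

```python
from typing import Dict, List

REPLICATION = 3  # same value as the replication column in the listing


def place_blocks(block_ids: List[str], nodes: List[str],
                 replication: int = REPLICATION) -> Dict[str, List[str]]:
    """Assign each block to `replication` distinct nodes (toy round-robin)."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for i, block in enumerate(block_ids):
        # Consecutive nodes, wrapping around; all holders are distinct.
        placement[block] = [nodes[(i + k) % len(nodes)]
                            for k in range(replication)]
    return placement


def surviving_replicas(placement: Dict[str, List[str]],
                       failed_node: str) -> Dict[str, List[str]]:
    """Return the replica holders that remain after one node fails."""
    return {block: [n for n in holders if n != failed_node]
            for block, holders in placement.items()}


# Hypothetical cluster with four data nodes and three blocks.
nodes = ["datanode-1", "datanode-2", "datanode-3", "datanode-4"]
placement = place_blocks(["blk_0", "blk_1", "blk_2"], nodes)
after_failure = surviving_replicas(placement, "datanode-2")

for block, holders in after_failure.items():
    print(block, "still available on", holders)
```

With three replicas, the loss of any single node still leaves at least two copies of every block in this sketch.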
  • Creating a cluster
  • Map/Reduce using measurement data as an example
    0067011990999991950051507004+68750+023550FM-12+038299999V0203301N00671220001CN9999999N9+00001+99999999999
    0043011990999991950051512004+68750+023550FM-12+038299999V0203201N00671220001CN9999999N9+00221+99999999999
    0043011990999991950051518004+68750+023550FM-12+038299999V0203201N00261220001CN9999999N9-00111+99999999999
    0043012650999991949032412004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+01111+99999999999
    0043012650999991949032418004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+00781+99999999999
    Highlighted fields: year, air temperature
  • Map/Reduce using measurement data as an example (same five records as on the previous slide)
    Highlighted field: measurement quality
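To make the highlighted fields concrete, here is a minimal Python sketch that slices the year, air temperature, and quality code out of the five sample records. The character offsets (15 to 19 for the year, 87 to 92 for the signed temperature, 92 for the quality code) and the unit (tenths of a degree Celsius) are assumptions based on the fixed-width NCDC record layout as it is commonly documented; the slides only mark the fields visually. `Reading` and `parse_record` are illustrative names.

```python
from typing import NamedTuple


class Reading(NamedTuple):
    year: str
    temperature: int  # assumed unit: tenths of a degree Celsius
    quality: str


def parse_record(line: str) -> Reading:
    """Slice the three fields out of one fixed-width record.

    The offsets are an assumption about the record layout, not something
    stated on the slides.
    """
    year = line[15:19]
    temperature = int(line[87:92])  # int() accepts the leading sign
    quality = line[92]
    return Reading(year, temperature, quality)


# The five sample records from the slide.
RECORDS = [
    "0067011990999991950051507004+68750+023550FM-12+038299999V0203301N00671220001CN9999999N9+00001+99999999999",
    "0043011990999991950051512004+68750+023550FM-12+038299999V0203201N00671220001CN9999999N9+00221+99999999999",
    "0043011990999991950051518004+68750+023550FM-12+038299999V0203201N00261220001CN9999999N9-00111+99999999999",
    "0043012650999991949032412004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+01111+99999999999",
    "0043012650999991949032418004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+00781+99999999999",
]

readings = [parse_record(record) for record in RECORDS]
for reading in readings:
    print(reading.year, reading.temperature, reading.quality)
```

A mapper for the maximum-temperature example would emit one (year, temperature) pair per record, typically after checking the quality code.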
  • Map/Reduce
    Pipeline: Map, Sort, and Shuffle run on each of three DataNodes, followed by a single Reduce step.
    Input records (truncated):
      0067011990999991950051507004+68750
      0043011990999991950051512004+68750
      0043011990999991950051518004+68750
      0043012650999991949032412004+62300
      0043012650999991949032418004+62300
    Map output: 1949,0  1950,22  1950,55  1952,-11  1950,33
    After sort and shuffle: 1949,0  1950,[22,33,55]  1952,-11
    Reduce output: 1949,0  1950,55  1952,-11
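The data flow on this slide can be sketched in plain Python without a cluster. The sketch starts from the map output pairs shown on the slide, groups them by year (the shuffle step), and reduces each group to its maximum. Which pair was produced on which DataNode is an assumption made for illustration, and `shuffle` and `reduce_max` are hypothetical helper names rather than Hadoop APIs.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[str, int]  # (year, temperature)

# Map output from the slide; the split across three nodes is assumed.
MAP_OUTPUT_PER_NODE: List[List[Pair]] = [
    [("1949", 0), ("1950", 22), ("1950", 55)],
    [("1952", -11)],
    [("1950", 33)],
]


def shuffle(per_node: List[List[Pair]]) -> Dict[str, List[int]]:
    """Group all values by key, as the sort/shuffle phase does."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for pairs in per_node:
        for year, temperature in pairs:
            groups[year].append(temperature)
    # Values are sorted only so the printout matches the slide.
    return {year: sorted(values) for year, values in sorted(groups.items())}


def reduce_max(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce each group of temperatures to its maximum."""
    return {year: max(values) for year, values in groups.items()}


grouped = shuffle(MAP_OUTPUT_PER_NODE)
result = reduce_max(grouped)
print(grouped)
print(result)
```

The grouped intermediate result corresponds to the `1950,[22,33,55]` stage on the slide, and the final dictionary to the reduce output.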
  • Map/Reduce with a combiner function
    Pipeline: Map, Combine, Sort, and Shuffle run on each of three DataNodes, followed by a single Reduce step.
    Input records (truncated):
      0067011990999991950051507004+68750
      0043011990999991950051512004+68750
      0043011990999991950051518004+68750
      0043012650999991949032412004+62300
      0043012650999991949032418004+62300
    Map output: 1949,0  1950,22  1950,55  1952,-11  1950,33
    Combine output: 1949,0  1950,55  1952,-11  1950,33
    After sort and shuffle: 1949,0  1950,[33,55]  1952,-11
    Reduce output: 1949,0  1950,55  1952,-11
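The effect of the combiner can be sketched in Python as well. Each node first reduces its own map output to a local maximum per year, so fewer pairs have to be shuffled to the reducer. As an assumption for illustration, the pairs `1950,22` and `1950,55` are placed on the same node, which is consistent with the `1950,[33,55]` stage on the slide; `combine`, `shuffle`, and `reduce_max` are hypothetical helper names, not Hadoop APIs.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[str, int]  # (year, temperature)

# Map output from the slide; the split across three nodes is assumed.
MAP_OUTPUT_PER_NODE: List[List[Pair]] = [
    [("1949", 0), ("1950", 22), ("1950", 55)],
    [("1952", -11)],
    [("1950", 33)],
]


def combine(pairs: List[Pair]) -> List[Pair]:
    """Keep only the local maximum per year on a single node."""
    local: Dict[str, int] = {}
    for year, temperature in pairs:
        local[year] = max(temperature, local.get(year, temperature))
    return sorted(local.items())


def shuffle(per_node: List[List[Pair]]) -> Dict[str, List[int]]:
    """Group all values by key across nodes."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for pairs in per_node:
        for year, temperature in pairs:
            groups[year].append(temperature)
    return {year: sorted(values) for year, values in sorted(groups.items())}


def reduce_max(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce each group of temperatures to its maximum."""
    return {year: max(values) for year, values in groups.items()}


combined = [combine(pairs) for pairs in MAP_OUTPUT_PER_NODE]
grouped = shuffle(combined)
result = reduce_max(grouped)

pairs_without_combiner = sum(len(p) for p in MAP_OUTPUT_PER_NODE)
pairs_with_combiner = sum(len(p) for p in combined)
print(pairs_without_combiner, "pairs shuffled without combiner")
print(pairs_with_combiner, "pairs shuffled with combiner")
print(result)
```

This shortcut is safe here because taking a maximum is associative and commutative: the maximum of local maxima equals the global maximum. A function such as the mean cannot be combined this way without carrying extra state, for example sum and count.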
  • “Hello World” à la HDInsight
  • RDBMS vs. Hadoop
                        RDBMS                   Hadoop
    Data volume         Gigabytes               Petabytes
    Processing          Ad hoc and batch        Batch
    Updates             Many reads and writes   Write once, read many
    Data schema         Static                  Dynamic
    Data integrity      High                    Low
    Scaling behavior    Non-linear              Linear
  • Thank you! For sponsorship, for volunteering, for participation, for a great SQLSaturday #313.