PASS Camp 2012 - Big Data mit Microsoft (Teil 1)

Published on http://www.passcamp.de/


  1. PASS Camp 2012: Big Data mit Microsoft (Teil 1). Software Developer / Solution Architect. Twitter: @SaschaDittmann. Blog: http://www.sascha-dittmann.de
  2. What could this be? 180,000,000,000,000,000,000 and 1,800,000,000,000,000,000,000
  3. Worldwide data volume: 180,000,000,000,000,000,000 bytes = 0.18 ZB (zettabytes) as of 2006; 1,800,000,000,000,000,000,000 bytes = 1.8 ZB as of 2011.
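The unit conversion behind these figures (1 zettabyte = 10^21 bytes) can be checked directly; a minimal sketch, with illustrative variable names:

```javascript
// 1 zettabyte (ZB) = 10^21 bytes.
var ZB = 1e21;

var volume2006 = 180e18;  // 180 quintillion bytes (2006)
var volume2011 = 1.8e21;  // the 2011 figure from the slide

var zb2006 = volume2006 / ZB;        // 0.18 ZB
var zb2011 = volume2011 / ZB;        // 1.8 ZB
var growth = volume2011 / volume2006; // tenfold growth in five years
```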
  4. Scaling: vertical scaling (scale up) vs. horizontal scaling (scale out)
  5. Apache Hadoop ecosystem (Hadoop = MapReduce + HDFS). Core layers: HDFS (Hadoop Distributed File System) and MapReduce (job scheduling/execution system). On top: HBase (column DB), Zookeeper (coordination), Avro (serialization), Hive (warehouse and data access), Pig (programming model), Cascading (data flow), Apache Mahout, Flume, Sqoop, Oozie (workflow), HBase / Cassandra (columnar NoSQL databases), and traditional BI tools.
  6. Apache Hadoop ecosystem with Microsoft integration points: the same stack, extended by Visual Studio, Active Directory, System Center, and Windows.
  7. Hadoop Distributed File System: boot process, fault tolerance, handling user requests
  10. Hadoop Distributed File System: Portable Operating System Interface (POSIX); replication across multiple data nodes.
      js> #ls input/ncdc
      Found 9 items
      drwxr-xr-x - Sascha supergroup 0 2012-04-24 13:01 /user/Sascha/input/ncdc/_distcp_logs_g0dedn
      drwxr-xr-x - Sascha supergroup 0 2012-04-24 12:04 /user/Sascha/input/ncdc/_distcp_logs_ofj0u6
      drwxr-xr-x - Sascha supergroup 0 2012-04-24 13:09 /user/Sascha/input/ncdc/all
      drwxr-xr-x - Sascha supergroup 0 2012-04-24 13:01 /user/Sascha/input/ncdc/all2
      drwxr-xr-x - Sascha supergroup 0 2012-04-23 13:06 /user/Sascha/input/ncdc/metadata
      drwxr-xr-x - Sascha supergroup 0 2012-04-23 13:06 /user/Sascha/input/ncdc/micro
      drwxr-xr-x - Sascha supergroup 0 2012-04-23 13:06 /user/Sascha/input/ncdc/micro-tab
      -rw-r--r-- 3 Sascha supergroup 529 2012-04-23 13:06 /user/Sascha/input/ncdc/sample.txt
      -rw-r--r-- 3 Sascha supergroup 168 2012-04-23 13:06 /user/Sascha/input/ncdc/sample.txt.gz
  11. Map / Reduce. Raw NCDC weather records are spread across DataNodes:
      0067011990999991950051507004+68750
      0043011990999991950051512004+68750
      0043011990999991950051518004+68750
      0043012650999991949032412004+62300
      0043012650999991949032418004+62300
      Map emits (year, temperature) pairs: 1950,22; 1950,55; 1950,33; 1949,0; 1952,-11. After sort and shuffle the reducer sees 1949,[0]; 1950,[22,33,55]; 1952,[-11] and emits the maximum per year: 1949,0; 1950,55; 1952,-11.
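The dataflow on this slide can be sketched as a small in-memory simulation. The (year, temperature) pairs are taken from the slide; the helper names (`shuffle`, `reduce`) are illustrative and not the Hadoop API:

```javascript
// Map output, as on the slide: (year, temperature) pairs from all mappers.
var mapped = [
  ["1950", 22], ["1950", 55], ["1950", 33],
  ["1949", 0], ["1952", -11]
];

// Shuffle: group all values by key, as Hadoop does between map and reduce.
function shuffle(pairs) {
  var groups = {};
  pairs.forEach(function (pair) {
    var key = pair[0], value = pair[1];
    (groups[key] = groups[key] || []).push(value);
  });
  return groups;
}

// Reduce: take the maximum temperature per year.
function reduce(groups) {
  var result = {};
  Object.keys(groups).forEach(function (year) {
    result[year] = Math.max.apply(null, groups[year]);
  });
  return result;
}

var result = reduce(shuffle(mapped));
// result: { "1949": 0, "1950": 55, "1952": -11 }
```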
  12. Combine method. The same job with a Combine step on each DataNode between Map and Shuffle: the combiner takes the local maximum per year before data leaves the node, so the reducer receives 1950,[33,55] instead of 1950,[22,33,55]. The final result is unchanged: 1949,0; 1950,55; 1952,-11.
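Because taking a maximum is associative, the combiner can apply the same logic as the reducer, just locally per node. A sketch of this idea (the per-node grouping of pairs is an assumption for illustration, not taken from the slide):

```javascript
// Map output per DataNode (illustrative split of the slide's pairs).
var nodeOutputs = [
  [["1949", 0]],
  [["1950", 55], ["1950", 33]],
  [["1950", 22], ["1952", -11]]
];

// Combine: keep only the local maximum per key on each node,
// so less data has to cross the network during the shuffle.
function combine(pairs) {
  var best = {};
  pairs.forEach(function (pair) {
    var k = pair[0], v = pair[1];
    best[k] = (k in best) ? Math.max(best[k], v) : v;
  });
  return Object.keys(best).map(function (k) { return [k, best[k]]; });
}

var combined = [];
nodeOutputs.forEach(function (pairs) {
  combined = combined.concat(combine(pairs));
});
// combined: 4 pairs instead of 5 (node 2 sends 1950,55 only once).

// Final reduce: global maximum per year, same result as without a combiner.
var result = {};
combined.forEach(function (pair) {
  var k = pair[0], v = pair[1];
  result[k] = (k in result) ? Math.max(result[k], v) : v;
});
```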
  13. RDBMS vs. Hadoop
      Criterion            RDBMS                     Hadoop
      Data volume          Gigabytes                 Petabytes
      Access               Interactive and batch     Batch
      Reads / writes       Many reads and writes     Write once, read many times
      Data structure       Static schema             Dynamic schema
      Data integrity       High                      Low
      Scaling behavior     Non-linear                Linear
  14. Demos: Hadoop environment, HDFS, Map/Reduce via JavaScript, data streaming with C#, PowerPivot
  15. Pig Latin
      pig
        .from("/user/Sascha/input/texte")
        .mapReduce("/user/…/WordCount.js", "Woerter, Anzahl:long")
        .orderBy("Anzahl DESC")
        .take(15)
        .to("/user/Sascha/output/Top15Woerter")
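The mapReduce call above references a WordCount.js script. A sketch of what such a script might look like, assuming the map(key, value, context) / reduce(key, values, context) convention of the interactive JavaScript console; the regex and the mock harness at the end are illustrative, not part of the console API:

```javascript
// Sketch of a WordCount.js map/reduce pair (signatures are assumed).
var map = function (key, value, context) {
  // Emit (word, 1) for every word in the input line.
  var words = value.split(/[^A-Za-zÄÖÜäöüß]+/);
  for (var i = 0; i < words.length; i++) {
    if (words[i] !== "") {
      context.write(words[i].toLowerCase(), 1);
    }
  }
};

var reduce = function (key, values, context) {
  // Sum all counts emitted for one word.
  var sum = 0;
  while (values.hasNext()) {
    sum += parseInt(values.next(), 10);
  }
  context.write(key, sum);
};

// Local smoke test with a mocked context (not part of the job itself):
var counts = {};
map(null, "Big Data mit Big Data", {
  write: function (k, v) { counts[k] = (counts[k] || 0) + v; }
});
// counts: { big: 2, data: 2, mit: 1 }
```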
