A MapReduce-Based Programming Model
for Self-Maintainable Aggregate Views
2012-08-31

Johannes Schildgen
TU Kaiserslautern
schildgen@cs.uni-kl.de
Motivation
[Figure: a pile of candy bars with scattered counts]
Kinderriegel: 26
Balisto: 39
Hanuta: 14
Snickers: 19
Ritter Sport: 41
Pickup: 12
…
Kinderriegel: 26
Balisto: 39
Hanuta: 14
Snickers: 19
Ritter Sport: 41
Pickup: 12
…
Δ: Balisto: -1
Kinderriegel: 26
Balisto: 39
Hanuta: 14
Snickers: 19
Ritter Sport: 41
Pickup: 12
…
Δ: Balisto: -8, Snickers: +24, Ritter Sport: -7
Increment Installation

Kinderriegel: 26
Balisto: 31
Hanuta: 14
Snickers: 43
Ritter Sport: 34
Pickup: 12
…
Δ: Balisto: -8, Snickers: +24, Ritter Sport: -7
Overwrite Installation

Old view:            New view:
Kinderriegel: 26     Kinderriegel: 26
Balisto: 39          Balisto: 31
Hanuta: 14           Hanuta: 14
Snickers: 19         Snickers: 43
Ritter Sport: 41     Ritter Sport: 34
Pickup: 12           Pickup: 12
…
Δ: Balisto: -8, Snickers: +24, Ritter Sport: -7
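The two installation strategies above can be sketched as a small toy model in plain Java (class and method names are illustrative, not Marimba API): increment adds the Δ entries to the stored counts, while overwrite replaces whole rows with the complete new values.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the two installation strategies (illustrative names).
// Both start from the old view and the delta Δ computed incrementally.
public class Installation {

    // Increment installation: add each Δ entry to the stored count.
    public static Map<String, Integer> increment(Map<String, Integer> view,
                                                 Map<String, Integer> delta) {
        Map<String, Integer> result = new HashMap<>(view);
        delta.forEach((k, v) -> result.merge(k, v, Integer::sum));
        return result;
    }

    // Overwrite installation: the job already merged old values and Δ,
    // so the complete new rows simply replace the old ones.
    public static Map<String, Integer> overwrite(Map<String, Integer> view,
                                                 Map<String, Integer> newRows) {
        Map<String, Integer> result = new HashMap<>(view);
        result.putAll(newRows);
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> view = new HashMap<>();
        view.put("Balisto", 39);
        view.put("Snickers", 19);
        view.put("Ritter Sport", 41);

        Map<String, Integer> delta = Map.of(
            "Balisto", -8, "Snickers", 24, "Ritter Sport", -7);

        // As on the slides: Balisto 31, Snickers 43, Ritter Sport 34.
        System.out.println(increment(view, delta));
    }
}
```

Either way the view ends up in the same state; the strategies differ only in what is written to the store.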
Fundamentals & Previous Work · The Marimba Framework · Evaluation
public class WordCount extends Configured implements Tool {

        public static class WordCountMapper extends
                         Mapper<LongWritable, Text, ImmutableBytesWritable,
                                          LongWritable> {

                 private Text word = new Text();

                 @Override
                 public void map(LongWritable key, Text value, Context context)
                                   throws IOException, InterruptedException {
                          String line = value.toString();
                          StringTokenizer tokenizer = new StringTokenizer(line);
                          while (tokenizer.hasMoreTokens()) {
                                   this.word.set(tokenizer.nextToken());
                                   // (reconstructed; the slide is cut off here)
                                   context.write(new ImmutableBytesWritable(
                                           this.word.toString().getBytes()),
                                           new LongWritable(1));
                          }
                 }
        }
        // … (reducer and run() omitted on the slide)
}
> create 'person', 'default'
> put 'person', 'p27', 'default:vorname', 'Anton'
> put 'person', 'p27', 'default:nachname', 'Schmidt'

> get 'person', 'p27'
COLUMN                CELL
 default:nachname     timestamp=1338991497408, value=Schmidt
 default:vorname      timestamp=1338991436688, value=Anton
2 row(s) in 0.0640 seconds
Word counts (old):        Δ:
banane   3                kaufe     1
die      4                die       0
iss      1                banane    1
nimm     1                schmeiß  -1
schale   1                schale   -1
schäle   1                weg      -1
schmeiß  1
weg      1
Word counts (after increment()):     Δ:
banane   4                           kaufe     1
die      4                           die       0
iss      1                           banane    1
kaufe    1                           schmeiß  -1
nimm     1                           schale   -1
schale   0                           weg      -1
schäle   1
schmeiß  0
weg      0
Word counts (old), overwrite():      Δ:
banane   3                           kaufe     1
die      4                           die       0
iss      1                           banane    1
nimm     1                           schmeiß  -1
schale   1                           schale   -1
schäle   1                           weg      -1
schmeiß  1
weg      1
Word counts (after overwrite()):     Δ:
banane   4                           kaufe     1
die      4                           die       0
iss      1                           banane    1
kaufe    1                           schmeiß  -1
nimm     1                           schale   -1
schale   0                           weg      -1
schäle   1
schmeiß  0
weg      0
void map(key, value) {
 if(value is inserted) {
    for(word : value.split(" ")) {
       write(word, 1);
    }
 }
 else if(value is deleted) {
    for(word : value.split(" ")) {
       write(word, -1);
    }
 }
}
void map(key, value) {
 if(value is inserted) {
    for(word : value.split(" ")) {
       write(word, 1);
    }
 }
 else if(value is deleted) {
    for(word : value.split(" ")) {
       write(word, -1);
    }
 }
 else { // old result
    write(key, value);
 }
}
Overwrite Installation

void reduce(key, values) {
  sum = 0;
  for(value : values) {
    sum += value;
  }
  put = new Put(key);
  put.add("fam", "col", sum);
  context.write(key, put);
}
Increment Installation

void reduce(key, values) {
  sum = 0;
  for(value : values) {
    sum += value;
  }
  inc = new Increment(key);
  inc.add("fam", "col", sum);
  context.write(key, inc);
}
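The map and reduce phases sketched above can be reproduced end to end with a small self-contained simulation (plain Java, no Hadoop; method names are made up for illustration): inserted records emit (word, +1), deleted records emit (word, -1), and the reducer sums per word.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simulation of the incremental WordCount: map() emits (word, +1) for
// inserted records and (word, -1) for deleted ones; reduce() sums per word.
public class IncWordCount {

    public static Map<String, Integer> delta(List<String> inserted,
                                             List<String> deleted) {
        Map<String, Integer> sums = new HashMap<>();
        for (String line : inserted)
            for (String word : line.split(" "))
                sums.merge(word, 1, Integer::sum);    // write(word, 1)
        for (String line : deleted)
            for (String word : line.split(" "))
                sums.merge(word, -1, Integer::sum);   // write(word, -1)
        return sums;                                  // reducer: sum per key
    }

    public static void main(String[] args) {
        // Same change as on the slides: one sentence deleted, one inserted.
        Map<String, Integer> d = delta(
            List.of("kaufe die banane"),
            List.of("schmeiß die schale weg"));
        // Δ: kaufe 1, die 0, banane 1, schmeiß -1, schale -1, weg -1
        System.out.println(d);
    }
}
```

The resulting map is exactly the Δ column from the earlier word-count slides.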
Formalization

Generic Mapper
Generic Reducer
Fundamentals & Previous Work · The Marimba Framework · Evaluation
Core functionality (Hadoop): distributed computing with MapReduce
Core functionality (Marimba): incremental computing

Marimba: "I take care of IncDec, Overwrite,
reading old results, creating the increments, …"

Developer: "I tell you how the input data is read,
aggregated, inverted, and how the output is written."
public class WordTranslator extends
    Translator<LongWritable, Text> {
  public void translate(…) {
    …
  }
}


IncJob job = new IncJobOverwrite(conf);
job.setTranslatorClass(
             WordTranslator.class);
job.setAbelianClass(WordAbelian.class);
public class WordAbelian implements
    Abelian<WordAbelian> {
 WordAbelian invert() { … }
 WordAbelian aggregate(WordAbelian
                        other) { … }
 WordAbelian neutral() { … }
 boolean isNeutral() { … }
 Writable extractKey() { … }
 void write(…) { … }
 void readFields(…) { … }
}
public class WordSerializer
 implements Serializer<WordAbelian> {

 Writable serialize(Writable key,
                    WordAbelian v) {
    …
 }
 WordAbelian deserializeHBase(
    byte[] rowId, byte[] colFamily,
    byte[] qualifier, byte[] value) {
    …
 }
}
How To Write A Marimba Job

1. Abelian class
2. Translator class
3. Serializer class
4. Write the Hadoop job using the
   IncJob class
Implementation

IncJob — setInputTable(…), setOutputTable(…)
  ├─ IncJobFull (recomputation)
  ├─ IncJobIncDec — setResultInputTable(…)
  └─ IncJobOverwrite — setResultInputTable(…),
                       NeutralOutputStrategy
public interface Abelian<T extends
 Abelian<?>> extends
 WritableComparable<BinaryComparable>{

 T invert();
 T aggregate(T other);
 T neutral();
 boolean isNeutral();
 Writable extractKey();
 void write(…);
 void readFields(…);
}
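For intuition, here is a minimal counter satisfying this contract (the Hadoop Writable/WritableComparable plumbing is omitted, so this is a sketch rather than a working Marimba Abelian). It also checks the group laws the framework relies on.

```java
// Minimal value type obeying the Abelian contract above
// (Writable plumbing omitted for brevity).
public class CountAbelian {
    public final String word;
    public final long count;

    public CountAbelian(String word, long count) {
        this.word = word;
        this.count = count;
    }

    public CountAbelian invert()                  { return new CountAbelian(word, -count); }
    public CountAbelian aggregate(CountAbelian o) { return new CountAbelian(word, count + o.count); }
    public CountAbelian neutral()                 { return new CountAbelian(word, 0); }
    public boolean isNeutral()                    { return count == 0; }

    public static void main(String[] args) {
        CountAbelian x = new CountAbelian("banane", 3);
        // Group laws: x aggregated with invert(x) is neutral,
        // and aggregating with neutral() leaves x unchanged.
        System.out.println(x.aggregate(x.invert()).isNeutral());  // true
        System.out.println(x.aggregate(x.neutral()).count);       // 3
    }
}
```

Because aggregate is commutative and invertible, deletions can be processed as inverted insertions — which is exactly what makes the incremental jobs work.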
public interface Serializer<T extends
 Abelian<?>> {

 Writable serialize(T obj);
 T deserializeHBase(
    byte[] rowId, byte[] colFamily,
    byte[] qualifier, byte[] value);
}
public abstract class Translator
 <KEYIN, VALUEIN> {

 public abstract void translate
    (KEYIN key, VALUEIN value,
     Context context);
}

// inside the framework's mapper, emitted
// values are inverted on demand:
this.mapContext.write(
    abelianValue.extractKey(),
    this.invertValue ?
         abelianValue.invert() :
         abelianValue);
GenericMapper

The input format delivers a Value:

  OverwriteResult → deserialize
  InsertedValue   → pass on
  DeletedValue    → set invertValue=true; pass on
  PreservedValue  → ignore
GenericReducer

1. Aggregate
2. Serialize
3. Write

IncDec:                        Overwrite:
putToIncrement(…)              PUT → write
                               IGNORE → do not write
                               DELETE → putToDelete(...)
GenericCombiner

"Write A Combiner"
   -- 7 Tips for Improving MapReduce Performance (tip 4)

1. Aggregate
TextWindowInputFormat
Example applications:
1. WordCount

Translator:

void translate(key, value) {
  for(word : value.split(" ")) {
    write(new WordAbelian(word, 1));
  }
}

Serializer:

Writable serialize(WordAbelian w) {
  Put p = new Put(w.getWord());
  p.add(…);
  return p;
}

WordAbelian:

WordAbelian invert() {
  return new WordAbelian(
     this.word, -1 * this.count);
}

WordAbelian aggregate(WordAbelian other) {
  return new WordAbelian(
     this.word, this.count + other.count);
}

WordAbelian neutral() {
  return new WordAbelian(this.word, 0);
}

boolean isNeutral() {
  return (this.count == 0);
}
Example applications:
2. Friends Of Friends

[Figure: friendship graph of persons A–E, showing FRIENDS and FRIENDS OF FRIENDS]
Example applications:
2. Friends Of Friends

translate(person, friends):

aggregate(…): merge the sets of
friends-of-friends
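The friends-of-friends computation can be sketched in plain Java (illustrative names, no MapReduce plumbing); the per-person set union mirrors the aggregate(…) described above.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Friends-of-friends: collect the friends of a person's friends,
// excluding the person themself and their direct friends.
// The union over all friends corresponds to aggregate() merging the sets.
public class FriendsOfFriends {

    public static Set<String> fof(String person, Map<String, Set<String>> friends) {
        Set<String> direct = friends.getOrDefault(person, Set.of());
        Set<String> result = new HashSet<>();
        for (String f : direct)
            for (String ff : friends.getOrDefault(f, Set.of()))
                if (!ff.equals(person) && !direct.contains(ff))
                    result.add(ff);   // merge the sets of friends' friends
        return result;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> friends = new HashMap<>();
        friends.put("A", Set.of("B"));
        friends.put("B", Set.of("A", "C"));
        friends.put("C", Set.of("B"));
        System.out.println(fof("A", friends));  // A and C share the friend B
    }
}
```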
Example applications:
3. Reverse WebLink-Graph

aggregate(…): merge the sets of links

REVERSE WEB LINK GRAPH
(Row-ID -> Columns)

Google -> {eBay, Wikipedia}
eBay -> {Google, Wikipedia}
Mensa-KL -> {Google}
Facebook -> {Google, Mensa-KL, Uni-KL}
Wikipedia -> {Google}
Uni-KL -> {Google, Wikipedia}
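The reversal itself is simple to sketch in plain Java (illustrative names): every link P -> Q contributes P to the incoming-link set of Q, and merging those sets plays the role of aggregate(…).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Reverse web-link graph: for every link P -> Q, emit (Q, {P});
// merging the singleton sets per target page corresponds to aggregate().
public class ReverseLinks {

    public static Map<String, Set<String>> reverse(Map<String, Set<String>> links) {
        Map<String, Set<String>> reversed = new HashMap<>();
        links.forEach((page, targets) -> {
            for (String target : targets)
                reversed.computeIfAbsent(target, k -> new HashSet<>()).add(page);
        });
        return reversed;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> links = Map.of(
            "Mensa-KL", Set.of("Google"),
            "Uni-KL", Set.of("Google", "Wikipedia"));
        // Google is linked from Mensa-KL and Uni-KL; Wikipedia from Uni-KL.
        System.out.println(reverse(links));
    }
}
```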
Example applications:
4. Bigrams

Hi, kannst du mich ___?___ am Bahnhof abholen?
So in etwa 10 ___?___. Viele liebe ___?___.
P.S. Ich habe viel ___?___.
Idea:
Analyze large amounts of text
with MapReduce.
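A minimal bigram counter conveys the idea (plain Java, no MapReduce; names are illustrative): count adjacent word pairs, then suggest the most frequent successor of a word, as in the SMS example above.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Count adjacent word pairs (bigrams); the most frequent successor of a
// word is then a candidate for text prediction.
public class BigramCount {

    public static Map<String, Map<String, Integer>> count(List<String> lines) {
        Map<String, Map<String, Integer>> bigrams = new HashMap<>();
        for (String line : lines) {
            String[] words = line.split(" ");
            for (int i = 0; i + 1 < words.length; i++)
                bigrams.computeIfAbsent(words[i], k -> new HashMap<>())
                       .merge(words[i + 1], 1, Integer::sum);
        }
        return bigrams;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Integer>> b = count(List.of(
            "viele liebe grüße", "liebe grüße aus der pfalz"));
        System.out.println(b.get("liebe"));  // {grüße=2}
    }
}
```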
Example applications:
4. Bigrams

NGramAbelian (words a, b; count):
  extractKey()
  write(…)
  invert():          count *= -1
  aggregate(other):  count += other.count
  neutral():         count = 0
  isNeutral():       count == 0

NGramStep2Abelian
"Where to get the data from?"

Hi, kannst du mich ___ ___ am Bahnhof abholen?   → bitte / nicht
So in etwa 10 ___ ___.                           → <num> Minuten / <num> Jahre
Viele liebe ___ ___.                             → Grüße / dich
P.S. Ich habe viel ___ ___.                      → zu / Spaß
Fundamentals & Previous Work · The Marimba Framework · Evaluation
WordCount

[Chart: runtime (hh:mm, 00:00 to 01:10) over the fraction of changes
(0%–80%) for FULL, INCDEC, and OVERWRITE]
Reverse Weblink-Graph

[Chart: runtime (hh:mm, 00:00 to 02:51) over the fraction of changes
(0%–80%) for FULL, INCDEC, and OVERWRITE]
Conclusion

Full recomputation
IncDec / Overwrite
Image Credits

Slides 5–9: Flames and smiley: Microsoft Office 2010
Slide 10: Google: http://www.google.de
Slide 11: Amazon: http://www.amazon.de
Slide 12: Hadoop: http://hadoop.apache.org; Casio wristwatch: http://www.flickr.com/photos/andresrueda/3448240252
Slide 16: Hadoop: http://hadoop.apache.org
Slide 17: Hadoop: http://hadoop.apache.org; Notebook: Microsoft Office 2010
Slide 18: HBase: http://hbase.apache.org
Slide 31: Hadoop: http://hadoop.apache.org; Casio wristwatch: http://www.flickr.com/photos/andresrueda/3448240252
Slide 32: Scaffold: http://www.flickr.com/photos/michale/94538528/; Hadoop: http://hadoop.apache.org; Boy: Microsoft Office 2010
Slides 37–44: Puzzle: http://www.flickr.com/photos/dps/136565237/
Slides 46–48: Boy: Microsoft Office 2010
Slide 49: Google: http://www.google.de; eBay: http://www.ebay.de; Mensa-KL: http://www.mensa-kl.de; facebook: http://www.facebook.de; Wikipedia: http://de.wikipedia.org; TU Kaiserslautern: http://www.uni-kl.de
Slides 50–51: Mobile phone: Microsoft Office 2010
Slide 56: Wikipedia: http://de.wikipedia.org; Twitter: http://www.twitter.com
Slide 57: Mobile phone: Microsoft Office 2010
Slide 58: Hadoop: http://hadoop.apache.org; Casio wristwatch: http://www.flickr.com/photos/andresrueda/3448240252
References (1/2)
[0] Johannes Schildgen. Ein MapReduce-basiertes Programmiermodell für selbstwartbare Aggregatsichten.
Master's thesis, TU Kaiserslautern, August 2012.

[1] Apache Hadoop project. http://hadoop.apache.org/.
[2] Virga: Incremental Recomputations in MapReduce. http://wwwlgis.informatik.uni-kl.de/cms/?id=526.
[3] Philippe Adjiman. Hadoop Tutorial Series, Issue #4: To Use Or Not To Use A Combiner, 2010.
http://www.philippeadjiman.com/blog/2010/01/14/hadoop-tutorial-series-issue-4-to-use-or-not-to-use-a-combiner/.
[4] Kai Biermann. Big Data: Twitter wird zum Fieberthermometer der Gesellschaft, April 2012.
http://www.zeit.de/digital/internet/2012-04/twitter-krankheiten-nowcast.
[5] Julie Bort. 8 Crazy Things IBM Scientists Have Learned Studying Twitter, January 2012.
http://www.businessinsider.com/8-crazy-things-ibm-scientists-have-learned-studying-twitter-2012-1.
[6] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. OSDI, pages 137–150, 2004.
[7] Lars George. HBase: The Definitive Guide. O’Reilly Media, 1 edition, 2011.
[8] Brown University Data Management Group. A Comparison of Approaches to Large-Scale Data Analysis.
http://database.cs.brown.edu/projects/mapreduce-vs-dbms/.
[9] Ricky Ho. Map/Reduce to recommend people connection, August 2010.
http://horicky.blogspot.de/2010/08/mapreduce-to-recommend-people.html.
[10] Yong Hu. Efficiently Extracting Change Data from HBase. April 2012.
[11] Thomas Jörg, Roya Parvizi, Hu Yong, and Stefan Dessloch. Can MapReduce learn from materialized views?
In LADIS 2011, pages 1 – 5, 9 2011.
[12] Thomas Jörg, Roya Parvizi, Hu Yong, and Stefan Dessloch. Incremental recomputations in mapreduce. In CloudDB 2011, 10 2011.
[13] Steve Krenzel. MapReduce: Finding Friends, 2010. http://stevekrenzel.com/finding-friends-with-mapreduce.
[14] Todd Lipcon. 7 Tips for Improving MapReduce Performance, 2009.
http://www.cloudera.com/blog/2009/12/7-tips-for-improving-mapreduce-performance/.
References (2/2)
[15] TouchType Ltd. SwiftKey X - Android Apps auf Google Play, February 2012.
http://play.google.com/store/apps/details?id=com.touchtype.swiftkey.
[16] Karl H. Marbaise. Hadoop - Think Large!, 2011. http://www.soebes.de/files/RuhrJUGEssenHadoop-20110217.pdf.
[17] Arnab Nandi, Cong Yu, Philip Bohannon, and Raghu Ramakrishnan. Distributed Cube Materialization on Holistic Measures.
ICDE, pages 183–194, 2011.
[18] Alexander Neumann. Studie: Hadoop wird ähnlich erfolgreich wie Linux, May 2012.
http://heise.de/-1569837.
[19] Owen O’Malley, Jack Hebert, Lohit Vijayarenu, and Amar Kamat. Partitioning your job into maps and reduces, September 2009.
http://wiki.apache.org/hadoop/HowManyMapsAndReduces?action=recall&#38;rev=7.
[20] Roya Parvizi. Inkrementelle Neuberechnungen mit MapReduce. Bachelor's thesis, TU Kaiserslautern,
June 2011.
[21] Arnd Poetzsch-Heffter. Konzepte objektorientierter Programmierung. eXamen.press.
Springer Berlin Heidelberg, Berlin, Heidelberg, 2009.
[22] Dave Rosenberg. Hadoop, the elephant in the enterprise, June 2012.
http://news.cnet.com/8301-1001_3-57452061-92/hadoop-the-elephant-in-the-enterprise/.
[23] Marc Schäfer. Inkrementelle Wartung von Data Cubes. Bachelor's thesis, TU Kaiserslautern, January 2012.
[24] Sanjay Sharma. Advanced Hadoop Tuning and Optimizations, 2009.
http://www.slideshare.net/ImpetusInfo/ppt-on-advanced-hadoop-tuning-n-optimisation.
[25] Jason Venner. Pro Hadoop. Apress, Berkeley, CA, 2009.
[26] Dick Weisinger. Big Data: Think of NoSQL As Complementary to Traditional RDBMS, June 2012.
http://www.formtek.com/blog/?p=3032.
[27] Tom White. 10 MapReduce Tips, May 2009. http://www.cloudera.com/blog/2009/05/10-mapreduce-tips/.
