15 minute presentation about Thesis
Presentation Transcript

  • Too much Data! Sven Meys, Saturday 9 February 13
  • Topic • On-demand Information Extraction from Remote Sensing Images with MapReduce
  • Contents • Context • Literature study • Planning
  • Context • VITO • Remote Sensing • Problem statement • Research questions
  • 16% • 700 • €103 million • 84% • Government • Private
  • Energy • Industrial Innovation • Quality of Environment • Energy Technology • Transition Energy & Environment • Separation & Conversion Technology • Material Technology • Environmental Analysis & Technology • Remote Sensing • Environmental Modelling • Environmental Health
  • Context • VITO • Remote Sensing • Problem statement • Research questions
  • Remote Sensing
  • 1 km² per pixel • 0.5 billion pixels • 1.2 GB
  • RS applications
  • Time series: 01-01-2001 to 01-01-2012 • Algorithm: NDVI • Output: Mean • SUBMIT
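
The request form on this slide combines an algorithm (NDVI) with an aggregate (mean) over a time series. A minimal sketch of that computation follows, assuming the standard definition NDVI = (NIR - Red) / (NIR + Red). The function names and the tiny reflectance grids are hypothetical and only serve as illustration; they are not taken from the thesis.

```python
def ndvi(nir, red):
    """NDVI for one pixel; returns 0.0 when both bands are zero to avoid dividing by zero."""
    total = nir + red
    return (nir - red) / total if total != 0 else 0.0


def mean_ndvi_over_time(series):
    """Per-pixel mean NDVI over a time series.

    `series` is a list of (nir_grid, red_grid) pairs, one per date;
    each grid is a list of rows of reflectance values.
    """
    rows, cols = len(series[0][0]), len(series[0][0][0])
    sums = [[0.0] * cols for _ in range(rows)]
    for nir_grid, red_grid in series:
        for r in range(rows):
            for c in range(cols):
                sums[r][c] += ndvi(nir_grid[r][c], red_grid[r][c])
    return [[s / len(series) for s in row] for row in sums]


# Hypothetical 1x2 image observed on two dates.
series = [
    ([[0.6, 0.5]], [[0.2, 0.5]]),
    ([[0.8, 0.3]], [[0.2, 0.1]]),
]
mean_grid = mean_ndvi_over_time(series)
print(mean_grid)
```

Each pixel is computed independently of its neighbours, which is what makes this kind of request a natural fit for parallel processing.
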
  • Context • VITO • Remote Sensing • Problem statement • Research questions
  • Problem statement • Better images • Better sensors • More information • More expensive storage • More data • Data transport • Expensive supercomputers • More computation • Parallel processing
  • Objectives • Fast enough • Affordable • Scalable • File system + software framework
  • Research questions • How can large satellite images be stored in an HDFS file system so that they can be processed efficiently in parallel? • Which algorithms can be used with this storage technique and MapReduce?
  • Contents • Context • Literature study • Planning
  • Literature study • Interesting projects • HDFS • MapReduce • Implementations • Distributions • Current literature
  • Interesting projects • NASA (12) • Center for Climate Simulation • Square Kilometer Array: 700 TB/sec • Open Cloud Consortium (13) • Project Matsu: Elastic Clouds for Disaster Relief • Large Hadron Collider (14) • 20 PB/year
  • HDFS • Distributed file system • Based on the Google File System (1) • Large blocks (128 MiB) • Commodity hardware • Failure is the norm • Read & append (1)
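
To get a feel for the 128 MiB block size, the sketch below cuts a file into blocks and counts them. It assumes that the 1.2 GB image size quoted on an earlier slide means 1.2 × 10^9 bytes (the slides do not say whether GB is decimal or binary). `split_into_blocks` is a hypothetical helper, not part of any HDFS API.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MiB, the block size named on the slide


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return (offset, length) pairs covering a file of `file_size` bytes."""
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


image_size = 1_200_000_000  # assumption: "1.2 GB" read as decimal gigabytes
blocks = split_into_blocks(image_size)
print(len(blocks))    # number of blocks needed for the image
print(blocks[-1][1])  # the last block is only partly filled
```

Under these assumptions a single global image occupies only a handful of blocks, so how pixels are laid out inside each block matters for the first research question.
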
  • HDFS • A DFS usually accounts for transparent file replication and fault tolerance, and enables data locality for processing tasks. A DFS does this by subdividing files into blocks and distributing these blocks within a cluster of computers. • Figure 2: File blocks, distribution and replication in a distributed file system, showing a file (left) subdivided into three blocks
  • HDFS • Figure 4: Block assembly for data retrieval from the distributed file system
  • HDFS • The file system handles node failure by automated recovery. HDFS further uses checksums to verify block integrity. As long as there is an accessible copy of a block, it can automatically re-replicate to return to the replication rate. • Figure 3: Automatic repair in case of cluster node failure by additional replication
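
The replication and repair behaviour described by Figures 2 and 3 can be imitated with a toy model. This is only a sketch under stated assumptions: a replication factor of 3 (the HDFS default), simple round-robin placement instead of HDFS's rack-aware policy, and hypothetical node names and helper functions.

```python
REPLICATION = 3  # assumption: HDFS's default replication factor


def place_blocks(block_ids, nodes, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes, round-robin.

    Simplification: real HDFS placement is rack-aware; this only
    illustrates that copies of one block land on different nodes.
    Requires len(nodes) >= replication.
    """
    placement = {node: set() for node in nodes}
    for i, block in enumerate(block_ids):
        for k in range(replication):
            placement[nodes[(i + k) % len(nodes)]].add(block)
    return placement


def fail_and_repair(placement, dead_node, replication=REPLICATION):
    """Drop `dead_node`, then copy under-replicated blocks to other nodes."""
    survivors = {n: set(b) for n, b in placement.items() if n != dead_node}
    for block in sorted(placement[dead_node]):
        holders = [n for n, blocks in survivors.items() if block in blocks]
        if not holders:
            continue  # no accessible copy left: the block cannot be repaired
        # Candidate targets: surviving nodes without this block, least loaded first.
        targets = sorted(
            (n for n in survivors if n not in holders),
            key=lambda n: (len(survivors[n]), n),
        )
        for target in targets[: replication - len(holders)]:
            survivors[target].add(block)
    return survivors


nodes = ["node1", "node2", "node3", "node4"]
cluster = place_blocks([1, 2, 3], nodes)
repaired = fail_and_repair(cluster, "node2")
print(repaired)
```

After `node2` fails, every block it held still has two surviving copies, so the model can restore the third copy on another node.
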
  • HDFS - Overview • Scalable • Fast reads/writes • Robust • A factor of 10 cheaper (2)
  • MapReduce
  • MapReduce - WordCount
  • MapReduce - Overview • Based on Google MapReduce (3) • Data locality • Key/value pairs • Very fast • A different way of thinking
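
The WordCount example and the key/value model can be sketched in a few lines. This is a single-process illustration of the map, shuffle and reduce steps, not Hadoop code; the function names and input lines are made up for the example.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


lines = ["too much data", "much data much work"]
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

The "different way of thinking" lies in the fact that the mapper sees only one line and the reducer only one key, so both can run on many machines at once.
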
  • Implementations • Support: Hadoop +, Stratosphere -, HPCC + • Extensions: Hadoop +, Stratosphere -, HPCC ? • Community: Hadoop +++, Stratosphere +/-, HPCC - • Target: Hadoop ANY, Stratosphere EDU, HPCC BI • Apache Software Foundation • Others: outdated, commercial, little support (4-6)
  • Distributions • MapR (8) • Hortonworks (7) • Cloudera: Cloudera Manager (9) • Web interface • 1-click install (yeah right...) • Interesting licence model
  • General • Mostly text processing • For small images (10) • Little detail • Commercial (11)
  • Contents • Context • Literature study • Planning
  • Planning • literature • phase 1 • phase 2 • phase 3 • phase 4 • 01/09 • 01/02 • 15/03 • 20/05 • internship report • today • submit master's thesis
  • Phase 1 - Done • Sven: 192.168.10.248, TT, DN • Master: 192.168.10.245, JT, NN • Bruno: 192.168.10.246, TT, DN • Tim: 192.168.10.247, TT, DN • Patrick: 192.168.10.249, TT, DN • Legend: JT = Job Tracker, NN = Name Node, TT = Task Tracker, DN = Data Node; machines are RedHat 6.2 workstations or RedHat 6.2 virtual machines
  • Phase 2 • Simple algorithm • Rotate image • Standard IO • HDFS
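
As an illustration of the kind of simple algorithm this phase has in mind, here is a sketch of rotating an image held as a grid of pixel values. The slide does not state the rotation angle; 90 degrees clockwise is an assumption, and `rotate_90_clockwise` is a hypothetical name.

```python
def rotate_90_clockwise(image):
    """Rotate a row-major grid of pixel values by 90 degrees clockwise."""
    # Reversing the rows and then transposing turns the first column
    # (read bottom to top) into the first row of the result.
    return [list(row) for row in zip(*image[::-1])]


image = [
    [1, 2, 3],
    [4, 5, 6],
]
rotated = rotate_90_clockwise(image)
print(rotated)
```

A rotation moves every pixel to a new position, so it exercises reading and writing the whole image, first with standard IO and then on HDFS.
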
  • Phase 3 • More complexity: MapReduce • Spatial: convolution mask, ROI • Temporal/spectral: multiple images
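
A convolution mask is the typical spatial operation named here. The sketch below applies a 3x3 averaging mask to a small grid. The mask, the image values and the choice to skip border pixels are assumptions made for illustration only.

```python
def convolve(image, mask):
    """Apply a square, odd-sized mask to every pixel that has a full neighbourhood.

    Border pixels are left out, so the result is smaller than the input.
    The mask is not flipped (correlation form); for symmetric masks such
    as an averaging mask the result equals true convolution.
    """
    k = len(mask) // 2
    rows, cols = len(image), len(image[0])
    result = []
    for r in range(k, rows - k):
        out_row = []
        for c in range(k, cols - k):
            acc = 0.0
            for dr in range(-k, k + 1):
                for dc in range(-k, k + 1):
                    acc += image[r + dr][c + dc] * mask[dr + k][dc + k]
            out_row.append(acc)
        result.append(out_row)
    return result


mean_mask = [[1 / 9] * 3 for _ in range(3)]  # 3x3 averaging mask
image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]
smoothed = convolve(image, mean_mask)
print(smoothed)
```

Unlike a per-pixel operation, each output value here needs the neighbouring pixels as well, so pixels near a block boundary depend on data stored in another block.
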
  • Phase 4 • Performance as a function of pixel distance
  • Planning • literature • phase 1 • phase 2 • phase 3 • phase 4 • 01/09 • 01/02 • 15/03 • 20/05 • internship report • today • submit master's thesis
  • The End • Lots of data • Thinking differently • Many possibilities • RLZ or a new elective course on Big Data? ;) • MapReduce + OpenCL? • Many challenges • Many questions
  • References
    (1) Ghemawat, S., Gobioff, H. and Leung, S.-T. (2003), ‘The Google file system’
    (2) Krishnan, S., Baru, C. and Crosby, C. (2010), ‘Evaluation of MapReduce for gridding LIDAR data’
    (3) Dean, J. and Ghemawat, S. (2004), ‘MapReduce: simplified data processing on large clusters’
    (4) http://hadoop.apache.org/
    (5) Warneke, D. and Kao, O. (2009), ‘Nephele: Efficient parallel data processing in the cloud’, http://www.stratosphere.eu
    (6) http://hpccsystems.com/
    (7) http://hortonworks.com/
    (8) http://mapr.com/
    (9) http://cloudera.com/
    (10) Sweeney, C. (2011), ‘HIPI: Hadoop image processing interface for image-based MapReduce’
    (11) Guinan, O. (2011), ‘Indexing the earth - large scale satellite image processing using Hadoop’, http://www.cloudera.com/content/cloudera/en/resources/library/hadoopworld/hadoop-world-2011-presentation-video-indexing-the-earth-large-scale-satellite-image-processing-using-hadoop.htmt
    (12) Duffy, D. Q. (2013), ‘Untangling the computing landscape for NASA climate simulations’, http://www.nas.nasa.gov/SC12/demos/demo20.html
    (13) http://www.slideshare.net/rgrossman/project-matsu-elastic-clouds-for-disaster-relief
    (14) Lassnig, M., Garonne, V., Dimitrov, G. and Canali, L. (2012), ‘ATLAS data management accounting with Hadoop Pig and HBase’