Big Data, Data and Information Mining for Earth Observation

Big Data, Data Mining, Information Mining for Earth Observation

Published in: Technology, Education
  • European independence in data sources for environment and security monitoring. European independence and contribution to the global observing system. Copernicus is a European system for monitoring the Earth. Copernicus consists of a complex set of systems which collect data from multiple sources: Earth observation satellites and in situ sensors such as ground stations, airborne and sea-borne sensors. It processes these data and provides users with reliable and up-to-date information through a set of services related to environmental and security issues.
  • The Sentinel satellites (S1A/B, S2A/B, S3A/B, S4A/B and S5 Precursor) are under development; S-5 and Jason-CS are under definition. Satellite launches as from the beginning of 2014. The ground segment (data reception, processing and dissemination) is getting ready for the Sentinel launches. The EU is responsible for Copernicus overall and for services. ESA is responsible for the Space Component.
  • Available today or planned at European, national and international level. Developed for other purposes but making important data available for Copernicus. List not exhaustive (+ Seosar, SPOT-6/-7, TanDEM-X, EnMAP, Venμs, Altika, Deimos2, etc.); it will evolve based on service requirements and mission availabilities.

    1. 1. Image Information Mining and Knowledge Discovery from Earth Observation Data Towards the Sentinels Era P.G. Marchetti ESA, M. Iapaolo Randstad Ground Segment and Mission Operations Department Research and Ground Segment Technology Section Earth Observation Programmes Directorate pier.giorgio.marchetti@esa.int michele.iapaolo@esa.int International Conference Frontiers in Diagnostic Technologies (ICFDT 2013) 26/11/2013
    2. 2. Outline: 1. Background on the European Space Agency; 2. Motivation; 3. Overview of ESA activities in the IIM field; 4. Systems and services for EO data exploitation; 5. The road ahead
    3. 3. The Heritage: ERS and ENVISAT • ERS and Envisat missions 1991-2012 • More than 2 Petabytes of data • Two decades of global change records • Need for data preservation, availability and exploitation ESA UNCLASSIFIED – For Official Use
    4. 4. Ten Years of Envisat Science: 5000 scientific projects using Envisat data. Envisat was the Sentinel “precursor” for many operational users. [Timeline figure; events shown: launch (Mar 02), first images, Prestige tanker oil slick, Bam earthquake, B-15A iceberg, Hurricane Katrina, ozone hole 2005, Arctic 2007, L’Aquila 2009, Iceland 2010, Japan 2011, global air pollution, chlorophyll concentration, CO2 map.] Symposia: Envisat Symposium Salzburg (A), Sep 04; Envisat Symposium Montreux (CH), Apr 07; Living Planet Symposium Bergen (N), Jun 10; Living Planet Symposium Edinburgh (UK), Sep 13; and many workshops dedicated to specific Envisat user communities
    5. 5. The Copernicus Programme: Copernicus (formerly known as GMES) is a European space flagship programme led by the European Union. It provides the necessary data for operational monitoring of the environment and for civil security. ESA coordinates the space component (spacecraft, flight operation segment, ground segment). Components shown: Space Component, In-Situ Component, Services Component
    6. 6. Copernicus Space Component: Dedicated Missions • S1A/B: Radar Mission • S2A/B: High Resolution Optical Mission • S3A/B: Medium Resolution Imaging and Altimetry Mission • S4A/B: Geostationary Atmospheric Chemistry Mission • S5P: Low Earth Orbit Atmospheric Chemistry Precursor Mission • S5A/B/C: Low Earth Orbit Atmospheric Chemistry Mission • Jason-CS A/B: Altimetry Mission
    7. 7. Copernicus Contributing Missions: COSMO-Skymed, SPOT (VGT), TerraSAR-X, Tandem-X, PROBA-V, Radarsat, DMC, Pléiades, Cryosat, Deimos-2, RapidEye, Jason, atmospheric missions, SPOT (HRS), MetOp, Meteosat 2nd Generation
    8. 8. Motivation: 1. Foster the use of IIM and derived technologies in support of EO data exploitation; 2. Develop state-of-the-art data processing for improving access to and dissemination of future EO data (e.g. the Sentinel missions); 3. Implement systems and services for supporting the “scientific exploitation” of EO data; 4. Investigate new approaches and methodologies to exploit data from all available missions and archives (joint effort with the Long Term Data Preservation programme)
    9. 9. Image Information Mining Coordination Group (IIMCG). Members: • Space Agencies (ESA, DLR, CNES, ASI) • European Institutions (EUSC, JRC) • National Research Institutes (Uni-Trento, ETHZ, INGV, Mississippi State University). Main objectives: • Inform Agencies and partners, promote research and technological activities on IIM (automatic information extraction from EO data for image understanding and retrieval) • Promote the use of IIM techniques for management and exploitation of very large EO data archives/missions (PB of data) • Foster the role of IIM in the context of future missions and existing archives • Involve industry and agency partners to increase the relevance of IIM activities in Europe
    10. 10. 10 years of IIM activities at ESA: technology activities over the last decade. [Timeline figure, 2000 to 2012: Knowledge Driven Information Mining prototype; Knowledge-centred Earth Observation prototype; Multi-sensor Evolution Analysis prototype.] Main achievements: • KIM System: IIM reference prototype at ESRIN • Platforms for EO data exploitation (KEO, GPOD, SSE, etc.) • Tools for multi-temporal and evolution analysis (MEA). Issues: • Limited number of scientific and industrial partners involved • National efforts not coordinated and harmonised • Funds limited with respect to the size of the research goals
    11. 11. Image Information Mining: From Data to Information. [Diagram labels: Data (PB); Acquisition; EO Data; Catalogue & Ordering; Image Information Mining (IIM); Information; Algorithms & Applications; Knowledge; Models & Ground Truth; Information (KB).] • Provides processing tools to extract features from images and associate meaning with the extracted features (bridging the gap between data and information) • Empowers users (researchers, service providers, decision makers) to identify and reuse relevant information for their applications • Encourages the use of common cooperative environments to achieve a common knowledge
    12. 12. KIM (Knowledge-based Information Mining). The KIM prototype developed at ESRIN permits: • Intelligent and effective access to information in large EO datasets • Improved exploration and use of EO images for scientific research • Extraction of relevant information for different applications (change detection, global monitoring, disaster management, …) • Implementation, integration and validation of services derived from IIM methods. Three main components: 1. Ingestion Software (Primitive Feature Extraction / Clustering); 2. Database (storing extracted information); 3. Interactive Client Application: i. Training and definition of “semantic rules”; ii. Application of training (rules) to the entire collection; iii. Definition of “semantic labels” for extracted information; iv. Storage for subsequent re-use
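
The ingestion and client steps listed for KIM can be illustrated with a toy sketch. It is not the KIM implementation: the slides do not name the clustering algorithm, so plain k-means is assumed here, and the feature vectors, the labels "water" and "cloud", and the function names are all hypothetical.

```python
from collections import Counter
from math import dist


def kmeans(points, k, iterations=20):
    """Plain k-means (an assumed stand-in for the clustering step).

    Centroids are seeded with the first k points so the run is deterministic.
    """
    centroids = [tuple(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign every feature vector to its nearest centroid.
        assignment = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centroids, assignment


def label_clusters(assignment, examples):
    """Turn user-marked examples (item index -> label) into a cluster -> label rule."""
    votes = {}
    for index, label in examples.items():
        votes.setdefault(assignment[index], Counter())[label] += 1
    return {cluster: counts.most_common(1)[0][0] for cluster, counts in votes.items()}


# Hypothetical primitive features, one 2-D vector per image tile.
features = [(0.10, 0.20), (0.90, 0.80), (0.15, 0.25), (0.85, 0.90), (0.12, 0.18), (0.95, 0.85)]
centroids, assignment = kmeans(features, k=2)

# "Training": the user labels two tiles; the rule is then applied to the whole collection.
rules = label_clusters(assignment, {0: "water", 1: "cloud"})
labels = [rules.get(a, "unlabelled") for a in assignment]
print(labels)
```

Storing `rules` alongside the cluster assignments corresponds to the re-use step: a later search for the same semantic label needs no new training.
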
    13. 13. KIM Architectural Elements. Input: • EO images. Output: • Identifiers of searched images • Feature Maps / Thematic maps. [Architecture diagram labels: EO Images; Ingestion; Feature Extraction; Clustering; Database; AUX; KIM; Information Mining; Client; Content Based Image Retrieval; Data Repositories; Unsupervised Classification; Time series analysis]
    14. 14. KIM Search and label: KIM lets the user inspect a collection of images, interactively define “semantic features” using the “primitive features” extracted by the system, and search for the defined feature within the entire collection
    15. 15. KIM Information Extraction: extraction of Feature Maps or Thematic Maps. Examples shown: cloud masks, flooded areas, forest monitoring
    16. 16. KIM Primitive features • Spectral: spectral signature • Texture: structural information extracted with the Gibbs Markov Random Fields (GMRF) model (S0: full resolution images; S1: sub-sampled images) • DCT (Discrete Cosine Transform): transforms signals and images from the spatial domain to the frequency domain • EMBD (Enhanced-Model-Based-Despeckling): performs a high quality despeckling of SAR images • Area: area of the objects detected with the segmentation process • Compactness: compactness of the objects detected with the segmentation process • Spectral Mean: mean value of the radiometric information of the image inside the closed area detected by the segmenter • Spectral Variance: variance of the radiometric information of the image inside the closed area detected by the segmenter • Hu Moments (Hu moment invariants): shape information conveyed by the contour points; Hu moments are invariant to scale, rotation and translation (the first 4 of the 7 invariant moments are used as shape descriptors)
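
As a rough illustration of the object-level features in this list (area, compactness, spectral mean, spectral variance), the sketch below computes them for one segmented region. The slides do not give formulas, so two assumptions are made and should be read as such: the perimeter is counted as the number of exposed pixel edges, and compactness is taken as the isoperimetric quotient 4πA/P². The function name and the toy mask are hypothetical.

```python
from math import pi


def region_features(mask, band):
    """Area, compactness, mean and variance for the pixels where mask is 1.

    mask and band are equally sized lists of rows. Compactness is assumed to be
    4*pi*area / perimeter**2, with the perimeter counted in exposed pixel edges.
    """
    rows, cols = len(mask), len(mask[0])
    pixels = [(r, c) for r in range(rows) for c in range(cols) if mask[r][c]]
    if not pixels:
        raise ValueError("mask contains no object pixels")
    area = len(pixels)
    perimeter = 0
    for r, c in pixels:
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            outside = not (0 <= nr < rows and 0 <= nc < cols)
            # An edge is exposed if the neighbour is off-image or background.
            if outside or not mask[nr][nc]:
                perimeter += 1
    values = [band[r][c] for r, c in pixels]
    mean = sum(values) / area
    variance = sum((v - mean) ** 2 for v in values) / area  # population variance
    return {
        "area": area,
        "compactness": 4 * pi * area / perimeter ** 2,
        "spectral_mean": mean,
        "spectral_variance": variance,
    }


# Toy example: a 2x2 object inside a 4x4 tile.
mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
band = [
    [1, 1, 1, 1],
    [1, 10, 12, 1],
    [1, 14, 16, 1],
    [1, 1, 1, 1],
]
features = region_features(mask, band)
print(features)
```

With this edge-counting perimeter a square scores π/4 rather than 1, so the values are only comparable between regions measured the same way.
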
    17. 17. KIM Validation: KIM has been tested and validated with different datasets: 1. MERIS RR / MERIS FR; 2. ERS / ASAR; 3. SPOT; 4. Landsat; 5. Maps (Level 2 / Level 3 products). Outcome: a large number of collections created, but a low number of significant semantic features identified
    18. 18. KIM for Information Extraction: 1. Flood Detection (SAR data); 2. Cloud Detection (MERIS RR); 3. Long-term Forest Monitoring (Landsat); 4. Rapid Mapping / Damage Assessment (VHR optical data). The potential of the tool has been highlighted and confirmed in different contexts, but end-user expectations were not always met
    19. 19. KEO (Knowledge centred Earth Observation). KEO is a distributed Component-based Processing Environment (CPE) that allows users to: a. Create and semantically identify internal/external Processing Components; b. Graphically chain Processing Components into processing chains; c. Create Processing Components from IIM components (KIM training); d. Export and store outputs into Web Servers (WFS, WMS, WCS). KEO also provides some relevant Reference Data Sets: a. Heterogeneous data and information, growing with external contributions (images, documents, DEMs, photos, processors, etc.); b. In support of various applications (Classification, Time Series Analysis, Ortho-rectification, Urban Monitoring, Interferometry, etc.)
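
The idea of chaining Processing Components can be sketched in a few lines. This is only an analogy for what the KEO CPE does graphically: the `Component` class, the `chain` helper, the two toy components and the 0.5 threshold are all hypothetical and are not part of KEO.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Component:
    """A named processing step that maps a list of pixel values to a new list."""
    name: str
    run: Callable[[List[float]], List[float]]


def chain(components):
    """Compose components so the output of one feeds the input of the next."""
    def pipeline(data):
        for component in components:
            data = component.run(data)
        return data
    return pipeline


# Two illustrative components: scale raw counts, then threshold into a mask.
calibrate = Component("calibrate", lambda px: [v * 0.01 for v in px])
threshold = Component("threshold", lambda px: [1 if v > 0.5 else 0 for v in px])

flood_mask = chain([calibrate, threshold])
result = flood_mask([20, 80, 55, 10])
print(result)
```

Because a chain is itself a callable, it could be wrapped in a `Component` and reused inside a larger chain.
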
    20. 20. KEO CPE Graphical Processor Designer
    21. 21. MEA (Multi-temporal Evolution Analysis). Multi-temporal analysis of HR / VHR products: 1. Select multi-temporal applications that might benefit from such an extension; 2. Design, implement and integrate the automatic multi-temporal algorithms to support the selected applications; 3. Create the needed HR/VHR Reference Data Sets and Evolution Models; 4. Develop standard interfaces between the different systems for common exploitation of ingested data and processing capabilities; 5. Integrate algorithms and Evolution Models provided by other independent projects; 6. Validate (with the support of a Validation Group) the automatic multi-temporal algorithms and Evolution Models
    22. 22. MEA (Multi-temporal Evolution Analysis). The MEA-ASIM system aims at providing: 1. Advanced tools for Land Use / Land Cover change analysis; 2. Level-2 EO products for real time exploitation; 3. Interfaces to external systems (G-POD, KEO, data providers, etc.); 4. Access to data (RSS Data Farm) via the standard OGC WCS interface; 5. Native support for Sentinel-2 datasets
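
For readers unfamiliar with WCS, the sketch below builds a GetCoverage request using the key-value parameters of OGC WCS 2.0 (`service`, `version`, `request`, `coverageId`, `subset`, `format`). The endpoint URL and coverage identifier are placeholders, not MEA addresses, and the axis labels `Lat` and `Long` are an assumption, since each server defines its own axis names.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def wcs_get_coverage_url(endpoint, coverage_id, subsets, fmt="image/tiff"):
    """Build a WCS 2.0.1 GetCoverage URL with one `subset` parameter per axis."""
    params = [
        ("service", "WCS"),
        ("version", "2.0.1"),
        ("request", "GetCoverage"),
        ("coverageId", coverage_id),
        ("format", fmt),
    ]
    for axis, (low, high) in subsets.items():
        # Trim the coverage to [low, high] along this axis.
        params.append(("subset", f"{axis}({low},{high})"))
    return f"{endpoint}?{urlencode(params)}"


# Placeholder endpoint and coverage name, for illustration only.
url = wcs_get_coverage_url(
    "https://example.org/wcs",
    "EXAMPLE_COVERAGE",
    {"Lat": (41.0, 42.0), "Long": (12.0, 13.0)},
)
query = parse_qs(urlsplit(url).query)
print(url)
print(query["subset"])
```

A list of tuples is passed to `urlencode` rather than a dict because `subset` has to appear once per axis.
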
    23. 23. MEA Pixel and Coverage analysis: Time-Series Analysis, single and multi plot functionality
    24. 24. MEA Pixel and Coverage analysis: Cross-comparison of EO products
    25. 25. Exploitation Platforms for EO. Development and implementation of collaborative Exploitation Platforms (G-POD, SSEP, E-CEO, etc.): 1. Fostering the scientific exploitation of EO data; 2. Automating the creation of data mining and information extraction experiments and algorithms; 3. Supporting the creation of EO-based applications and services; 4. Supporting the entire scientific research process: a. Addressing specific scientific challenges and tackling new research problems in a “parallel and collaborative way”; b. Generating reproducible results that can be easily shared and validated
    26. 26. Research and Service Support: Research Process
    27. 27. RSS G-POD Process Steps (actors: Principal Investigators and RSS) • EO algorithms delivery • Data type and range indication • Data are made available in the RSS catalogue • Algorithm porting and integration • Output validation • Test and validation (involving the PI) • On-demand EO data processing • Use of produced data (scientific projects delivery) • Delivery • Publications
    28. 28. RSS Flexible Resources. On-demand processing service. [Diagram labels: Process; EO data delivery; G-POD Platform; EO Scientists; Principal Investigators.] Volume accessed by PI projects in 2012: • Total number of submitted jobs: 38,774 • Average number of products per job: 35 • Average product size: 700 MB • Total size of data processed: 906 TB. Infrastructure: • ESRIN: 172 cores, 400 TB • UK-PAC: 96 cores, 300 TB • Flexible/unlimited infrastructure: 10-200 cores, 1-10 TB. The flexible infrastructure satisfies: HW requirements; connectivity requirements; SLA (HA, help desk, ticketing systems, etc.)
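
The 2012 volume figures can be cross-checked with simple arithmetic. The slide does not say whether "TB" is decimal or binary; the sketch below computes both, and the binary reading (1 TB = 1024² MB) is an assumption that happens to reproduce the quoted 906 TB. Since the per-job and per-product values are averages, only approximate agreement should be expected.

```python
# Figures quoted on the slide for PI projects in 2012.
jobs = 38_774
products_per_job = 35      # average
product_size_mb = 700      # average, in MB

total_products = jobs * products_per_job
total_mb = total_products * product_size_mb

# Two possible readings of "TB"; which one the slide uses is not stated.
total_tb_decimal = total_mb / 1000 ** 2
total_tb_binary = total_mb / 1024 ** 2

print(f"products processed: {total_products:,}")
print(f"decimal TB: {total_tb_decimal:.1f}")
print(f"binary TB:  {total_tb_binary:.1f}")
```
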
    29. 29. RSS Facts & Figures. On-demand processing: actual figures in the last 3 years • Supported more than 40 active users per year • Supported >20 processing/re-processing campaigns (including entire missions, e.g. MERIS, ASAR, SMOS and TPM) • Integrated ~10 new algorithms per year • Upgraded ~15 algorithms per year • Set up flexible (additional) processing capacity in less than 2 working days • Managed a >450 TB data farm (ESA, TPM and scientific products): ESA: ENVISAT (~320 TB), ERS (~50 TB), SMOS (~10 TB); TPM: MSG (~19 TB), METOP (~11 TB), ALOS (~2 TB); scientific products: AARDVARC Swansea University and MGVI JRC (produced by GPOD and distributed via SSE), MKL3 ACRI
    30. 30. SMOS Testbed Service. Purpose: the SMOS Testbed aims to provide a flexible test environment to support the ESA calibration team for L1 calibration, and the Expert Support Laboratories (ESLs) for L2 Soil Moisture and Ocean Salinity pre-validation. G-POD support elements: • Fast integration of new versions • SMOS L0 NRT ingestion chain set-up for L1 NRT custom re-processing • Access to online data for bulk re-processing • Access to flexible cloud resources for meeting deadlines • On-demand SMOS L1 and L2 processors available for SMOS Teams. [Workflow diagram labels: SMOS new processor delivery; processor integration in G-POD; G-POD processing campaign; results analysis and validation; auxiliary and calibration datasets]
    31. 31. SMOS Testbed • SMOS L1 TESTBED: Processors: Calibration, Telemetry, Level 1A, 1B, 1C; Supported versions: 3.46, 5.00, 5.01, 5.02, 5.03, 5.04, 5.05, 6.00, 6.01; Reference data series: L0, L1A, L1B, L1C (reprocessed); Auxiliary data baseline: as per Operational environment • SMOS L2 SOIL MOISTURE TESTBED: Processor: SM L2, SM L2 postprocessing; Supported versions: 4.00, 4.01; Reference data series: L1C (reprocessed); Auxiliary data baseline: as per CESBIO reprocessing • SMOS L2 OCEAN SALINITY TESTBED: Processor: OS L2; Supported versions: 5.00, 5.50; Reference data series: L1C (reprocessed); Auxiliary data baseline: as per Operational environment
    32. 32. The road ahead (1). Funds for a full scale research programme are needed to foster: • The widening of competence and expertise in several research centres/industrial actors in Europe • The widening of efforts to cover time series analysis and data analytics in general • Research and development of multi-dimensional and scalable DB solutions (including NoSQL databases, Hadoop, etc.) • Large collaborative and persistent effort on crowdsourcing, benchmarking, image and feature annotation and evaluation • Establishing a theoretical framework to bridge the semantic gap and be able to assign “discriminating power” to extracted features and “categorization” of extracted classes/objects • High quality software and algorithm developments able to reach at least the “software prototype” readiness level
    33. 33. The road ahead (2). To achieve these goals it is necessary to: • Establish a common “Big Data Mining” framework with interdisciplinary partners • Establish an R&D network to sustain this field • Establish a network of users, and give them access to IIM resources (system, data, …) • Enlarge the scope of “Image” Mining to the physical parameters measured by EO instruments • Address the “instrument” gap, instrument-application • Develop methods to use heterogeneous data: in situ, metadata, linked data, models, etc.
    34. 34. The road ahead (3). Activities to be started: • Promote IIM technology acceptance for EO users • Extend and adapt methods from multimedia and social networks • Apply human computing, gather knowledge from the use of the system, adaptation, personalization, etc. • Focus on Web/Internet based systems • Develop simple and specific HMI and GUI • Focus on Visual Data Mining, Visual Analytics, and related methods • In the PDGS, identify “long term data preservation” and “interactive data exploitation” components • Design data representations: actionable information
    35. 35. The road ahead (4). Merge the best of: • data mining approach • time series capability • ability to support and host the user algorithm
    36. 36. MANY THANKS!
