SuperAI on Supercomputers
Haohuan Fu
haohuan@tsinghua.edu.cn
High Performance Geo-Computing (HPGC) Group
http://www.thuhpgc.org
n Create a digital earth,
so as to:
p simulate
p analyze
p understand
p predict and mitigate
Earth Science and Supercomputers
figure credit: Earth Simulator, JPN
Simulation Data Analysis
Two Major Functions
3
To design highly efficient and
highly scalable simulation
applications
To develop intelligent data
mining methods for the
analysis of BIG scientific DATA
Two Major Functions
4
To design highly efficient and
highly scalable simulation
applications
To develop intelligent data
mining methods for the
analysis of BIG scientific DATA
Two Major Functions
5
2015-now: The CESM Project on Sunway TaihuLight
6
CAM5.0 POP2.0
CLM4.0 CICE4.0
CPL7
CESM1.2.0
Tsinghua + BNU 30+ Professors and Students
• Four component models, millions lines of code
• Large-scale run on Sunway TaihuLight
• 24,000 MPI processes
• Over one million cores
• 10-20x speedup for kernels
• 2-3x speedup for the entire model
Haohuan Fu, Junfeng Liao, Wei Xue, Lanning Wang and et al., “Refactoring and Optimizing the Community
Atmosphere Model (CAM) on the Sunway TaihuLight Supercomputer”, The International Conference for High
Performance Computing, Networking, Storage and Analysis (SC), Salt Lake City, Utah, US, 2016
2017: Non-Linear Earthquake Simulation
Source Partitioner
Restart
Controller
3D Model Interpolator
LZ4 Compression, Group I/O, Balanced I/O Forwarding
Snapshot/Sesimo
Recorder
Velocity
Update
Stress Update
Source
Injection
Stress
Adjustment For
Plasticity
Dynamic Rupture Source Generator
(Based on CG-FDM)
Seismic Wave Propagation
(Based on AWP-ODC)
Next Timestep
3D Vel/Den Model
Fault Stress
Init
Friction
Law Ctrl
Wave Eqn
Solver
2018: Simulating the Wenchuan Earthquake with
Accurate Surface Topography
Sunway (2017)
Tangshan
(a)
5km
Sunway (2018)
Wenchuan
(b)
Surface
Topography
x
yz
! = !($, &, ')
) = )($, &, ')
* = *($, &, ')
$
&'
(1) (2)
−T-
−T.
T.
T-
/& = 0
To design highly efficient and
highly scalable simulation
applications
To develop intelligent data
mining methods for the
analysis of BIG scientific DATA
Two Major Functions
9
Data Challenge for Climate Scientists
Climate Data Challenges in the 21st Century,
Overpeck et al., Science , 331 (6018): 700-702
• Over 100 PB just for
climate change studies
• A Challenge but also
a huge opportunity
11
Remote sensing data
- Recording what is happening on the earth for decades
- What issues can we solve based on these data?
UAV
Google earth images
1984-2018
UAV image
High-resolution imageSatellite
Look Ahead: Data-Driven Modeling and Prediction
Remote
Sensing
Data Sets
Land
Cover
Mapping
ecology
protection
zone
migration
of birds
city
green land
waterbody
sea-level
13
Cat
Image Classification
Object Detection
Semantic Segmentation
Cat
Grasses
Sky Trees
Deep learning in computer vision
14
More reliable
urban land use mapping
More intelligent
oil palm monitoring
More accurate
land cover mapping
Major issues:
- How to select and
build the RS datasets?
- How to select the
most suitable method?
- How to improve the
method for our tasks?
Challenges:
- Few public labeled
datasets in RS domain
- Difference between RS
images and CV images
- Real-world applications
(small size, imbalance)
Deep learning + remote sensing data
Case 1: More Accurate Land Cover Map
First 30 m resolution global land cover maps
Gong et al., 2013, IJRSRunning SVM on 8900 Landsat scenes on a supercomputer, achieved result in one day.
FROM-GLC	(Accuracy:	63.72%)
Gong et al., 2013. IJRS
18
A Starting Point: Direct Application of Stacked AutoEncoder
RF SVM ANN SAE
Overall Accuracy 76.03% 77.74% 77.86% 78.99%
Mapping Time 33.605 ± 0.183 16344.188 4.014 ± 0.003 13.250 ± 0.042
Landsat Image RF Image SAE Image
19
Integrating Google Earth Image
Surface reflectance, NDVI, DOY
Longitude and Latitude
Elevation and Slope
Surface reflectance of max NDVI, DOY
SVM
Predicted
Result 2
Predicted
Result 1
SVM
Predicted
Result 3
Google Earth
RF
Predicted
Result 4
Spatial features from 0.5-m images
Spectral features from 30-m images
Land cover map
Landsat + DEM
24
CNN
20
Integrating Google Earth Image
21
Integrating Google Earth Image
22
Integrating Google Earth Image
23
Integrating Google Earth Image
Great increase using
Multi-resolution images
Slight increase using
30-meter resolution images
24
Integrating Google Earth Image
Fewer confusions among various land cover types in the results of our proposed method.
Google Earth image Results of RF Results of SVM Results of Ours
Cropland
Forest
Grassland
Shrubland
Water
Impervious
Bare land
Cloud
Legend
25
30m to 10m
Gong P., et al., 2019. Stable classification with limited sample: transferring a 30-m resolution sample set
collected in 2015 to mapping 10-m resolution global land cover in 2017,Science Bulletin.
26
10m to 3m
City of Changsha
削弱部分时相或局部区域厚云遮盖影像
Case 2: Detection of Oil Palm Trees
29
Case 2: Intelligent monitoring of oil palm trees
Detection and classification of
oil palm trees using UAV images
Detection of oil palm trees using
high-resolution satellite images
Mapping of oil palm trees using
high-resolution satellite images
30
CNN based large-scale oil palm tree detection
31
Semantic segmentation based oil palm plantation extraction
• We proposed the first semantic segmentation based approach for large-scale oil palm plantation extraction from
QuickBird Satellite images in 0.6-m spatial resolution.
• The enhanced pixel-wise dataset contains 36,000 images in 256×256 pixels located in southern Malaysia, including
four categories of samples: oil palm, other vegetation, buildings and the others.
• We present an end-to-end deep convolutional neural network based on the SegNet model, which has a symmetrical
encoder-decoder architecture based on the convolutional layers of the VGG-16 model.
• Our proposed approach combines the SegNet model with the Conditional Random Fields and integrates the post-
processing results with the outputs of SegNet to improve the localization of boundaries.
Input
Encoder
conv-BN-ReLU-pooling
Decoder
upsamling-conv-BN-ReLU
conv+BN+ReLU pooling
upsampling Softmax
Fully connected
CRF
Segmentation
32
CNN based large-scale oil palm tree detection
Multi-level CNN training and optimization
• The first CNN is used for land cover classification to locate the oil palm plantation area, including three types of samples (oil palm plantation
area, other vegetation / bare land, and impervious/cloud).
• The second CNN is used for object classification to identify the oil palms, including for types of samples (oil palm, background, other
vegetation / bare land, and impervious/cloud).
• The two CNNs are trained and optimized independently based on 17,000 training samples and 3000 validation samples.
CNN-2: Object classificationCNN-1: Land cover classification
Transfer Learning
One Source Domain to One Target Domain One Source Domain to Multi Target Domain
Case 3: Urban Land Use Map
35
Case 3: More accurate urban land use map
• A building extraction method based on
the U-Net semantic segmentation model.
• A data fusion method combining the
multispectral satellite images with the
public GIS map datasets.
• This work won the fifth place in the
Building Extraction Track of DeepGlobe
- CVPR 2018 Satellite Challenge.
256×256×32
128×128×64
64×64×128
32×32×256 32×32×256
16×16×512
64×64×128
128×128×64
256×256×32
Convolution + Batch Normalization + Activation
Max-poolingUpsampling Concatenation
256×256×1
GIS Map data Satellite imagery Predicted buildings
36
Case 3: More accurate urban land use map
Satellite images
Satellite images
GIS Map images
Rescaled images
Sliced images
Rescaled images
Sliced images Augmented
images
Semantic Segmentation Post-processing
Ensembling
Predicted
buildings
Probability
maps
Deep Convolutional
Neural Network
Deep Convolutional
Neural Network
Deep Convolutional
Neural Network
Deep Convolutional
Neural Network
37
Case 3: More accurate urban land use map
Results of the proposed method
Index Vegas Paris Shanghai Khartoum
TP 27526 3097 11323 3495
Precision 0.9441 0.8459 0.7470 0.6398
Recall 0.8437 0.6825 0.5396 0.4694
F1-score 0.8911 0.7555 0.6266 0.5415
Method Vegas Paris Shanghai Khartoum
Baseline 0.8611 0.6774 0.5342 0.4544
Enhanced 0.8730 0.7181 0.5471 0.4935
Postprocess 0.8866 0.7383 0.5897 0.5210
Add-map 0.8911 0.7555 0.6266 0.5415
• Our method improves the F1-scores by 3%-9% compared
with the baseline, depending on the situation of each city.
• Combining the GIS Map datasets with satellite images
improves the results for all cities, even for places with
few building information on the map.
F1-scores obtained after each strategy
KhartoumShanghai
ParisVegas
• Some examples of the building extraction results.
n Intelligent City Brain:
p Detection
p Understanding
p Modeling
p Planning
AI for City
n Li, Weijia and Dong, Runmin and Fu, Haohuan and Wang, Jie and Yu, Le and Gong, Peng, "Integrating Google Earth imagery with Landsat
data to improve 30-m resolution land cover mapping", Remote Sensing of Environment, vol. 237, pp. 111563, 2020.
n Zheng, Juepeng and Li, Weijia and Xia, Maocai and Dong, Runmin and Fu, Haohuan and Yuan, Shuai, "Large-Scale Oil Palm Tree Detection
from High-Resolution Remote Sensing Images Using Faster-RCNN", Proc. IEEE International Geoscience and Remote Sensing Symposium
(IGARSS), pp. 1422-1425, 2019.
n Dong, Runmin and Li, Weijia and Fu, Haohuan and Gan, Lin and Yu, Le and Zheng, Juepeng and Xia, Maocai, "Oil palm plantation mapping
from high-resolution remote sensing images using deep learning", to appear in International Journal of Remote Sensing, pp. 1-25, 2019.
n Xia, Maocai and Li, Weijia and Fu, Haohuan and Yu, Le and Dong, Runmin and Zheng, Juepeng, "Fast and robust detection of oil palm trees
using high-resolution remote sensing images", Proc. Automatic Target Recognition XXIX, vol. 10988, pp. 109880C, 2019.
n Dong, Runmin and Li, Weijia and Fu, Haohuan and Xia, Maocai and Zheng, Juepeng and Yu, Le, "Semantic segmentation based large-scale
oil palm plantation detection using high-resolution satellite images", Proc. Automatic Target Recognition XXIX, vol. 10988, pp. 109880D,
2019.
n Li, Weijia and He, Conghui and Fu, Haohuan and Zheng, Juepeng and Dong, Runmin and Xia, Maocai and Yu, Le and Luk, Wayne, "A Real-
Time Tree Crown Detection Approach for Large-Scale Remote Sensing Images on FPGAs", Remote Sensing, vol. 11, no. 9, pp. 1025, 2019.
n Gong, Peng and Liu, Han and Zhang, Meinan and Li, Congcong and Wang, Jie and Huang, Huabing and Clinton, Nicholas and Ji, Luyan and
Li, Wenyu and Bai, Yuqi and others, "Stable classification with limited sample: transferring a 30-m resolution sample set collected in 2015 to
mapping 10-m resolution global land cover in 2017", Science Bulletin, vol. 64, no. 6, pp. 370-373, 2019.
n Li, Weijia and He, Conghui and Fang, Jiarui and Zheng, Juepeng and Fu, Haohuan and Yu, Le, "Semantic Segmentation-Based Building
Footprint Extraction Using Very High-Resolution Satellite Images and Multi-Source GIS Data", Remote Sensing, vol. 11, no. 4, pp. 403, 2019.
n Li, Weijia and Dong, Runmin and Fu, Haohuan and Yu, Le, “Large-scale oil palm tree detection from high-resolution satellite images using
two-stage convolutional neural networks”, Remote Sensing, vol. 11, no. 1, pp. 11, 2019.

12 SuperAI on Supercomputers

  • 1.
    SuperAI on Supercomputers HaohuanFu haohuan@tsinghua.edu.cn High Performance Geo-Computing (HPGC) Group http://www.thuhpgc.org
  • 2.
    n Create adigital earth, so as to: p simulate p analyze p understand p predict and mitigate Earth Science and Supercomputers figure credit: Earth Simulator, JPN
  • 3.
  • 4.
    To design highlyefficient and highly scalable simulation applications To develop intelligent data mining methods for the analysis of BIG scientific DATA Two Major Functions 4
  • 5.
    To design highlyefficient and highly scalable simulation applications To develop intelligent data mining methods for the analysis of BIG scientific DATA Two Major Functions 5
  • 6.
    2015-now: The CESMProject on Sunway TaihuLight 6 CAM5.0 POP2.0 CLM4.0 CICE4.0 CPL7 CESM1.2.0 Tsinghua + BNU 30+ Professors and Students • Four component models, millions lines of code • Large-scale run on Sunway TaihuLight • 24,000 MPI processes • Over one million cores • 10-20x speedup for kernels • 2-3x speedup for the entire model Haohuan Fu, Junfeng Liao, Wei Xue, Lanning Wang and et al., “Refactoring and Optimizing the Community Atmosphere Model (CAM) on the Sunway TaihuLight Supercomputer”, The International Conference for High Performance Computing, Networking, Storage and Analysis (SC), Salt Lake City, Utah, US, 2016
  • 7.
    2017: Non-Linear EarthquakeSimulation Source Partitioner Restart Controller 3D Model Interpolator LZ4 Compression, Group I/O, Balanced I/O Forwarding Snapshot/Sesimo Recorder Velocity Update Stress Update Source Injection Stress Adjustment For Plasticity Dynamic Rupture Source Generator (Based on CG-FDM) Seismic Wave Propagation (Based on AWP-ODC) Next Timestep 3D Vel/Den Model Fault Stress Init Friction Law Ctrl Wave Eqn Solver
  • 8.
    2018: Simulating theWenchuan Earthquake with Accurate Surface Topography Sunway (2017) Tangshan (a) 5km Sunway (2018) Wenchuan (b) Surface Topography x yz ! = !($, &, ') ) = )($, &, ') * = *($, &, ') $ &' (1) (2) −T- −T. T. T- /& = 0
  • 9.
    To design highlyefficient and highly scalable simulation applications To develop intelligent data mining methods for the analysis of BIG scientific DATA Two Major Functions 9
  • 10.
    Data Challenge forClimate Scientists Climate Data Challenges in the 21st Century, Overpeck et al., Science , 331 (6018): 700-702 • Over 100 PB just for climate change studies • A Challenge but also a huge opportunity
  • 11.
    11 Remote sensing data -Recording what is happening on the earth for decades - What issues can we solve based on these data? UAV Google earth images 1984-2018 UAV image High-resolution imageSatellite
  • 12.
    Look Ahead: Data-DrivenModeling and Prediction Remote Sensing Data Sets Land Cover Mapping ecology protection zone migration of birds city green land waterbody sea-level
  • 13.
    13 Cat Image Classification Object Detection SemanticSegmentation Cat Grasses Sky Trees Deep learning in computer vision
  • 14.
    14 More reliable urban landuse mapping More intelligent oil palm monitoring More accurate land cover mapping Major issues: - How to select and build the RS datasets? - How to select the most suitable method? - How to improve the method for our tasks? Challenges: - Few public labeled datasets in RS domain - Difference between RS images and CV images - Real-world applications (small size, imbalance) Deep learning + remote sensing data
  • 15.
    Case 1: MoreAccurate Land Cover Map
  • 16.
    First 30 mresolution global land cover maps Gong et al., 2013, IJRSRunning SVM on 8900 Landsat scenes on a supercomputer, achieved result in one day.
  • 17.
  • 18.
    18 A Starting Point:Direct Application of Stacked AutoEncoder RF SVM ANN SAE Overall Accuracy 76.03% 77.74% 77.86% 78.99% Mapping Time 33.605 ± 0.183 16344.188 4.014 ± 0.003 13.250 ± 0.042 Landsat Image RF Image SAE Image
  • 19.
    19 Integrating Google EarthImage Surface reflectance, NDVI, DOY Longitude and Latitude Elevation and Slope Surface reflectance of max NDVI, DOY SVM Predicted Result 2 Predicted Result 1 SVM Predicted Result 3 Google Earth RF Predicted Result 4 Spatial features from 0.5-m images Spectral features from 30-m images Land cover map Landsat + DEM 24 CNN
  • 20.
  • 21.
  • 22.
  • 23.
    23 Integrating Google EarthImage Great increase using Multi-resolution images Slight increase using 30-meter resolution images
  • 24.
    24 Integrating Google EarthImage Fewer confusions among various land cover types in the results of our proposed method. Google Earth image Results of RF Results of SVM Results of Ours Cropland Forest Grassland Shrubland Water Impervious Bare land Cloud Legend
  • 25.
    25 30m to 10m GongP., et al., 2019. Stable classification with limited sample: transferring a 30-m resolution sample set collected in 2015 to mapping 10-m resolution global land cover in 2017,Science Bulletin.
  • 26.
  • 27.
  • 28.
    Case 2: Detectionof Oil Palm Trees
  • 29.
    29 Case 2: Intelligentmonitoring of oil palm trees Detection and classification of oil palm trees using UAV images Detection of oil palm trees using high-resolution satellite images Mapping of oil palm trees using high-resolution satellite images
  • 30.
    30 CNN based large-scaleoil palm tree detection
  • 31.
    31 Semantic segmentation basedoil palm plantation extraction • We proposed the first semantic segmentation based approach for large-scale oil palm plantation extraction from QuickBird Satellite images in 0.6-m spatial resolution. • The enhanced pixel-wise dataset contains 36,000 images in 256×256 pixels located in southern Malaysia, including four categories of samples: oil palm, other vegetation, buildings and the others. • We present an end-to-end deep convolutional neural network based on the SegNet model, which has a symmetrical encoder-decoder architecture based on the convolutional layers of the VGG-16 model. • Our proposed approach combines the SegNet model with the Conditional Random Fields and integrates the post- processing results with the outputs of SegNet to improve the localization of boundaries. Input Encoder conv-BN-ReLU-pooling Decoder upsamling-conv-BN-ReLU conv+BN+ReLU pooling upsampling Softmax Fully connected CRF Segmentation
  • 32.
    32 CNN based large-scaleoil palm tree detection Multi-level CNN training and optimization • The first CNN is used for land cover classification to locate the oil palm plantation area, including three types of samples (oil palm plantation area, other vegetation / bare land, and impervious/cloud). • The second CNN is used for object classification to identify the oil palms, including for types of samples (oil palm, background, other vegetation / bare land, and impervious/cloud). • The two CNNs are trained and optimized independently based on 17,000 training samples and 3000 validation samples. CNN-2: Object classificationCNN-1: Land cover classification
  • 33.
    Transfer Learning One SourceDomain to One Target Domain One Source Domain to Multi Target Domain
  • 34.
    Case 3: UrbanLand Use Map
  • 35.
    35 Case 3: Moreaccurate urban land use map • A building extraction method based on the U-Net semantic segmentation model. • A data fusion method combining the multispectral satellite images with the public GIS map datasets. • This work won the fifth place in the Building Extraction Track of DeepGlobe - CVPR 2018 Satellite Challenge. 256×256×32 128×128×64 64×64×128 32×32×256 32×32×256 16×16×512 64×64×128 128×128×64 256×256×32 Convolution + Batch Normalization + Activation Max-poolingUpsampling Concatenation 256×256×1 GIS Map data Satellite imagery Predicted buildings
  • 36.
    36 Case 3: Moreaccurate urban land use map Satellite images Satellite images GIS Map images Rescaled images Sliced images Rescaled images Sliced images Augmented images Semantic Segmentation Post-processing Ensembling Predicted buildings Probability maps Deep Convolutional Neural Network Deep Convolutional Neural Network Deep Convolutional Neural Network Deep Convolutional Neural Network
  • 37.
    37 Case 3: Moreaccurate urban land use map Results of the proposed method Index Vegas Paris Shanghai Khartoum TP 27526 3097 11323 3495 Precision 0.9441 0.8459 0.7470 0.6398 Recall 0.8437 0.6825 0.5396 0.4694 F1-score 0.8911 0.7555 0.6266 0.5415 Method Vegas Paris Shanghai Khartoum Baseline 0.8611 0.6774 0.5342 0.4544 Enhanced 0.8730 0.7181 0.5471 0.4935 Postprocess 0.8866 0.7383 0.5897 0.5210 Add-map 0.8911 0.7555 0.6266 0.5415 • Our method improves the F1-scores by 3%-9% compared with the baseline, depending on the situation of each city. • Combining the GIS Map datasets with satellite images improves the results for all cities, even for places with few building information on the map. F1-scores obtained after each strategy KhartoumShanghai ParisVegas • Some examples of the building extraction results.
  • 38.
    n Intelligent CityBrain: p Detection p Understanding p Modeling p Planning AI for City
  • 39.
    n Li, Weijiaand Dong, Runmin and Fu, Haohuan and Wang, Jie and Yu, Le and Gong, Peng, "Integrating Google Earth imagery with Landsat data to improve 30-m resolution land cover mapping", Remote Sensing of Environment, vol. 237, pp. 111563, 2020. n Zheng, Juepeng and Li, Weijia and Xia, Maocai and Dong, Runmin and Fu, Haohuan and Yuan, Shuai, "Large-Scale Oil Palm Tree Detection from High-Resolution Remote Sensing Images Using Faster-RCNN", Proc. IEEE International Geoscience and Remote Sensing Symposium (IGARSS), pp. 1422-1425, 2019. n Dong, Runmin and Li, Weijia and Fu, Haohuan and Gan, Lin and Yu, Le and Zheng, Juepeng and Xia, Maocai, "Oil palm plantation mapping from high-resolution remote sensing images using deep learning", to appear in International Journal of Remote Sensing, pp. 1-25, 2019. n Xia, Maocai and Li, Weijia and Fu, Haohuan and Yu, Le and Dong, Runmin and Zheng, Juepeng, "Fast and robust detection of oil palm trees using high-resolution remote sensing images", Proc. Automatic Target Recognition XXIX, vol. 10988, pp. 109880C, 2019. n Dong, Runmin and Li, Weijia and Fu, Haohuan and Xia, Maocai and Zheng, Juepeng and Yu, Le, "Semantic segmentation based large-scale oil palm plantation detection using high-resolution satellite images", Proc. Automatic Target Recognition XXIX, vol. 10988, pp. 109880D, 2019. n Li, Weijia and He, Conghui and Fu, Haohuan and Zheng, Juepeng and Dong, Runmin and Xia, Maocai and Yu, Le and Luk, Wayne, "A Real- Time Tree Crown Detection Approach for Large-Scale Remote Sensing Images on FPGAs", Remote Sensing, vol. 11, no. 9, pp. 1025, 2019. n Gong, Peng and Liu, Han and Zhang, Meinan and Li, Congcong and Wang, Jie and Huang, Huabing and Clinton, Nicholas and Ji, Luyan and Li, Wenyu and Bai, Yuqi and others, "Stable classification with limited sample: transferring a 30-m resolution sample set collected in 2015 to mapping 10-m resolution global land cover in 2017", Science Bulletin, vol. 64, no. 6, pp. 370-373, 2019. n Li, Weijia and He, Conghui and Fang, Jiarui and Zheng, Juepeng and Fu, Haohuan and Yu, Le, "Semantic Segmentation-Based Building Footprint Extraction Using Very High-Resolution Satellite Images and Multi-Source GIS Data", Remote Sensing, vol. 11, no. 4, pp. 403, 2019. n Li, Weijia and Dong, Runmin and Fu, Haohuan and Yu, Le, “Large-scale oil palm tree detection from high-resolution satellite images using two-stage convolutional neural networks”, Remote Sensing, vol. 11, no. 1, pp. 11, 2019.