SDOBenchmark
http://i4ds.github.io/SDOBenchmark
https://www.kaggle.com/fhnw-i4ds/sdobenchmark
At its core, SDOBenchmark is an image dataset
tailored towards Machine Learning
[Images: AIA 171, AIA 1700, HMI magnetogram]
Can we predict that a large X9 flare happens in 24h?
Up to 40 images for a single prediction
Input time steps: 12h, 5h, 1.5h and 10min before the 24h prediction period
AIA
HMI
What makes this dataset special?
1. Built for Machine Learners without a solar physics background
2. High accessibility and high scientific quality
3. Specifically engineered to avoid common overfitting issues
4. Public, open source: Website, Kaggle
Intermediate results
https://i4ds.github.io/SDOBenchmark/#current-state
15 June 2018
«First competitive model» (TSS 0.34)
Would the model have predicted the large flare of September 2017?
Peak class   Predicted
X2           M5
X9           M5
Result: No. It would have predicted M instead of a strong X.
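The TSS quoted above is the True Skill Statistic, which for a binary forecast is the recall minus the false alarm rate. The following is a minimal sketch of that formula; the confusion-matrix counts are hypothetical, and how the benchmark turns a predicted peak flux into a binary flare / no-flare decision is not stated in these slides.

```python
def true_skill_statistic(tp, fn, fp, tn):
    """Return TSS = recall - false alarm rate, a value between -1 and 1."""
    recall = tp / (tp + fn)              # share of real flares that were predicted
    false_alarm_rate = fp / (fp + tn)    # share of quiet periods flagged as flares
    return recall - false_alarm_rate


# Hypothetical counts, chosen only for illustration.
print(true_skill_statistic(tp=40, fn=60, fp=6, tn=94))
```

A score of 0 corresponds to a forecast with no skill (for example, always predicting the same class), and 1 to a perfect forecast.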
Intermediate results
Featured Dataset
15 June 2018
Dataset creation
PIPELINE FROM RAW DATA TO THE IMAGE DATASET
Pipeline overview
Raw data download -> Sample selection -> Sample creation
Raw data
3 different sources of raw data:
• SWPC and SSW Latest Events from HEK
-> sample selection
• GOES profiles (X-ray measurements)
-> prediction label (peak_flux), sample selection
• AIA and HMI FITS
-> images
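The prediction label is a peak X-ray flux, while the slides talk about flares by class (M5, X9). As a rough sketch, a peak flux in W/m² can be mapped to the usual GOES class letters, which are defined on the long-wavelength (0.1 to 0.8 nm) channel. The function name `goes_class` is an illustrative choice and not part of the SDOBenchmark code.

```python
# Lower flux bound (W/m^2) of each GOES flare class, strongest first.
CLASS_THRESHOLDS = [("X", 1e-4), ("M", 1e-5), ("C", 1e-6), ("B", 1e-7), ("A", 1e-8)]


def goes_class(peak_flux):
    """Convert a peak X-ray flux in W/m^2 to a class string such as 'M5.0'."""
    for letter, base in CLASS_THRESHOLDS:
        if peak_flux >= base:
            return f"{letter}{peak_flux / base:.1f}"
    # Anything weaker than A1.0 is reported as a fraction of the A class.
    return f"A{peak_flux / 1e-8:.1f}"


print(goes_class(9.3e-4))
print(goes_class(5e-5))
```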
Raw data: HEK events
Events: SWPC and SSW Latest Events
SWPC = manual flare labels
SSW = backup for missing or inconsistent SWPC events
Download
[{"hpc_bbox": "POLYGON((-84.402 202.566,84.402 202.566,84.3972 202.911,-84.3972 202.911,-84.402 202.566))",
  "hpc_coord": "POINT(0 202.9314)",
  "event_starttime": "2012-01-01T00:00:00",
  "event_type": "AR",
  "intensmin": null,
  "obs_meanwavel": 5e-05,
  "intensmax": null,
  "intensmedian": null,
  "obs_channelid": "visible",
  "ar_noaaclass": "",
  "frm_name": "NOAA SWPC Observer",
  "obs_observatory": "various",
  "frm_daterun": "2012-04-12T23:33:57",
  "hpc_y": 202.9314,
  "hpc_x": 0,
  "kb_archivdate": "2012-04-13T00:49:37",
  "ar_noaanum": 11390, …
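To illustrate how such a HEK record can be consumed, here is a small sketch that reads a trimmed copy of the record above with the standard `json` module and keeps only the fields the later steps refer to. The helper name `parse_active_region_events` and the output dictionary keys are assumptions made for this example.

```python
import json
from datetime import datetime

# Trimmed copy of the HEK record shown above (fields shortened for brevity).
RAW_EVENTS = """[{"hpc_coord": "POINT(0 202.9314)",
  "event_starttime": "2012-01-01T00:00:00", "event_type": "AR",
  "frm_name": "NOAA SWPC Observer",
  "hpc_y": 202.9314, "hpc_x": 0, "ar_noaanum": 11390}]"""


def parse_active_region_events(text):
    """Return NOAA number, start time and position of every AR event."""
    regions = []
    for event in json.loads(text):
        if event["event_type"] != "AR":
            continue  # skip everything that is not an Active Region event
        regions.append({
            "noaa_num": event["ar_noaanum"],
            "start": datetime.strptime(event["event_starttime"], "%Y-%m-%dT%H:%M:%S"),
            "hpc_x": float(event["hpc_x"]),  # helioprojective x in arcsec
            "hpc_y": float(event["hpc_y"]),  # helioprojective y in arcsec
        })
    return regions


regions = parse_active_region_events(RAW_EVENTS)
print(regions[0]["noaa_num"], regions[0]["start"])
```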
Raw data: GOES profile
X-ray curves
GOES15 satellite
Download
TIME_TAG, A_FLUX
2012-02-02 00:00:51.577, 1.0000E-09
2012-02-02 00:00:53.623, 1.0000E-09
2012-02-02 00:00:55.670, 1.0000E-09
2012-02-02 00:00:57.720, 1.0000E-09
2012-02-02 00:00:59.767, 1.0000E-09
2012-02-02 00:01:01.817, 1.0000E-09
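A GOES profile in this layout can be read with the standard library alone. The sketch below parses the first rows shown above and returns the peak flux inside a time window, which is how a peak_flux label could be derived in principle. The function names are illustrative, and the actual pipeline may read and filter the data differently.

```python
import csv
import io
from datetime import datetime

GOES_SAMPLE = """TIME_TAG, A_FLUX
2012-02-02 00:00:51.577, 1.0000E-09
2012-02-02 00:00:53.623, 1.0000E-09
2012-02-02 00:00:55.670, 1.0000E-09
"""


def read_goes_profile(text):
    """Parse the CSV text into a list of (timestamp, flux) tuples."""
    # skipinitialspace drops the blank that follows each comma.
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    return [
        (datetime.strptime(row["TIME_TAG"], "%Y-%m-%d %H:%M:%S.%f"), float(row["A_FLUX"]))
        for row in reader
    ]


def peak_flux(profile, start, end):
    """Return the largest flux with start <= time < end, or None if there is no data."""
    values = [flux for time, flux in profile if start <= time < end]
    return max(values) if values else None


profile = read_goes_profile(GOES_SAMPLE)
print(peak_flux(profile, datetime(2012, 2, 2), datetime(2012, 2, 3)))
```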
Raw data: FITS files
Not yet:
Once we have the samples, we can download the FITS raw data.
Sample selection: Definition
4 time steps -> 24h prediction period
12h 5h 1.5h 10min peak flux?
Constraints
• No overlaps
• Avoid Active Region overfitting
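The sample definition above can be written down as plain date arithmetic. This sketch assumes that the four offsets are measured backwards from the start of the prediction period; the constant and function names are made up for the example.

```python
from datetime import datetime, timedelta

# Input time steps, measured back from the start of the prediction period.
INPUT_OFFSETS = [
    timedelta(hours=12),
    timedelta(hours=5),
    timedelta(hours=1.5),
    timedelta(minutes=10),
]
PREDICTION_PERIOD = timedelta(hours=24)


def sample_times(prediction_start):
    """Return the four input timestamps and the (start, end) prediction window."""
    inputs = [prediction_start - offset for offset in INPUT_OFFSETS]
    window = (prediction_start, prediction_start + PREDICTION_PERIOD)
    return inputs, window


inputs, window = sample_times(datetime(2017, 9, 6, 12, 0))
for timestamp in inputs:
    print(timestamp)
print(window)
```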
Sample selection: Event Processing
[Figure: timeline of one Active Region, 01.09. to 13.09.]
Each Active Region is split into ranges.
Range:
• start time
• end time
• largest flare (if any)
[Figure labels: ranges marked with their largest flare: X8 (40h), X1 (48h), X2 (48h), X9 (48h), M1-9]
Sample selection: Ranges -> Samples
[Figure: timeline of one Active Region, 01.09. to 13.09.]
From each range 1 or 2 samples are created
Sample:
• start time
• end time
• largest flare in 24h (if any)
[Figure labels: X8 (40h), X2 (24h), X9 (24h), X1 (24h). Legend: sample input, prediction period]
Sample selection: Test/Training
Active Regions are split into a test and a training set
Lastly, samples go through plenty of verification and validation.
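Splitting by Active Region rather than by individual sample keeps every sample of one region on the same side of the split. The slides do not say how regions are assigned, so the following is only a sketch of one possible approach: a deterministic hash of the NOAA number with a hypothetical test fraction of 20%. The sample IDs other than the one shown in the slides are invented.

```python
import hashlib


def assign_set(noaa_num, test_fraction=0.2):
    """Deterministically map a NOAA Active Region number to 'test' or 'training'."""
    digest = hashlib.md5(str(noaa_num).encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable value in 0..99 for this region
    return "test" if bucket < test_fraction * 100 else "training"


def split_samples(samples):
    """Group (noaa_num, sample_id) pairs into the two sets, region by region."""
    result = {"test": [], "training": []}
    for noaa_num, sample_id in samples:
        result[assign_set(noaa_num)].append((noaa_num, sample_id))
    return result


# Hypothetical samples: two from region 11390, one from region 12673.
samples = [
    (11390, "2012_01_01_19_06_00_0"),
    (11390, "2012_01_01_19_06_00_1"),
    (12673, "2017_09_05_00_00_00_0"),
]
split = split_samples(samples)
print(split)
```

Because the assignment depends only on the region number, samples of the same Active Region can never end up in both sets.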
Output creation
Finally, the actual samples are created in three steps:
1. Request FITS data URLs
2. Download FITS raw files
3. Process FITS files to create output images
Output creation: FITS download
FITS = raw image data
AIA and HMI FITS files are requested
from JSOC with its Python REST client «drms»
Download
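# qt is the query time; inside an f-string, {magnetogram} interpolates a variable,
# whereas a literal DRMS segment selector would need doubled braces
query = f"hmi.Ic_45s[{qt:%Y.%m.%d_%H:%M:%S_TAI}]{magnetogram}"
client.export(query, method="url_quick", protocol="as-is")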
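The query string follows the JSOC record-set pattern of series name, time in square brackets, and segment in curly braces. As a hedged sketch, the string can be assembled and inspected without contacting JSOC. The helper `build_query` is hypothetical, and the series and segment names used in the call are assumptions for illustration.

```python
from datetime import datetime


def build_query(series, query_time, segment):
    """Build a record-set query of the form series[time_TAI]{segment}."""
    # The doubled braces produce literal { and } around the segment name.
    return f"{series}[{query_time:%Y.%m.%d_%H:%M:%S_TAI}]{{{segment}}}"


query = build_query("hmi.M_45s", datetime(2012, 1, 1, 19, 6, 0), "magnetogram")
print(query)
```

The resulting string is what would then be handed to the export call shown above.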
Output creation: Processing
For each sample, time step, wavelength:
1. Load the FITS file with SunPy
2. Run aiaprep / hmiprep:
Rotates, scales and translates the image
3. Find the Active Region center
4. Crop out a square around it
5. Replace NaNs with 0
6. Clip and rescale image values to predefined ranges
(similar to helioviewer.org ranges)
7. Flag images whose FITS raw files are flagged (eclipses, maintenance, etc.)
8. Save resulting JPEG in the sample folder
(8-bit, 256px from a 512px crop)
FITS -> Image
Output creation: Processing
For each sample, time step, wavelength:
1. Load the FITS file with SunPy
current_map = sunpy.map.Map(fits_file)
Output creation: Processing
For each sample, time step, wavelength:
2. Run aiaprep / hmiprep:
Rotates, scales and translates the image
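import astropy.units as u
import sunpy.instr.aia
import sunpy.map

if isinstance(current_map, sunpy.map.sources.AIAMap):
    # AIA: aiaprep rotates, scales and translates (newer releases: aiapy.calibrate.register)
    current_map = sunpy.instr.aia.aiaprep(current_map)
else:
    # HMI: rescale to 0.6 arcsec per pixel, then rotate and recenter
    hmi_scale_factor = current_map.scale.axis1 / (0.6 * u.arcsec)
    current_map = current_map.rotate(recenter=True,
                                     scale=hmi_scale_factor.value, missing=0.0)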
Output creation: Processing
For each sample, time step, wavelength:
3. Find the Active Region center
a) Closest AR event from HEK
b) Solar rotation -> sample date
c) Hpc -> pixel coordinates

# a) Position of the closest Active Region event from HEK
region_position = astropy.coordinates.SkyCoord(
    float(closest_region_event["hpc_x"]) * u.arcsec,
    float(closest_region_event["hpc_y"]) * u.arcsec,
    frame="helioprojective",
    obstime=closest_region_event["starttime"]
)
# b) Rotate that position with the Sun to the observation date of the sample
region_position_rotated = sunpy.physics.differential_rotation.solar_rotate_coordinate(
    region_position,
    observation_date
)
# c) Convert helioprojective coordinates to pixel coordinates of the map
center_x, center_y = current_map.world_to_pixel(region_position_rotated)
Output creation: Processing
For each sample, time step, wavelength:
4. Crop out a square around the center
(3144, 1536)
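The crop itself is simple index arithmetic around the center pixel. The sketch below takes the coordinates shown above as an example center and assumes they are given as (x, y); it only computes the bounds of a 512px square and leaves out how regions close to the image border are handled, which the slides do not describe.

```python
def crop_bounds(center_x, center_y, size=512):
    """Return (left, top, right, bottom) of a size x size square around the center."""
    half = size // 2
    left = int(round(center_x)) - half
    top = int(round(center_y)) - half
    return left, top, left + size, top + size


# Example center taken from the slide above.
print(crop_bounds(3144, 1536))
```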
Output creation: Processing
For each sample, time step, wavelength:
5. Replace NaNs with 0
6. Clip and rescale image values to predefined ranges
(similar to helioviewer.org ranges)
…
},
"171": {
    "dataMin": 5,
    "dataMax": 3500,
    "dataScalingType": 3  # 0 - linear, 1 - sqrt, 3 - log10
},
"193": {
…
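Steps 5 and 6 can be sketched for a single pixel value using the 171 settings shown above. This is an illustration of the idea (replace NaN, clip to the range, apply the scaling, map to 8 bits) rather than the project's implementation, which presumably works on whole arrays; the function and constant names are assumptions.

```python
import math

LINEAR, SQRT, LOG10 = 0, 1, 3  # scaling codes from the configuration comment


def rescale(value, data_min, data_max, scaling_type):
    """Map a raw pixel value to an 8-bit gray level (0..255)."""
    if math.isnan(value):
        value = 0.0  # step 5: replace NaNs with 0
    value = min(max(value, data_min), data_max)  # step 6: clip to the range
    if scaling_type == LOG10:
        transform = math.log10  # requires data_min > 0
    elif scaling_type == SQRT:
        transform = math.sqrt
    else:
        transform = lambda v: v
    low, high = transform(data_min), transform(data_max)
    return round(255 * (transform(value) - low) / (high - low))


# AIA 171 settings from the configuration above: range 5..3500, log10 scaling.
for raw in (float("nan"), 5, 100, 3500, 1e6):
    print(raw, rescale(raw, 5, 3500, LOG10))
```

Note that a NaN replaced by 0 is then clipped up to dataMin, so it ends up as the darkest gray level.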
Output creation: Processing
For each sample, time step, wavelength:
7. If the FITS file is flagged (eclipses, maintenance, etc.):
Add “flagged” to the image metadata
8. Save resulting JPEG in the sample folder
(8-bit, 256px from a 512px crop out)
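Going from a 512px crop to a 256px image halves each dimension. The slides do not name the resampling method, so the following sketch simply assumes 2x2 block averaging on a nested list to show the principle; an image library would normally do this on real data.

```python
def downsample_2x(image):
    """Halve width and height by averaging each 2x2 block of pixel values."""
    height, width = len(image), len(image[0])
    return [
        [
            (image[y][x] + image[y][x + 1] + image[y + 1][x] + image[y + 1][x + 1]) / 4
            for x in range(0, width - 1, 2)
        ]
        for y in range(0, height - 1, 2)
    ]


# Tiny 4x4 stand-in for a 512x512 crop.
small = [
    [0, 2, 4, 4],
    [2, 4, 4, 4],
    [8, 8, 0, 0],
    [8, 8, 0, 4],
]
print(downsample_2x(small))
```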
Output creation: Result
40 images per sample
(4 time steps, 10 channels)
Folder structure: Noaa_num / RangeStart_SampleNr
Example sample folder: 2012_01_01_19_06_00_0
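The folder naming can be reproduced with a short helper. This is a sketch that assumes the sample folder sits inside a folder named after the NOAA number; the root folder name and the pairing of region 11390 with this sample are used purely for illustration.

```python
from datetime import datetime
from pathlib import PurePosixPath


def sample_dir(root, noaa_num, range_start, sample_nr):
    """Return root / Noaa_num / RangeStart_SampleNr for one sample."""
    sample_id = f"{range_start:%Y_%m_%d_%H_%M_%S}_{sample_nr}"
    return PurePosixPath(root) / str(noaa_num) / sample_id


path = sample_dir("training", 11390, datetime(2012, 1, 1, 19, 6, 0), 0)
print(path)
```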
Want to know more?
• SDOBenchmark website at i4ds.github.io/SDOBenchmark
• GitHub repository at https://github.com/i4Ds/SDOBenchmark,
with publicly available source code and more information about the dataset creation process at
https://github.com/i4Ds/SDOBenchmark/blob/master/STRUCTURE.md
• Kaggle SDOBenchmark dataset at https://www.kaggle.com/fhnw-i4ds/sdobenchmark

SDOBenchmark - a machine learning image dataset for the prediction of solar flares


Editor's Notes

  1. Solar flares can disrupt the power grids of a continent, shut down the GPS system or irradiate people exposed in space.
  2. The sun is constantly observed in various wavelengths. Here we see 3 example images of the sun. They show an Active Region in 3 different wavelengths, 24h before an X9 flare (06 September 2017).
  3. Solar flares can disrupt the power grids of a continent, shut down the GPS system or irradiate people exposed in space.
  4. 4 time steps: 12h, 5h, … before the prediction period. 24h prediction: «Peak X-ray flux to be expected within 24h»
  5. 10 images for each time step: 8 from AIA, 2 from HMI. AIA and HMI are two instruments on the SDO satellite (AIA 94, 131, 171, 193, 211, 304, 335, 1700; HMI continuum and magnetogram).
  6. X2 first flare, X9 second flare a day later
  7. A featured dataset at kaggle.com, amongst StackOverflow, X-ray research, Data Science for Good, etc.!
  8. HEK is the «Heliophysics Events Knowledgebase», which catalogs a wide variety of solar events.
  9. For this dataset, we chose 4 input time steps in a 12h window, followed by a 24h prediction period. AR overfitting: AR shapes change only slowly over time, so we try to keep networks from simply recognizing ARs.
  10. For each sample (yellow), the following prediction period (24h, gray) contains the range’s peak value
  11. Samples are split by Active Region. The X flare of September 2017 is handled by a special rule and will always be in the test set.
  12. FITS data URLs for the sample's input duration are requested from JSOC. The FITS raw files of a completed request are downloaded. The downloaded FITS files are processed to create the output images.
  13. For each sample, we download the required FITS raw data files
  14. (The NOAA number directory exists primarily for performance reasons: some file systems have trouble with tens of thousands of directories on a single level.)