This research paper applies machine learning to study the correlation between SDG indicators and country rankings for control of the COVID-19 global pandemic. The SDGs (Sustainable Development Goals) comprise numerous aspects, including social, economic, and environmental ones. The main purpose of this paper is to identify the key SDG indicators associated with excellent control of the COVID-19 global pandemic. The conclusions are expected to serve as key criteria for government and private-sector investment, sustainable finance (green finance), and other investments concerning environment, society, and good governance (ESG).
During data exploration, the researcher found limitations in the SDG dataset: it contains a large number of null values and a large number of independent variables. Nevertheless, accurate results were obtained from tree-based ensemble models, namely Random Forest and XGBoost, owing to their ability to handle null values and multicollinearity.
This paper presents the top six key indicators that are expected to enable excellent control of the COVID-19 global pandemic. The researcher found that the significant indicators lie within these sectors: health risks, technology infrastructure, transportation infrastructure, density and urbanization, and financial-sector access (urban).
Ranking of SDGs Indicators on Global Pandemic Control for Green Investment
1. Ranking of SDGs Indicators on Global Pandemic Control for Green Investment
Independent Study Proposal
Advisor: Worapol Pongpech, Ph.D.
Norrawit Towanabut 6110422044
9. Literature Review
Technical Model
1. A machine learning approach to select features important to stroke prognosis (2020): uses the Shapiro-Wilk algorithm and RandomForestClassifier to find feature importance.
2. Technical information on how feature importance is calculated in boosted decision trees: “Relative Importance of Predictor Variables” in the book The Elements of Statistical Learning: Data Mining, Inference, and Prediction, page 367.
3. Gene selection and classification of microarray data using random forest (2020): discusses the advantages of random forest compared with other models in the classification process and the benefits of the feature importance obtained from the model. These processes lead to conclusions about which indicators are significant, while other research focuses on building accurate models.
4. Computing Random Forests Variable Importance Measures (VIM) on Mixed Continuous and Categorical Data (2016): finds feature importance for money laundering in supervised learning with both categorical and numeric features. The paper furthermore reports that the random forest variable importance measure can accurately identify almost all 21 informative variables among the 40 dimensions.
5. Learning to rank products based on online product reviews using a hierarchical deep neural network (2019): uses a CNN and the authors' deep learning model.
6. Predicting rank for scientific research papers using supervised learning (2019): compares supervised, unsupervised, and semi-supervised approaches.
10. Literature Review
COVID-19
1. Global COVID-19 Index (GCI): designed to pull and analyse data from verified sources for 184 countries into a single source, which makes it a truly comprehensive index of the pandemic. It consists of the GCI Severity Index and the GCI Recovery Index.
2. Data analysis of coronavirus COVID-19 epidemic in South Korea based on recovered and death cases (2020), Journal of Medical Virology: a study of the correlation between various factors and COVID-19 cases. The result is consistent with the conclusion that gender and smoking are significant factors in the number of cases.
11. Literature Review
Sustainable Development Goals
1. How COVID-19 Redefines the Concept of Sustainability: the three goals are economic development, social development, and environmental protection. The significance of adding human health as one of the sustainable development goals can be seen in the results of the current COVID-19 pandemic.
2. The Impact of COVID-19 Pandemic on the United Nations Sustainable Development Goals (SDGs): successful implementation of the SDGs could have prevented this pandemic, as SDG 15 calls for the elimination of wildlife trade. The SDGs also need to be realigned to take pandemic risks into account.
3. Measuring sustainable development, its antecedents, barriers and consequences in agriculture: An exploratory factor analysis: this exploratory study aims to understand Brazilian family farmers' perception of sustainable development in agriculture.
4. Covid-19 and Optimal Portfolio Selection for Investment in Sustainable Development Goals: a theoretical study with practical policy recommendations. For future studies, it recommends using real-world data to test the model developed in the study empirically.
5. Influencing subjective well-being for business and sustainable development using big data and predictive regression analysis: focuses on the UK.
6. Wastewater surveillance for Covid-19: An African perspective: wastewater surveillance could play a key role in management of the COVID-19 pandemic.
14. Research Methodology
Data Understanding: understand the SDG indicators and how they are collected.
Data Exploration: explore the data with summary statistics to understand it and to guide the data preparation in the next step.
Data Preparation: clean the data and put it into a format that is ready for modeling.
Modeling: choose a model that can cope with data that has few observations and many features.
Evaluation and Interpretation: measure and summarize the experimental results.
15. Research Methodology
Stages: Data Understanding, Data Exploration, Data Preparation, Modeling, Evaluation and Interpretation
Steps:
1. Data profiling (statistic profile)
2. Check data availability
3. Remove indicators with low data availability
4. Correlation analysis
5. Standardize
6. Grouping and removing
7. Model and grid search
8. Evaluation
9. Feature importance
10. Interpret
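The availability check and the correlation-based grouping in the steps above can be sketched with pandas. This is a minimal illustration under stated assumptions, not the study's actual code: the column names, the 50% availability cutoff, the 0.8 correlation threshold, and the greedy grouping rule are all hypothetical choices made for the example.

```python
import numpy as np
import pandas as pd


def drop_low_availability(df, min_available=0.5):
    """Keep indicator columns whose share of non-null values is at least min_available."""
    availability = df.notna().mean()  # fraction of non-null rows per column
    return df.loc[:, availability >= min_available]


def group_correlated(df, threshold=0.8):
    """Greedily group indicators whose absolute Pearson correlation with the
    first member of a group is at least `threshold` (assumed grouping rule)."""
    corr = df.corr().abs()  # pairwise correlations; nulls are skipped per pair
    groups, assigned = [], set()
    for col in corr.columns:
        if col in assigned:
            continue
        members = [col] + [
            other for other in corr.columns
            if other != col and other not in assigned
            and corr.loc[col, other] >= threshold
        ]
        assigned.update(members)
        groups.append(members)
    return groups


# Synthetic stand-in data; the indicator names are illustrative only.
rng = np.random.default_rng(0)
n = 50
gdp = rng.normal(size=n)
demo = pd.DataFrame({
    "gdp": gdp,
    "gni": gdp + rng.normal(scale=0.05, size=n),  # nearly collinear with gdp
    "urban_growth": rng.normal(size=n),
    "sparse_indicator": np.where(np.arange(n) < 40, np.nan, rng.normal(size=n)),  # 80% null
})

kept = drop_low_availability(demo, min_available=0.5)
groups = group_correlated(kept, threshold=0.8)
representatives = [group[0] for group in groups]  # one indicator kept per group
```

Keeping one representative per group is one way to read "Grouping and removing"; which member of a group should be retained is a modeling decision the slides do not specify.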
20. Model Type Selection
Data Problem
-Many independent variables
-Many null values, owing to data-collection problems on the UN side
-Several independent variables are highly correlated and related to one another
Tree-based Ensemble model
-Can handle null values well
-Less affected by multicollinearity
Other models
-Cannot handle a large number of null values, so most independent variables would have to be dropped
-If multicollinearity is present and affects the model's predictions, selecting independent variables for prediction is difficult because there are so many of them
26. Conclusions and Discussion
Key indicators (topic: indicator)
• Infrastructure: Technology: Patent applications, nonresidents
• Infrastructure: Transportation: Air transport, freight (million ton-km)
• Environment: Density & urbanization: Urban population growth (annual %)
• Financial Sector: Access Urban: Commercial bank branches (per 100,000 adults)
• Health: Risk factors: Smoking prevalence, females (% of adults)
• Social Protection & Labor: Unemployment: Unemployment, total (% of total labor force) (modeled ILO estimate)
• Economic Policy & Debt: National accounts: US$ at constant 2010 prices: Aggregate indicators: GDP
Scores (order as on the slide): 1, 0.94, 0.73, 0.66, 0.61, 0.61, 0.61
• Economic Policy & Debt: Balance of payments: Capital & financial account: Foreign direct investment, net inflows (BoP, current US$)
• Economic Policy & Debt: National accounts: US$ at constant 2010 prices: Aggregate indicators: GNI (constant 2010 US$)
• Economic Policy & Debt: National accounts: US$ at current prices: Aggregate indicators: GDP (current US$)
• Economic Policy & Debt: National accounts: US$ at current prices: Value added: Manufacturing, value added (current US$)
Health: Disease prevention:
• People using at least basic drinking water services (% of population)
• People using at least basic drinking water services, rural (% of rural population)
• People using at least basic sanitation services (% of population)
• People using at least basic sanitation services, rural (% of rural population)
• People using at least basic sanitation services, urban (% of urban population)
Social Protection & Labor: Economic activity:
• Wage and salaried workers, female (% of female employment) (modeled ILO estimate)
29. References
• Fang, G., Liu, W., & Wang, L. (2020). A Machine Learning Approach to Select Features Important to Stroke Prognosis. Computational
Biology and Chemistry, 107316. Available at: https://www.sciencedirect.com/science/article/abs/pii/S1476927120306307
• Technical information on how feature importance is calculated in boosted decision trees “Relative Importance of Predictor Variables” of
the book The Elements of Statistical Learning: Data Mining, Inference, and Prediction, page 367.
• Ho-Chang Lee, Hae-Chang Rim, Do-Gil Lee (2019). Learning to Rank Products Based on Online Product Reviews Using a Hierarchical Deep Neural Network. Available at:
https://www.researchgate.net/publication/334185492_Learning_to_Rank_Products_Based_on_Online_Product_Reviews_Using_a_Hierarchical_Deep_Neural_Network (Accessed: XX month 202X).
• El Mohadab, M., Bouikhalene, B., & Safi, S. (2018). Predicting rank for scientific research papers using supervised learning. Applied
Computing and Informatics. Available at: https://www.sciencedirect.com/science/article/pii/S2210832717302703
• Coronavirus Disease (COVID-19) Dashboard | The Global COVID-19 Index (GCI). 2021. Coronavirus Disease (COVID-19) Dashboard |
The Global COVID-19 Index (GCI). [ONLINE] Available at: https://covid19.pemandu.org/.
• Nadia AL-Rousan, Hazem AL-Najjar (2020). Data analysis of coronavirus COVID-19 epidemic in South Korea based on recovered and death cases. Journal of Medical Virology, Volume 92, Issue 9, Special Issue on New coronavirus (2019-nCoV or SARS-CoV-2) and the outbreak of the respiratory illness (COVID-19): Part-V, Pages 1603-1608. Available at:
https://onlinelibrary.wiley.com/doi/full/10.1002/jmv.25850
30. References
• Marko Hakovirta, Navodya Denuwara (2020). How COVID-19 Redefines the Concept of Sustainability. Available at:
https://www.researchgate.net/publication/341155767_How_COVID-19_Redefines_the_Concept_of_Sustainability
• Osman Gulseven, Ibrahim Alshomali, Fatima Alharmoodi, Majid Alfalasi (2020). The Impact of COVID-19 Pandemic on the United Nations Sustainable Development Goals (SDGs). Available at:
https://www.researchgate.net/publication/341099486_The_Impact_of_COVID-19_Pandemic_on_the_United_Nations_Sustainable_Development_Goals_SDGs
• Laurett, R., Paço, A., & Mainardes, E. W. (2020). Measuring sustainable development, its antecedents, barriers and consequences in
agriculture: An exploratory factor analysis. Environmental Development, 100583. Available at:
https://www.sciencedirect.com/science/article/abs/pii/S2211464520301056
• Yoshino, N., Taghizadeh-Hesary, F., & Otsuka, M. (2020). Covid-19 and Optimal Portfolio Selection for Investment in Sustainable
Development Goals. Finance Research Letters, 101695. Available at:
https://www.sciencedirect.com/science/article/pii/S1544612320300854
• Weerakkody, V., Sivarajah, U., Mahroof, K., Maruyama, T., & Lu, S. (2020). Influencing subjective well-being for business and
sustainable development using big data and predictive regression analysis. Journal of Business Research. Available at:
https://www.sciencedirect.com/science/article/pii/S0148296320304860
31. References
• Street, R., Malema, S., Mahlangeni, N., & Mathee, A. (2020). COVID-19 wastewater surveillance: An African perspective. Science of The
Total Environment, 140719. Available at: https://www.sciencedirect.com/science/article/abs/pii/S0048969720342418
• Ramon Diaz-Uriarte, Sara Alvarez de Andrés (2020). Gene Selection and Classification of Microarray Data Using Random Forest.
Available at:
https://www.researchgate.net/publication/7372706_Gene_Selection_and_Classification_of_Microarray_Data_Using_Random_Forest
• Adam Hjerpe (2016). Computing Random Forests Variable Importance Measures (VIM) on Mixed Continuous and Categorical Data. Available at: http://www.diva-portal.org/smash/get/diva2:921542/FULLTEXT01.pdf
• Robin Naidoo, Brendan Fisher (2020). Sustainable Development Goals: pandemic reset. Nature 583, 198–201. Available at:
https://doi.org/10.1038/d41586-020-01999-x
• Respiratory Risk Factors and COVID-19 (2020). American Nonsmokers’ Rights Foundation. Available at: https://no-smoke.org/respiratory-risk-factors-covid-19
Ranking of how well each country recovers from or copes with the COVID-19 situation
Data analysis of COVID-19 in South Korea: finds correlations and the key factors in COVID-19 infection in Korea
Most discuss the impact of COVID-19 from the perspective of the development goals, that is, how it affects the goals, and they study a specific area or specific point
Benefits
Data Exploration
-% data available
-Correlation between indicators
-Group indicators on similar topics by inspecting a correlation threshold of 0.8-0.9
Data Preparation
-Clean null indicators
-Format data
Modeling
-Random Forest
-XGBoost
-CatBoost
X values
x
https://covid19.pemandu.org
My Y value is the score
Score = total rank – rank -> 185 – rank + 1
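The rank-to-score conversion in the note can be sketched as follows. This is one reading of the formula, in which a better (smaller) rank yields a larger score and the worst rank still scores 1; whether the constant 185 in the note already includes the +1 offset is not clear from the source, so `total_rank` is left as a parameter and the example uses a hypothetical total of 5.

```python
def rank_to_score(rank, total_rank):
    """Invert a rank so that a better (smaller) rank gives a larger score.

    Implements score = total_rank - rank + 1, so rank 1 maps to total_rank
    and the last rank maps to 1. The +1 offset is an assumed reading.
    """
    if not 1 <= rank <= total_rank:
        raise ValueError(f"rank must be between 1 and {total_rank}, got {rank}")
    return total_rank - rank + 1


# Hypothetical example with five ranked countries.
scores = [rank_to_score(r, total_rank=5) for r in (1, 2, 5)]
```

Using an inverted rank as the target means a tree-based regressor predicts "higher is better", which keeps the sign of any later importance or dependence analysis intuitive.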