Global Indicators of High-Growth Economies
Predicting high GDP growth factors for national economies
MIE 465 — Analytics in Action
April 13, 2018
Oghosa Igbinakenzua
Kamil Yilanci
Minja Zhu
Chris Zhu
Department of Mechanical and Industrial Engineering
University of Toronto
Abstract
This project aims to understand how a nation can effectively allocate resources to drive economic growth.
First we identify the key features that determine growth, then we identify the countries on the verge of high
growth and what they can invest in to further drive their GDP growth. We used the full World Bank
World Development Indicators database, which covers 217 countries and territories from 1960 to 2016 across
1574 indicators. The binary target variable “high-growth” marks countries that sustained annual
GDP growth above the world average for 9 out of 10 consecutive years, with an initial GDP above US$10
billion. Several iterations of logistic regression were used to identify the significant features and their weights
in predicting the “high-growth” variable. Several CART models were also created to cross-validate the key
features and produce a more interpretable storyline. Using logistic regression, we identified total fisheries
production, urban population growth, and life expectancy as key features. Fisheries was a surprising
finding at first, but it reflects a country’s industrialization and utilization of natural resources, and it is a proxy
for access to seagoing trade. Applying the key features back to our country and indicator data, we predicted
that the countries experiencing high growth between 2018 and 2024 will include Brazil, Ukraine, Turkey, Panama,
and Cuba. A full map is illustrated below highlighting developed “high-income” countries, the 2006-16 high
growth countries, and our predicted 2018-24 high growth countries.
Contents
1 Introduction
2 Data
2.1 Data Cleaning
2.2 Labelling the Data (Definition of High Growth)
3 Methods and Results
3.1 Logistic Regression
3.2 CART
4 Discussion
4.1 Total Fisheries Production
4.2 Net official development assistance and official aid received
4.3 Urban population growth
4.4 Life expectancy at birth
4.5 Fertility Rate
4.6 Features with negative coefficients
4.7 Insignificant features
5 Conclusion
A Histogram Plot of Feature Completeness
B Logistic Regression Results
C ROC for Logistic Regression Model
D Global Fisheries Production
E Full Unbalanced CART Model
F Full Balanced CART Model
G Predictions from Logistic Regression Models
G.1 Predictions from Logistic Regression Model with 0.5 threshold
G.2 Predictions from Logistic Regression Model with 0.3 threshold
H Predictions from CART Models
H.1 Predictions from unbalanced CART model
H.2 Predictions from balanced CART model
List of Figures
1 Impact of significant features
2 Confusion Matrices for Logistic Regression Models
3 Confusion Matrices for CART Models
4 Unbalanced CART Model
5 Balanced CART Model
6 Prediction from Unbalanced CART model
7 Histogram of feature completeness
8 Logistic Regression Model Results
9 ROC for Logistic Regression Model
10 Global fisheries production [3]
11 Full Unbalanced CART Model
12 Full Balanced CART Model
13 Prediction from Logistic Regression model with 0.5 threshold
14 Prediction from Logistic Regression model with 0.3 threshold
15 Prediction from Unbalanced CART model
16 Prediction from Balanced CART model
List of Tables
1 Data summary from World Development Indicators (World Bank)
1 Introduction
Our goal is to determine what makes high-growth developing countries unique and how other countries could
leverage similar characteristics. The BRICS countries (Brazil, Russia, India, China, and South Africa) have
been recognized for their economic size and growth over the past 40 years, which has led to their new-found
economic and political influence. We first identify a subset of countries that experienced “high growth”
and use them to define our target variable for finding the features most relevant to “high growth”. Then,
we identify the next economies on the verge of experiencing this prosperity. The results are cross-
referenced against existing economic growth groupings and can be useful for governments in validating resource
allocation decisions for driving growth.
2 Data
The target variable in our regression is a binary indicator of “high-growth”, defined in Section
2.2. The World Development Indicators database provides 1574 indicators spanning categories such as economic,
demographic, health, and infrastructure information for every country. This gives us a wealth of
information to work with and a challenge to organize.
Table 1: Data summary from World Development Indicators (World Bank)
Years     | # Countries | “High-Growth” | Total Features | % Complete | # Rows
1960-2016 | 196         | 24            | 1574           | 60%        | 11484
2.1 Data Cleaning
With more than five decades of data, over 200 countries, and over 1500 indicators, organizing the data was
difficult, and a significant amount of time was dedicated to inverting and restructuring it.
However, the biggest challenge in this project was incomplete data: only 60% of our dataset was
complete. We attempted to impute the missing data in two ways: by computing the mean for
certain features, and by computing a similarity matrix between countries. However, both methods are
better suited to filling gaps in relatively complete data. In the end, we selected 18 relevant features from those
that are over 80% complete and then handpicked 10 more that were relevant according to the World Bank
Featured Indicators list, for a total of 28 features [1]. The distribution of data completeness is shown in
Appendix A.
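The completeness filter described above can be sketched with pandas. This is a minimal illustration rather than the team's code: the long-format column names (`country`, `year`, `indicator`, `value`), the toy values, and the helper `feature_completeness` are assumptions made for the example. Only the 80% cutoff comes from the text.

```python
import numpy as np
import pandas as pd


def feature_completeness(wide: pd.DataFrame) -> pd.Series:
    """Fraction of non-missing (country, year) rows for each indicator column."""
    return wide.notna().mean()


# Toy long-format data (hypothetical): one row per (country, year, indicator).
long_df = pd.DataFrame({
    "country": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "year": [2000, 2000, 2001, 2001, 2000, 2000, 2001, 2001],
    "indicator": ["gdp", "literacy"] * 4,
    "value": [1.0, np.nan, 1.1, np.nan, 2.0, 90.0, 2.1, np.nan],
})

# Reshape so each row is a (country, year) observation and each column an indicator.
wide = long_df.set_index(["country", "year", "indicator"])["value"].unstack("indicator")

completeness = feature_completeness(wide)

# Keep only indicators that are over 80% complete, as in the text.
selected = completeness[completeness > 0.80].index.tolist()
print(completeness.to_dict(), selected)
```

In this toy data `gdp` is fully populated and survives the filter, while `literacy` is only 25% complete and is dropped.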
2.2 Labelling the Data (Definition of High Growth)
Since we are considering growth, we look at the data in 10-year increments (e.g., the 2005 literacy rate
predicts high growth over 2005-2015, and the 2006 literacy rate predicts high growth over 2006-2016).
We also cycled through several iterations of defining our target variable “High Growth”. The BRICS
countries are a somewhat ambiguous grouping of large economies, while ranking by annual GDP growth
captures tiny island nations like Nauru that don’t represent economic influence. In the end, we settled on
10-year conditional growth. This takes only countries with annual GDP growth greater than the world average
for 9 out of 10 consecutive years and with an annual GDP above US$10 billion (the 40th-percentile threshold). This
allows flexibility for a temporary downturn and also removes small economies that don’t qualify for regional
influence.
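The labelling rule can be written as a small function. The sketch below is one reading of the definition, not the team's implementation: the function name `is_high_growth`, its parameters, and the choice to test GDP at the start of the window are assumptions, and the growth figures are made up.

```python
import numpy as np


def is_high_growth(country_growth, world_growth, initial_gdp_usd,
                   window=10, min_years_above=9, gdp_floor=10e9):
    """Label one country-decade as high-growth.

    country_growth / world_growth: annual GDP growth rates (%) for the
    same `window` consecutive years. A missing year (NaN) never counts
    as beating the world average.
    """
    country_growth = np.asarray(country_growth, dtype=float)
    world_growth = np.asarray(world_growth, dtype=float)
    if country_growth.shape != (window,) or world_growth.shape != (window,):
        raise ValueError("expected exactly `window` annual growth rates")
    years_above = int(np.sum(country_growth > world_growth))
    return bool(years_above >= min_years_above and initial_gdp_usd > gdp_floor)


world = [3.0] * 10                       # hypothetical world average growth
one_dip = [5.0] * 9 + [1.0]              # beats the world in 9 of 10 years
two_dips = [5.0] * 8 + [1.0, 1.0]        # beats the world in only 8 of 10 years

print(is_high_growth(one_dip, world, initial_gdp_usd=50e9))   # True
print(is_high_growth(two_dips, world, initial_gdp_usd=50e9))  # False
print(is_high_growth(one_dip, world, initial_gdp_usd=5e9))    # False: below US$10B
```

Allowing one year below the world average is what gives the definition its tolerance for a temporary downturn.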
3 Methods and Results
The team used Logistic Regression and CART methods to identify significant features for High Growth
and to predict the countries that will experience high growth between 2018 and 2024. The models were built using
the pandas, numpy, scipy, and sklearn libraries in Python.
3.1 Logistic Regression
Logistic regression was used to identify the significant and most impactful features for High Growth. The team
cross-validated the model by running it with 10 random test-train splits and printing the resulting confusion
matrices. Afterwards, the classification threshold was adjusted using the ROC curve (Appendix C) to improve
the model's true positive rate. The resulting model was then used to predict the countries that will experience
high growth between 2018 and 2024.
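The repeated-split validation described above might look like the following scikit-learn sketch. It is illustrative only: the data are synthetic (`make_classification`), with 28 features and an 88/12 class mix chosen to mirror the figures reported in this report, and the 25% test size and the `StandardScaler` step are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the indicator data: 28 features, ~12% positives.
X, y = make_classification(n_samples=2000, n_features=28, n_informative=6,
                           weights=[0.88, 0.12], random_state=0)

total_cm = np.zeros((2, 2), dtype=int)
for seed in range(10):  # 10 random test-train splits
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, random_state=seed)
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X_tr, y_tr)
    # Accumulate the confusion matrix over splits (rows: true, columns: predicted).
    total_cm += confusion_matrix(y_te, model.predict(X_te), labels=[0, 1])

tn, fp, fn, tp = total_cm.ravel()
accuracy = (tp + tn) / total_cm.sum()
true_positive_rate = tp / (tp + fn)
print(total_cm)
print(f"accuracy={accuracy:.3f}, true positive rate={true_positive_rate:.3f}")
```

Reporting the true positive rate next to accuracy matters here because the classes are imbalanced: a model can score high accuracy while missing many of the rare high-growth cases.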
The first model had an accuracy of 91.675%. The model is shown in Appendix B. According to the model, the
most impactful features were: Total fisheries production (metric tons); Net official development assistance
and official aid received (current US$); Urban population growth (annual %); and Life expectancy at birth,
total (years). The team used the formula below to approximate the impact of a one-unit change in each of
these features on the probability of being high-growth:
$$\text{impact} = \left| e^{\,\text{unit of change} \times \text{coefficient}} - 1 \right|$$
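The impact formula can be evaluated directly from a fitted coefficient. In the sketch below the function name `impact` and the example coefficients are hypothetical. The first coefficient is chosen only so that the result matches the 51% impact this report quotes for fisheries, since the fitted coefficients themselves appear only in the Appendix B figure.

```python
import math


def impact(coefficient: float, unit_of_change: float = 1.0) -> float:
    """|exp(unit_of_change * coefficient) - 1|.

    For a logistic regression this is the relative change in the odds
    of the positive class for the given change in the feature.
    """
    return abs(math.exp(unit_of_change * coefficient) - 1.0)


# Hypothetical coefficient: ln(1.51) gives an impact of 51%.
fisheries_coef = math.log(1.51)
print(round(impact(fisheries_coef), 2))   # 0.51

# A negative coefficient: ln(0.5) halves the odds, so the impact is 50%.
print(round(impact(math.log(0.5)), 2))    # 0.5
```

Because of the absolute value, the formula reports the size of the effect; the direction still has to be read from the sign of the coefficient.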
The most impactful features and their impact are visualized in Figure 1. However, the confusion matrix
showed a strong bias towards not predicting high-growth. While the accuracy was 91.675%,
the true positive rate was only 44%. This was because a large share of the data was labelled not high-growth:
88% of all observations. The confusion matrix is shown in Figure 2a.
To overcome this data bias issue, we plotted the ROC curve (Appendix C) to choose a better threshold for
our prediction. The AUC of the ROC curve is 0.867. The second iteration of the model used a threshold
of 0.3 to increase the true positive rate from 44% to 62.7%. This iteration of the logistic regression
model resulted in the confusion matrix shown in Figure 2b.
Figure 1: Impact of significant features
(a) Logistic Model with 0.5 threshold (b) Logistic Model with 0.3 threshold
Figure 2: Confusion Matrices for Logistic Regression Models
The accuracy stayed at a similar level while the true positive rate improved to 62.7%; the false positive
rate also rose, by 4.5%. Visualizations of sample predictions from the logistic regression models
are available in Appendix G.
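The threshold change from 0.5 to 0.3 can be sketched as follows. This is an illustration on synthetic data, not a reproduction of the reported figures: the dataset, the split, and the helper `rates` are assumptions, and only the two thresholds come from the text.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data standing in for the indicator table.
X, y = make_classification(n_samples=2000, n_features=28, n_informative=6,
                           weights=[0.88, 0.12], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
proba = model.predict_proba(X_te)[:, 1]  # predicted probability of the positive class


def rates(y_true, proba, threshold):
    """Return (true positive rate, false positive rate) at a given threshold."""
    pred = proba >= threshold
    pos = y_true == 1
    tpr = np.sum(pred & pos) / np.sum(pos)
    fpr = np.sum(pred & ~pos) / np.sum(~pos)
    return tpr, fpr


tpr_05, fpr_05 = rates(y_te, proba, 0.5)
tpr_03, fpr_03 = rates(y_te, proba, 0.3)
auc = roc_auc_score(y_te, proba)
print(f"threshold 0.5: TPR={tpr_05:.3f}, FPR={fpr_05:.3f}")
print(f"threshold 0.3: TPR={tpr_03:.3f}, FPR={fpr_03:.3f}")
print(f"AUC={auc:.3f}")
```

Lowering the threshold can only add positive predictions, so the true positive rate and the false positive rate can each stay the same or rise, which is the trade-off described above.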
3.2 CART
The CART method was chosen to further identify the features and their importance, and to cross-validate the
results obtained from the logistic regression. The team also preferred CART because it often
leads to more interpretable results. The team built two iterations of the CART
model. The first, the “unbalanced model”, gave equal weights to data classified as high-growth and
non-high-growth. The second, the “balanced model”, weighted the two classes differently to ensure they had
equal representation in the data.
The results for both models were cross-validated with 10 random test-train splits. The resulting confusion
matrices for both models are shown below: Figure 3a for the unbalanced model and Figure 3b
for the balanced model.
(a) Unbalanced CART Model (b) Balanced CART Model
Figure 3: Confusion Matrices for CART Models
While the unbalanced model had a true positive rate of 70%, the balanced model had a true positive rate of
88.59%. Correspondingly, the false positive rate increased from 3.5% to 10.3%.
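One way to build an unbalanced and a balanced tree in scikit-learn is through the `class_weight` parameter of `DecisionTreeClassifier`. The sketch below assumes that mechanism; the report does not say how the class weights were set, and the synthetic data, the depth limit, and the split are all illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic imbalanced data standing in for the indicator table.
X, y = make_classification(n_samples=2000, n_features=28, n_informative=6,
                           weights=[0.88, 0.12], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    # Every sample carries the same weight.
    "unbalanced": DecisionTreeClassifier(max_depth=5, random_state=0),
    # Samples are weighted inversely to class frequency, so both classes
    # contribute equally to the split criterion.
    "balanced": DecisionTreeClassifier(max_depth=5, class_weight="balanced",
                                       random_state=0),
}

recalls = {}
for name, tree in models.items():
    tree.fit(X_tr, y_tr)
    recalls[name] = recall_score(y_te, tree.predict(X_te))  # true positive rate

# Indices of the three most important features in the balanced tree.
top_features = models["balanced"].feature_importances_.argsort()[::-1][:3]
print(recalls, top_features)
```

Weighting the minority class more heavily typically raises the true positive rate at the cost of more false positives, the same direction of change reported above.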
The first three levels of the unbalanced model are presented in Figure 4. The first three levels of the balanced
model are presented in Figure 5.
Figure 4: Unbalanced CART Model
Both models tagged similar features as important: Total fisheries production (metric tons); Fertility rate,
total (births per woman); Population, female (% of total); Rural population (% of total population) or Rural
population growth (annual %). Some of these features were also identified by the logistic regression model
as statistically significant: Total fisheries production (metric tons) and Net official development assistance
and official aid received (current US$). One of the surprising findings in both CART models was Fertility
Rate, which the logistic regression model identified as not statistically significant but which appears at the
second level of the CART.
Figure 5: Balanced CART Model
Our predictions from the unbalanced CART model are visualized in Figure 6; an animated version is linked in
Appendix H.1. Results of the predictions from the CART models are available in Appendix H.
Figure 6: Prediction from Unbalanced CART model
4 Discussion
In this section we will compare the key features from the different models and evaluate which conclusions
make sense and which do not. We will also explore the limitations of the models and discuss features that
were thought of as critical that turned out to be insignificant.
4.1 Total Fisheries Production
Although this indicator was very significant in both the logistic regression and the CART trees, it was a
surprise to us at first. Out of all the indicators, we expected categories involving urbanization, trade, debt,
and even cell phone usage to rank highly. Qualitatively, however, the result makes sense: commercial fishing
signifies an industrialized economy scaling up its use of ocean resources beyond the classic fishing villages,
much as an agricultural economy moves towards industrialization.
A map of global fisheries production (Appendix D) shows the Southeast China Sea, Western South America,
and Scandinavia with heavy fisheries production, which relates well to their rapid development status [2].
Proximity to the world’s oceans facilitates trade, provides a natural border, and offers an abundance
of resources including fishing, oil, and alternative energy [3]. In fact, as seen in the concluding predictions,
a high number of our predicted high growth countries have long coastlines.
In Figure 1, the impact of a one-unit (1M ton) increase in fisheries production is 51%. Read as a direct
scaling of probability, this means that if country A originally had a 30% probability of being high growth,
increasing its annual fisheries production by 1M tons would push that probability to about 45%. Because a
logistic regression coefficient strictly acts on the odds rather than the probability, this reading somewhat
overstates the shift. Looking at a country like Vietnam, with consistently growing fisheries of 6M metric
tons in 2016, a 1M ton increase would be very significant and would represent the Vietnamese fisheries
industry increasing by 17% [4]. This is consistent with the high impact shown in the logistic regression.
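The arithmetic in this example can be checked in a few lines. The sketch below contrasts the direct scaling of probability used in the text with the odds-based reading of a logistic coefficient. The function name `updated_probability` is hypothetical; the 30% starting probability, the 51% impact, and the 1M and 6M ton figures come from the paragraph above.

```python
def updated_probability(p: float, odds_multiplier: float) -> float:
    """Apply a multiplicative change to the odds p / (1 - p) and convert back."""
    odds = p / (1.0 - p)
    new_odds = odds * odds_multiplier
    return new_odds / (1.0 + new_odds)


p0 = 0.30          # starting probability of being high growth
multiplier = 1.51  # a 51% impact

direct_scaling = p0 * multiplier                    # 0.453, the "about 45%" in the text
odds_based = updated_probability(p0, multiplier)    # about 0.393

vietnam_increase = 1.0 / 6.0                        # 1M tons on a 6M ton base, about 17%

print(round(direct_scaling, 3), round(odds_based, 3), round(vietnam_increase, 3))
```

The two readings agree closely when the starting probability is small and diverge as it grows, so the direct scaling is best treated as a rough guide.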
4.2 Net official development assistance and official aid received
This was one of the features our team most anticipated, as foreign aid ties directly to stimulating
development and thus economic activity. It was a high-impact feature in the logistic regression and
a smaller variable in the bottom portion of the CART tree. Looking at the OECD Development Assistance
Committee’s list of aid recipients for 2014-2017, many of the low- to medium-income countries qualify as
high growth in our model while the “least developed countries” do not [5]. Thus, being in the upper
tail of development assistance, as Ethiopia and Pakistan are, is a good indicator of future high growth, since
these are economies that the “West” has invested in to stimulate growth. Countries in the lower tail,
such as Syria, are not, because their aid would mostly have gone toward resolving humanitarian, war, and health crises
[6]. Ultimately, development aid is a good binary indicator of high growth but not a numerical one,
because aid amounts vary with current events and assistance objectives.
4.3 Urban population growth
This is another highly anticipated feature, ranking high in the logistic regression and at the 3rd level of the
CART tree. According to the Commission on Growth and Development’s volume on urbanization and growth, the
agglomeration economy has been a major cause of growth in the last three decades, especially in the case of
China’s tripling urban population percentage [7]. Although this is probably one of the most direct drivers of
economic growth in our list, further research suggests that some types of urbanization work better than others.
Turok and McGranahan’s 2013 journal article suggests that removing rural-urban movement barriers and having the right
supportive market policies are key to enabling urban economic growth [8].
4.4 Life expectancy at birth
Life expectancy is significant in both our logistic and CART models. In the CART, it sets a cap of around
69 years, above which a country would belong to the developed world and thus no longer be high growth. Since life
expectancy is such a close proxy for economic development (low indicates humanitarian crises or war,
middle indicates developing, and high indicates developed), it is a good indicator for the model to use in
filtering out the lower and upper tails.
4.5 Fertility Rate
Fertility rate is interesting because it is not significant in the logistic model, yet it is the 2nd-level feature
in the CART tree. Fertility rate reflects a country’s healthcare and career opportunities. In an
agricultural system with high rates of disease and little opportunity, a family will want many children to ensure
that a few are successful. As both factors improve, the number of children per household decreases. Thus, it is
a meaningful feature and a representation of a change in quality of life.
4.6 Features with negative coefficients
There were several significant indicators that returned negative coefficients, meaning that an increase in the
indicator lowers the likelihood of high growth. For example, the negative logistic regression coefficient on
adolescent fertility rate is sensible: adolescent pregnancies can indicate poor sexual education
or a low age of maturity, and starting families young can be a sign of a large, rural, agriculture-based
population. On the other hand, the negative coefficient on arable land makes less sense, as it means the more
arable land (% of land mass), the lower the likelihood of high growth. Perhaps a lack of arable land forces a
nation to industrialize more quickly and rely less on agriculture.
4.7 Insignificant features
There were a few indicators that we hypothesized to be very significant but that did not do well in our logistic
regression. Education (school enrollment %) and trade were insignificant, even though the literature and research
treat them as important. Many intergovernmental organizations such as the WEF and World Bank have run
campaigns around education, and primary completion is a key component of the Sustainable Development
Goals [9]. We have a few hypotheses for the discrepancy. First, one of our largest challenges was missing data, and
education completion rates were especially incomplete, which could explain why some indicators
were worse predictors. Second, school completion does not directly relate to economic activity. In the short term,
education participation is a result of government policy and can be a competing priority for governments in
allocating budget. A case in point is Cuba, where healthcare and education coverage are almost 100% because they
are the main focus of the Communist government, but trade sanctions and collective ownership have stifled economic
opportunity in the traditional sense.
5 Conclusion
In summary, our project produced four models: two iterations each of logistic regression and CART.
The logistic regression coefficients were especially useful in prioritizing impactful indicators, and lowering our
threshold to identify more positives was a necessary adjustment during tuning of the model. The
interpretability of the CART tree was helpful in grasping how features interacted in the first few levels,
but as the tree got taller and variables showed up on multiple levels we lost interpretability. Sample
prediction maps of all four models are in Appendices G and H. Overall, the unbalanced CART tree produced the
best accuracy score and best matched economists’ conclusions and current economic growth groupings. Using
this tree, we predicted the high growth economies for 2018-2024 in Appendix H.1. We can see that parts of
Latin America and Southeast Asia are consistent, representing great investment opportunities. Africa also
shows up frequently but is not consistent, which is in line with the political instability in the region.
The predictions from the unbalanced CART tree show that, on average, 31% of countries per year are expected to
experience high growth between 2018 and 2024. Comparing our predictions to known economic development groupings, our
model’s high growth predictions matched 8/11 of the NEXT11 countries, 3/4 of the MINT countries,
and 11/15 of the EAGLES emerging growth countries [10][11][12]. Nigeria, Turkey, and Iran are the standout
countries that are consistently in these groupings but were not highlighted in our predictions.
While the models showed positive results, there are still outstanding issues and limitations with our process:
• Involving expert opinion - our lack of economic expertise meant we had to rely heavily on mathematical and
technical methods. For example, it would have been better to begin with a stronger hypothesis and
conduct feature selection based on expertise rather than on the availability of data or feature selection
algorithms.
• Better data - our full World Bank dataset was only 60% complete, which meant we had to impute
certain missing data and also eliminate indicators based on incomplete census data. Supplementing
it with additional datasets and computing the similarities between countries through tensor factorization
techniques such as CANDECOMP/PARAFAC [13] could have produced more complete results.
Due to these limitations, the team would not be confident basing any detailed government resource
allocation decisions on our results. However, the exercise did a good job of showing which fields were
important drivers of growth and achieved our primary goal of predicting future high growth countries.
Given a revised model with improved data and expert hypotheses, resource allocation optimization is a
logical next step. Based on a country’s growth target and its available resources, an optimization model can
be developed to identify how to distribute the available capital, human, and natural resources to
achieve high growth. The team is excited about the results and lessons from this project and looks forward
to future opportunities to implement and revise these results in the global development space.
References
[1] The World Bank. World Bank indicators, [Online]. Available: https://data.worldbank.org/indicator.
[2] F. Carré. Ressources menacées de l’océan mondial, [Online]. Available: https://www.monde-diplomatique.fr/publications/l_atlas_geopolitique/a53308.
[3] The World Bank. Oceans, fisheries and coastal economies, [Online]. Available: http://www.worldbank.org/en/topic/environment/brief/oceans.
[4] The World Bank. Total fisheries production (metric tons), [Online]. Available: https://data.worldbank.org/indicator/ER.FSH.PROD.MT.
[5] OECD. DAC list of ODA recipients, [Online]. Available: http://www.oecd.org/dac/financing-sustainable-development/development-finance-standards/DAC_List_ODA_Recipients2014to2017_flows_En.pdf.
[6] The World Bank. Net official development assistance received, [Online]. Available: https://data.worldbank.org/indicator/DT.ODA.ODAT.CD?year_high_desc=true.
[7] The World Bank. Urbanization and growth, [Online]. Available: https://siteresources.worldbank.org/EXTPREMNET/Resources/489960-1338997241035/Growth_Commission_Vol1_Urbanization_Growth.pdf.
[8] I. Turok and G. McGranahan. Urbanization and economic growth: The arguments and evidence for Africa and Asia, [Online]. Available: http://journals.sagepub.com/doi/full/10.1177/0956247813490908.
[9] United Nations. Sustainable Development Goal 4, [Online]. Available: https://sustainabledevelopment.un.org/sdg4.
[10] Investopedia. Eagles, [Online]. Available: https://www.investopedia.com/terms/e/eagles.asp.
[11] BBC. The Mint countries: Next economic giants?, [Online]. Available: http://www.bbc.com/news/magazine-25548060.
[12] Goldman Sachs. Beyond the BRICs: A look at the Next 11, [Online]. Available: http://www.goldmansachs.com/our-thinking/archive/archive-pdfs/brics-book/brics-chap-13.pdf.
[13] E. Acar, D. M. Dunlavy, and T. G. Kolda. Fitting a tensor decomposition is a nonlinear optimization problem, [Online]. Available: http://www.cs.cornell.edu/cv/tenwork/Slides/Kolda.pdf.
A Histogram Plot of Feature Completeness
Figure 7: Histogram of feature completeness
B Logistic Regression Results
Figure 8: Logistic Regression Model Results
C ROC for Logistic Regression Model
Figure 9: ROC for Logistic Regression Model
D Global Fisheries Production
Figure 10: Global fisheries production [3]
E Full Unbalanced CART Model
Figure 11: Full Unbalanced CART Model
F Full Balanced CART Model
Figure 12: Full Balanced CART Model
G Predictions from Logistic Regression Models
G.1 Predictions from Logistic Regression Model with 0.5 threshold
Figure 13: Prediction from Logistic Regression model with 0.5 threshold
G.2 Predictions from Logistic Regression Model with 0.3 threshold
Figure 14: Prediction from Logistic Regression model with 0.3 threshold
H Predictions from CART Models
H.1 Predictions from unbalanced CART model
An animated version of the predictions is available here: https://goo.gl/7SbpD1
Figure 15: Prediction from Unbalanced CART model
H.2 Predictions from balanced CART model
Figure 16: Prediction from Balanced CART model

 
(TARA) Call Girls Chakan ( 7001035870 ) HI-Fi Pune Escorts Service
(TARA) Call Girls Chakan ( 7001035870 ) HI-Fi Pune Escorts Service(TARA) Call Girls Chakan ( 7001035870 ) HI-Fi Pune Escorts Service
(TARA) Call Girls Chakan ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
2024: The FAR, Federal Acquisition Regulations - Part 28
2024: The FAR, Federal Acquisition Regulations - Part 282024: The FAR, Federal Acquisition Regulations - Part 28
2024: The FAR, Federal Acquisition Regulations - Part 28JSchaus & Associates
 
2024: The FAR, Federal Acquisition Regulations - Part 27
2024: The FAR, Federal Acquisition Regulations - Part 272024: The FAR, Federal Acquisition Regulations - Part 27
2024: The FAR, Federal Acquisition Regulations - Part 27JSchaus & Associates
 
CBO’s Recent Appeals for New Research on Health-Related Topics
CBO’s Recent Appeals for New Research on Health-Related TopicsCBO’s Recent Appeals for New Research on Health-Related Topics
CBO’s Recent Appeals for New Research on Health-Related TopicsCongressional Budget Office
 
Climate change and safety and health at work
Climate change and safety and health at workClimate change and safety and health at work
Climate change and safety and health at workChristina Parmionova
 
##9711199012 Call Girls Delhi Rs-5000 UpTo 10 K Hauz Khas Whats Up Number
##9711199012 Call Girls Delhi Rs-5000 UpTo 10 K Hauz Khas  Whats Up Number##9711199012 Call Girls Delhi Rs-5000 UpTo 10 K Hauz Khas  Whats Up Number
##9711199012 Call Girls Delhi Rs-5000 UpTo 10 K Hauz Khas Whats Up NumberMs Riya
 
Human-AI Collaboration for Virtual Capacity in Emergency Operation Centers (E...
Human-AI Collaborationfor Virtual Capacity in Emergency Operation Centers (E...Human-AI Collaborationfor Virtual Capacity in Emergency Operation Centers (E...
Human-AI Collaboration for Virtual Capacity in Emergency Operation Centers (E...Hemant Purohit
 

Recently uploaded (20)

Global debate on climate change and occupational safety and health.
Global debate on climate change and occupational safety and health.Global debate on climate change and occupational safety and health.
Global debate on climate change and occupational safety and health.
 
Precarious profits? Why firms use insecure contracts, and what would change t...
Precarious profits? Why firms use insecure contracts, and what would change t...Precarious profits? Why firms use insecure contracts, and what would change t...
Precarious profits? Why firms use insecure contracts, and what would change t...
 
VIP Russian Call Girls in Indore Ishita 💚😋 9256729539 🚀 Indore Escorts
VIP Russian Call Girls in Indore Ishita 💚😋  9256729539 🚀 Indore EscortsVIP Russian Call Girls in Indore Ishita 💚😋  9256729539 🚀 Indore Escorts
VIP Russian Call Girls in Indore Ishita 💚😋 9256729539 🚀 Indore Escorts
 
How the Congressional Budget Office Assists Lawmakers
How the Congressional Budget Office Assists LawmakersHow the Congressional Budget Office Assists Lawmakers
How the Congressional Budget Office Assists Lawmakers
 
Zechariah Boodey Farmstead Collaborative presentation - Humble Beginnings
Zechariah Boodey Farmstead Collaborative presentation -  Humble BeginningsZechariah Boodey Farmstead Collaborative presentation -  Humble Beginnings
Zechariah Boodey Farmstead Collaborative presentation - Humble Beginnings
 
VIP High Class Call Girls Amravati Anushka 8250192130 Independent Escort Serv...
VIP High Class Call Girls Amravati Anushka 8250192130 Independent Escort Serv...VIP High Class Call Girls Amravati Anushka 8250192130 Independent Escort Serv...
VIP High Class Call Girls Amravati Anushka 8250192130 Independent Escort Serv...
 
Russian Call Girls Service Ashiyana Colony { Lucknow Call Girls Service 95482...
Russian Call Girls Service Ashiyana Colony { Lucknow Call Girls Service 95482...Russian Call Girls Service Ashiyana Colony { Lucknow Call Girls Service 95482...
Russian Call Girls Service Ashiyana Colony { Lucknow Call Girls Service 95482...
 
VIP Call Girls Service Bikaner Aishwarya 8250192130 Independent Escort Servic...
VIP Call Girls Service Bikaner Aishwarya 8250192130 Independent Escort Servic...VIP Call Girls Service Bikaner Aishwarya 8250192130 Independent Escort Servic...
VIP Call Girls Service Bikaner Aishwarya 8250192130 Independent Escort Servic...
 
How to Save a Place: 12 Tips To Research & Know the Threat
How to Save a Place: 12 Tips To Research & Know the ThreatHow to Save a Place: 12 Tips To Research & Know the Threat
How to Save a Place: 12 Tips To Research & Know the Threat
 
Incident Command System xxxxxxxxxxxxxxxxxxxxxxxxx
Incident Command System xxxxxxxxxxxxxxxxxxxxxxxxxIncident Command System xxxxxxxxxxxxxxxxxxxxxxxxx
Incident Command System xxxxxxxxxxxxxxxxxxxxxxxxx
 
Call Girls In Rohini ꧁❤ 🔝 9953056974🔝❤꧂ Escort ServiCe
Call Girls In  Rohini ꧁❤ 🔝 9953056974🔝❤꧂ Escort ServiCeCall Girls In  Rohini ꧁❤ 🔝 9953056974🔝❤꧂ Escort ServiCe
Call Girls In Rohini ꧁❤ 🔝 9953056974🔝❤꧂ Escort ServiCe
 
Night 7k to 12k Call Girls Service In Navi Mumbai 👉 BOOK NOW 9833363713 👈 ♀️...
Night 7k to 12k  Call Girls Service In Navi Mumbai 👉 BOOK NOW 9833363713 👈 ♀️...Night 7k to 12k  Call Girls Service In Navi Mumbai 👉 BOOK NOW 9833363713 👈 ♀️...
Night 7k to 12k Call Girls Service In Navi Mumbai 👉 BOOK NOW 9833363713 👈 ♀️...
 
(TARA) Call Girls Chakan ( 7001035870 ) HI-Fi Pune Escorts Service
(TARA) Call Girls Chakan ( 7001035870 ) HI-Fi Pune Escorts Service(TARA) Call Girls Chakan ( 7001035870 ) HI-Fi Pune Escorts Service
(TARA) Call Girls Chakan ( 7001035870 ) HI-Fi Pune Escorts Service
 
2024: The FAR, Federal Acquisition Regulations - Part 28
2024: The FAR, Federal Acquisition Regulations - Part 282024: The FAR, Federal Acquisition Regulations - Part 28
2024: The FAR, Federal Acquisition Regulations - Part 28
 
2024: The FAR, Federal Acquisition Regulations - Part 27
2024: The FAR, Federal Acquisition Regulations - Part 272024: The FAR, Federal Acquisition Regulations - Part 27
2024: The FAR, Federal Acquisition Regulations - Part 27
 
CBO’s Recent Appeals for New Research on Health-Related Topics
CBO’s Recent Appeals for New Research on Health-Related TopicsCBO’s Recent Appeals for New Research on Health-Related Topics
CBO’s Recent Appeals for New Research on Health-Related Topics
 
Climate change and safety and health at work
Climate change and safety and health at workClimate change and safety and health at work
Climate change and safety and health at work
 
##9711199012 Call Girls Delhi Rs-5000 UpTo 10 K Hauz Khas Whats Up Number
##9711199012 Call Girls Delhi Rs-5000 UpTo 10 K Hauz Khas  Whats Up Number##9711199012 Call Girls Delhi Rs-5000 UpTo 10 K Hauz Khas  Whats Up Number
##9711199012 Call Girls Delhi Rs-5000 UpTo 10 K Hauz Khas Whats Up Number
 
Rohini Sector 37 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
Rohini Sector 37 Call Girls Delhi 9999965857 @Sabina Saikh No AdvanceRohini Sector 37 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
Rohini Sector 37 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
 
Human-AI Collaboration for Virtual Capacity in Emergency Operation Centers (E...
Human-AI Collaborationfor Virtual Capacity in Emergency Operation Centers (E...Human-AI Collaborationfor Virtual Capacity in Emergency Operation Centers (E...
Human-AI Collaboration for Virtual Capacity in Emergency Operation Centers (E...
 

Global Indicators of High Growth Economies

  • 1. Global Indicators of High-Growth Economies Predicting high GDP growth factors for national economies MIE 465 — Analytics in Action April 13, 2018 Oghosa Igbinakenzua Kamil Yilanci Minja Zhu Chris Zhu Department of Mechanical and Industrial Engineering University of Toronto
  • 2. Global Indicators of High-Growth Economies Predicting high GDP growth factors for national economies Abstract This project aims to understand how a nation can effectively allocate resources to drive economic growth. First we predict the key features that determine growth, then identify the countries on the verge of high growth and what they can invest in to further drive their GDP growth. We used the full World Bank Development Indicators database, which covers 217 countries and territories from 1960-2016 across 1574 indicators. The binary target variable “high-growth” is defined as countries that sustained annual GDP growth above the world average for 9 out of 10 consecutive years, with an initial GDP above US$10 billion. Several iterations of logistic regression were used to identify the significant features and their weights in predicting the “high-growth” variable. Several CART models were also created to cross-validate the key features and produce a more interpretable storyline. Using logistic regression, we identified total fisheries production, urban population growth, and life expectancy as key features. Fisheries was a surprising finding at first, but it represents a country’s industrialization and utilization of natural resources, and is a proxy for access to seagoing trade. Applying the key features back to our countries and indicator data, we predicted that the countries experiencing high growth between 2018 and 2024 will include Brazil, Ukraine, Turkey, Panama, and Cuba. A full map is illustrated below highlighting developed “high-income” countries, the 2006-16 high-growth countries, and our predicted 2018-24 high-growth countries.
  • 3. Contents
    1 Introduction
    2 Data
      2.1 Data Cleaning
      2.2 Labelling the Data (Definition of High Growth)
    3 Methods and Results
      3.1 Logistic Regression
      3.2 CART
    4 Discussion
      4.1 Total Fisheries Production
      4.2 Net official development assistance and official aid received
      4.3 Urban population growth
      4.4 Life expectancy at birth
      4.5 Fertility Rate
      4.6 Features with negative coefficients
      4.7 Insignificant features
    5 Conclusion
    A Histogram Plot of Feature Completeness
    B Logistic Regression Results
    C ROC for Logistic Regression Model
    D Global Fisheries Production
    E Full Unbalanced CART Model
    F Full Balanced CART Model
    G Predictions from Logistic Regression Models
      G.1 Predictions from Logistic Regression Model with 0.5 threshold
      G.2 Predictions from Logistic Regression Model with 0.3 threshold
    H Predictions from CART Models
      H.1 Predictions from unbalanced CART model
      H.2 Predictions from balanced CART model
  • 4. List of Figures
    1 Impact of significant features
    2 Confusion Matrices for Logistic Regression Models
    3 Confusion Matrices for CART Models
    4 Unbalanced CART Model
    5 Balanced CART Model
    6 Prediction from Unbalanced CART model
    7 Histogram of feature completeness
    8 Logistic Regression Model Results
    9 ROC for Logistic Regression Model
    10 Global fisheries production [3]
    11 Full Unbalanced CART Model
    12 Full Balanced CART Model
    13 Prediction from Logistic Regression model with 0.5 threshold
    14 Prediction from Logistic Regression model with 0.3 threshold
    15 Prediction from Unbalanced CART model
    16 Prediction from Balanced CART model

    List of Tables
    1 Data summary from World Development Indicators (World Bank)
  • 5. Global Indicators of High-Growth Economies
    O. Igbinakenzua, K. Yilanci, M. Zhu, C. Zhu. April 13, 2018

    1 Introduction
    Our goal is to determine what makes high-growth developing countries unique and how other countries could leverage similar characteristics. The BRICS countries (Brazil, Russia, India, China and South Africa) have been recognized for their economic size and growth over the past 40 years, which has led to their new-found economic and political influence. We must first identify a subset of countries that experienced “high growth” and use these as our target variable in identifying the most relevant features that drive “high growth”. Then, we can identify the next economies on the verge of experiencing this prosperity. The results will be cross-referenced against existing economic growth groupings and can be useful to governments in validating resource allocation decisions for driving growth.

    2 Data
    The target variable in our regression is a binary indicator of “high-growth”, defined in section 2.2. The World Development Indicators provide 1574 diverse indicators for every country, across categories such as economic, demographic, health, and infrastructure information. This gives us a wealth of information to work with and a challenge to organize.

    Table 1: Data summary from World Development Indicators (World Bank)
    Years     | # Countries | “High-Growth” | Total Features | % Complete | # Rows
    1960-2016 | 196         | 24            | 1574           | 60%        | 11484

    2.1 Data Cleaning
    With more than five decades of data, over 200 countries, and over 1500 indicators, organizing the data was difficult, and a significant amount of time was dedicated to inverting and organizing it. However, the biggest challenge in this project was incomplete data: only 60% of our dataset was complete. We attempted to impute the missing data in two ways: by computing the mean for certain features, and by computing a similarity matrix between countries. However, both methods were better suited to filling in relatively complete data. In the end, we selected 18 relevant features from those that are over 80% complete and then handpicked 10 more that were relevant according to the World Bank Featured Indicators list, for a total of 28 features [1]. The distribution of data completeness is in Appendix A.
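The completeness filter and mean imputation described in section 2.1 can be sketched roughly as follows. This is a minimal illustration on made-up numbers, not the team's actual pipeline: the feature names, the toy values, and the helper functions `completeness` and `impute_mean` are all assumptions introduced here.

```python
from statistics import mean

# Hypothetical toy data: feature name -> yearly values, None marks a gap.
features = {
    "life_expectancy": [60.0, 61.0, 62.0, None, 64.0, 65.0, 66.0, 67.0, 68.0, 69.0],
    "fisheries_tons": [1.2e6, None, None, None, None, None, None, 1.4e6, 1.5e6, 1.6e6],
    "urban_growth_pct": [3.1, 3.0, 2.9, 2.8, 2.7, 2.6, 2.5, 2.4, 2.3, 2.2],
}


def completeness(values):
    """Fraction of entries that are present (not None)."""
    return sum(v is not None for v in values) / len(values)


def impute_mean(values):
    """Replace each missing entry with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


COMPLETENESS_CUTOFF = 0.8  # "over 80% complete", as stated in the text

# Keep only sufficiently complete features, then fill their remaining gaps.
selected = {
    name: impute_mean(values)
    for name, values in features.items()
    if completeness(values) > COMPLETENESS_CUTOFF
}

for name, values in selected.items():
    print(name, values)
```

In this toy example the fisheries series is only 40% complete and is dropped, which mirrors the observation that mean imputation is only reasonable for features that are already mostly complete.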
  • 6. 2.2 Labelling the Data (Definition of High Growth)
    Since we are considering growth, we look at data in 10-year increments (e.g., the 2005 literacy rate predicts high growth over 2005-2015, and the 2006 literacy rate predicts high growth over 2006-2016). We also cycled through several iterations of defining our target variable “High Growth”. The BRICS countries are a somewhat ambiguous grouping of large economies, while ranking by high annual GDP growth captures tiny island nations like Nauru that don’t represent economic influence. In the end, we settled on 10-year conditional growth. This takes only countries with annual GDP growth greater than the world average for 9 out of 10 consecutive years, with an annual GDP above US$10B (40th percentile threshold). This allows flexibility for a temporary downturn and also removes small economies that don’t qualify for regional influence.

    3 Methods and Results
    The team used logistic regression and CART methods to identify significant features for High Growth and predict the countries that will experience growth between 2018-2028. The models were created using the pandas, numpy, scipy, and sklearn libraries in Python.

    3.1 Logistic Regression
    Logistic regression was used to identify the significant and most impactful features for High Growth. The team cross-validated the model by running it with 10 random test-train splits and printing the resulting confusion matrices. Afterwards, the model’s threshold was adjusted using the ROC curve (Appendix C) to improve the true positive rate. The resulting model was then used to predict the countries that will experience high growth between 2018 and 2024. The first model had an accuracy of 91.675%. The model is shared in Appendix B.
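The high-growth label defined in section 2.2 can be expressed as a small rule. The sketch below is an illustration only: the function name `is_high_growth`, its parameters, and the sample growth rates are assumptions introduced here, and it assumes growth rates are given as one value per year of a 10-year window.

```python
def is_high_growth(country_growth, world_growth, initial_gdp_usd,
                   min_years=9, window=10, gdp_floor=10e9):
    """Label one 10-year window as high-growth or not.

    A window qualifies when the country's annual GDP growth beat the world
    average in at least `min_years` of the `window` years and its starting
    GDP is above `gdp_floor` (US$10B in the text).
    """
    if len(country_growth) != window or len(world_growth) != window:
        raise ValueError("expected exactly one growth value per year")
    years_above = sum(c > w for c, w in zip(country_growth, world_growth))
    return years_above >= min_years and initial_gdp_usd > gdp_floor


# Hypothetical growth rates in percent per year.
world = [3.0] * 10
steady = [5.0] * 9 + [1.0]        # one temporary downturn is tolerated
volatile = [5.0] * 8 + [1.0] * 2  # two years below average: not high-growth

print(is_high_growth(steady, world, initial_gdp_usd=50e9))    # large, steady
print(is_high_growth(volatile, world, initial_gdp_usd=50e9))  # too volatile
print(is_high_growth(steady, world, initial_gdp_usd=2e9))     # too small
```

Sliding this rule over every starting year (2005, 2006, and so on) would give one label per country per window, which is the form of target the report describes.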
    According to the model, the most impactful features were: Total fisheries production (metric tons); Net official development assistance and official aid received (current US$); Urban population growth (annual %); and Life expectancy at birth, total (years). The team used the formula below to approximate the impact of a one-unit change in each of these features on the probability of being high-growth:

    $\text{impact} = \left| e^{\,\text{unit of change} \times \text{coefficient}} - 1 \right|$

    The most impactful features and their impact are visualized in Figure 1.

    However, the confusion matrix showed a strong bias towards not predicting high-growth. While the accuracy was 91.675%, the true positive rate was only 44%. This was due to class imbalance: 88% of all data was labelled as not high-growth. The confusion matrix is shown in Figure 2a.

    To overcome this data bias issue, we plotted the ROC curve (Appendix C) to optimize the threshold for our prediction. The AUC of the ROC curve is 0.867. The second iteration of the model used a threshold of 0.3, which produced the confusion matrix shown in Figure 2b. The accuracy stayed at a similar level, while the true positive rate improved from 44% to 62.7%. There was an increase
  • 7. [Figure 1: Impact of significant features]
    [Figure 2: Confusion Matrices for Logistic Regression Models. (a) Logistic Model with 0.5 threshold; (b) Logistic Model with 0.3 threshold]

    of 4.5% in the false positive rate as well. Visualizations of sample predictions from the logistic regression models are available in Appendix G.

    3.2 CART
    The CART method was chosen to further identify the features and their importance, and to cross-validate the results obtained from the logistic regression. The team also preferred CART because it often leads to more interpretable results. The team built two iterations of the CART model. The first, the “unbalanced model”, gave equal weights to data classified as high-growth and non-high-growth. The second, the “balanced model”, gave the two classes different weights so that they had equal representation in the data.
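The effect of lowering the decision threshold from 0.5 to 0.3, as described in section 3.1, can be illustrated on a handful of made-up predicted probabilities. The data and the helper names `confusion_counts` and `rates` below are hypothetical; the sketch only shows the mechanism by which the true positive rate rises at the cost of more false positives.

```python
def confusion_counts(probs, labels, threshold):
    """Count (tp, fp, tn, fn) when predicting positive for prob >= threshold."""
    tp = fp = tn = fn = 0
    for p, y in zip(probs, labels):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def rates(probs, labels, threshold):
    """Return (true positive rate, false positive rate) at a threshold."""
    tp, fp, tn, fn = confusion_counts(probs, labels, threshold)
    return tp / (tp + fn), fp / (fp + tn)


# Hypothetical model outputs: 4 high-growth rows followed by 6 others.
probs = [0.9, 0.6, 0.4, 0.35, 0.2, 0.45, 0.1, 0.05, 0.25, 0.15]
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

for threshold in (0.5, 0.3):
    tpr, fpr = rates(probs, labels, threshold)
    print(f"threshold={threshold}: TPR={tpr:.3f}, FPR={fpr:.3f}")
```

With these toy numbers, the lower threshold recovers the two positives that scored between 0.3 and 0.5 while admitting one false positive, the same kind of trade-off the report observes between its two logistic models.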
  • 8. The results for both models were cross-validated with 10 random test-train splits. The resulting confusion matrices are plotted below: Figure 3a for the unbalanced model and Figure 3b for the balanced model.

    [Figure 3: Confusion Matrices for CART Models. (a) Unbalanced CART Model; (b) Balanced CART Model]

    While the unbalanced model had a true positive rate of 70%, the balanced model had a true positive rate of 88.59%. Accordingly, the false positive rate also increased from 3.5% to 10.3%. The first three levels of the unbalanced model are presented in Figure 4, and the first three levels of the balanced model are presented in Figure 5.

    [Figure 4: Unbalanced CART Model]

    Both models tagged similar features as important: Total fisheries production (metric tons); Fertility rate, total (births per woman); Population, female (% of total); and Rural population (% of total population) or Rural population growth (annual %). Some of these features were also identified by the logistic regression model as statistically significant: Total fisheries production (metric tons) and Net official development assistance and official aid received (current US$). One of the surprising findings in both CART models was Fertility
  • 9. Rate, which was identified as not statistically significant by the logistic regression model but appears as a second-level split in the CART.

    [Figure 5: Balanced CART Model]

    Our predictions from the unbalanced CART model are visualized in Figure 6 and animated here. Results of the predictions from the CART models are available in Appendix H.

    [Figure 6: Prediction from Unbalanced CART model]
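The report does not say how the class weights for the “balanced” CART model in section 3.2 were computed. One common choice, and the heuristic behind scikit-learn's `class_weight="balanced"` option, is to weight each class by n_samples / (n_classes × class_count). The sketch below assumes that heuristic and uses a made-up 88/12 split that mirrors the class imbalance reported in section 3.1; the function name `balanced_class_weights` is introduced here for illustration.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Assumption: this is the usual "balanced" heuristic, not necessarily
    the weighting the team used.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical labels: 88% not high-growth (0), 12% high-growth (1).
labels = [0] * 88 + [1] * 12
weights = balanced_class_weights(labels)

for cls in sorted(weights):
    total = weights[cls] * labels.count(cls)
    print(f"class {cls}: weight={weights[cls]:.4f}, weighted total={total:.1f}")
```

Under this scheme both classes contribute the same total weight to the tree's split criterion, which is one way to read “equal representation in the data”, and it explains why a balanced tree trades more false positives for a higher true positive rate.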
  • 10. 4 Discussion
    In this section we compare the key features from the different models and evaluate which conclusions make sense and which do not. We also explore the limitations of the models and discuss features that we expected to be critical but that turned out to be insignificant.

    4.1 Total Fisheries Production
    Although this indicator was very significant in both the logistic regression and the CART trees, it was a surprise to us at first. Out of all the indicators, we expected categories involving urbanization, trade, debt, and even cell phone usage to rank high. Qualitatively, however, this can make sense: commercial fishing signifies an industrialized economy scaling up its ocean resource utilization beyond the classic fishing villages, much like an agricultural economy moving towards industrialization. A map of global fisheries production (Appendix D) shows heavy fisheries production in the Southeast China Sea, Western South America, and Scandinavia, which relates well to their rapid development status [2]. Proximity to the world’s oceans facilitates trade, provides a natural border, and offers an abundance of resources including fishing, oil, and alternative energy [3]. In fact, as seen in the concluding predictions, a high number of our predicted high-growth countries have long coastlines.

    In Figure 1, the impact of a one-unit increase for fisheries is 51%, meaning that if country A originally had a 30% probability of being high growth, increasing its annual fisheries production by 1M tons would push its probability to about 45%. For a country like Vietnam, with consistently growing fisheries production of 6M metric tonnes in 2016, a 1M ton increase would be very significant, representing a 17% increase in the Vietnamese fisheries industry [4]. This is consistent with the high impact shown in the logistic regression.
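The fisheries arithmetic above can be reproduced with the impact formula from section 3.1. In the sketch below the coefficient is hypothetical, chosen only so that the impact comes out at the 51% read from Figure 1. One caveat worth stating: in a logistic regression, exp(coefficient) scales the odds rather than the probability, so the sketch shows both the report's approximation (scaling the probability directly) and the odds-based calculation.

```python
import math


def impact(coefficient, unit_of_change=1.0):
    """Impact formula from section 3.1: |exp(unit * coefficient) - 1|."""
    return abs(math.exp(unit_of_change * coefficient) - 1.0)


def scale_probability(p, impact_value):
    """The report's approximation: scale the probability itself."""
    return p * (1.0 + impact_value)


def scale_odds(p, impact_value):
    """Odds-based reading: scale p / (1 - p), then convert back."""
    odds = p / (1.0 - p)
    new_odds = odds * (1.0 + impact_value)
    return new_odds / (1.0 + new_odds)


# Hypothetical coefficient per 1M tons, picked so the impact equals 51%.
coefficient = math.log(1.51)
fisheries_impact = impact(coefficient)

baseline = 0.30  # country A's starting probability of high growth
print(f"impact:            {fisheries_impact:.2f}")
print(f"probability-scale: {scale_probability(baseline, fisheries_impact):.3f}")
print(f"odds-scale:        {scale_odds(baseline, fisheries_impact):.3f}")

# Vietnam example: a 1M ton increase on 6M tonnes of production.
print(f"relative increase: {1 / 6:.1%}")
```

Scaling the probability directly gives roughly 0.45, matching the figure quoted in the text, while the odds-based calculation gives roughly 0.39. The two readings agree closely only when the baseline probability is small, so the report's formula is best treated as the approximation it is described as.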
    4.2 Net official development assistance and official aid received
    This was one of the features our team most anticipated, as foreign aid ties directly to stimulating development and thus economic activity. It was a high-impact feature in the logistic regression and a smaller variable in the bottom portion of the CART tree. Looking at the OECD Development Assistance Committee’s list of aid recipients for 2014-2017, many of the low- and medium-income countries qualify as high growth in our model while the “least developed countries” do not [5]. Thus, being in the upper tail of development assistance, as Ethiopia and Pakistan are, is a good indicator of future high growth, since these are economies that the “west” has invested in to stimulate growth. Countries in the lower tail, such as Syria, are not, because their aid would mostly have gone to resolving humanitarian, war, and health crises [6]. Ultimately, development aid is a good binary indicator of high growth, but not a numerical one, because aid amounts vary with current events and assistance objectives.
  • 11. 4.3 Urban population growth
    This is another highly anticipated feature, ranking high in the logistic regression and appearing at the 3rd level of the CART tree. According to the World Bank’s Commission on Urbanization and Growth, the agglomeration economy has been a major cause of growth in the last three decades, especially in the case of China’s tripling urban population percentage [7]. Although this is probably one of the most direct drivers of economic growth in our list, further research suggests that some types of urbanization work better than others. Turok and McGranahan’s 2013 journal article suggests that removing rural-urban movement barriers and having the right supportive market policies are key to enabling urban economic growth [8].

    4.4 Life expectancy at birth
    Life expectancy is significant in both our logistic and CART models. In the CART, it sets a cap of around 69 years, above which a country would be in the developed world and thus no longer high growth. Since life expectancy is such a close proxy for economic development (low indicates humanitarian crises or war, middle indicates developing, and high indicates developed), it is a good indicator for the model to use in filtering out the lower and upper tails.

    4.5 Fertility Rate
    Fertility rate is interesting because it is not significant in the logistic model, but it is the 2nd-level feature in the CART tree. Fertility rate reflects a country’s healthcare and career opportunities. In an agricultural system with high rates of disease and little opportunity, a family will want many children to ensure that a few are successful. As both factors improve, the number of children per household decreases. Thus, it is a significant feature and represents a change in quality of life.

    4.6 Features with negative coefficients
    Several significant indicators returned negative coefficients, meaning that an increase in the indicator lowers the likelihood of high growth. For example, the negative logistic regression coefficient for adolescent fertility rate is sensible: adolescent pregnancies can indicate poor sexual education or a low age of maturity, and starting families young can be a sign of a large rural, agriculture-based population. On the other hand, the negative coefficient on arable land (% of land mass) makes less sense, as it means that the more arable land a country has, the lower its likelihood of high growth. Perhaps a lack of arable land forces a nation to industrialize more quickly and rely less on agriculture.

    4.7 Insignificant features
    A few indices that we hypothesized to be very significant did not do well in our logistic regression. Education (school enrollment %) and trade were insignificant, although literature and research hold them to be important. Many intergovernmental organizations such as the WEF and World Bank have run campaigns around education, and primary completion is a key component of the Sustainable Development
  • 12. Goals [9]. We have a few hypotheses for the discrepancy. One of our largest challenges was missing data, and education completion rates were especially incomplete, which could explain why some indicators were worse predictors. School completion also does not directly relate to economic activity. In the short term, education participation is a result of government policy and can be a competing priority for governments in allocating budget. A case in point is Cuba, where healthcare and education coverage are almost 100% because they are the main focus of the Communist government, but trade sanctions and collective ownership have stifled economic opportunity in the traditional sense.
  • 13. 5 Conclusion
    In summary, our project produced four models: two iterations each of logistic regression and CART tree. The logistic regression coefficients were especially useful in prioritizing impactful indices, and lowering our threshold to identify more positives was a necessary adjustment during tuning of the model. The interpretability of the CART tree was helpful in grasping how features interacted in the first few levels, but as the tree got taller and variables showed up on multiple levels we lost interpretability. Sample prediction maps of all four models are in Appendices G and H.

    Overall, the unbalanced CART tree produced the best accuracy score and best matched economists’ conclusions and current economic growth groupings. Using this tree, we predicted the high-growth economies for 2018-2024 in Appendix H.1. Parts of Latin America and Southeast Asia are consistent, representing great investment opportunities. Africa also shows up frequently but is not consistent, which is in line with the political instability in the region. The predictions from the unbalanced CART tree show on average 31% of countries per year experiencing high growth between 2018 and 2024.

    Comparing our predictions to known economic development groupings, our model’s high-growth predictions matched 8/11 of the NEXT11 countries, 3/4 of the MINT countries, and 11/15 of the EAGLES emerging growth countries [10][11][12]. Nigeria, Turkey, and Iran are the standout countries that are consistently in these groupings but were not highlighted in our predictions.

    While the models showed positive results, there are still outstanding issues and limitations with our process:
    • Involving expert opinion - our lack of economic expertise meant we had to rely heavily on mathematical and technical techniques. For example, it would have been better to begin with a stronger hypothesis and conduct feature selection based on expertise rather than on the availability of data or on feature selection algorithms.
    • Better data - our World Bank dataset was only 60% complete, which meant we had to impute certain missing data and also eliminate indicators based on incomplete census data. Supplementing with additional datasets and computing the similarities between countries through tensor factorization techniques such as CANDECOMP/PARAFAC [13] could have produced more complete results.

    Due to these limitations, the team would not be confident in basing any detailed government resource allocation decisions on our results. However, the exercise did a good job of showing which fields were important drivers of growth and achieved our primary goal of predicting future high-growth countries. Given a revised model with improved data and expert hypotheses, resource allocation optimization is a logical next step. Based on a country’s growth target and its available resources, an optimization model can be developed to identify how to distribute the available capital, human, and natural resources to achieve high growth. The team is excited about the results and learnings from this project and looks forward to future opportunities to further implement and revise these results in the global development space.
  • 14. References
[1] The World Bank. World Bank indicators, [Online]. Available: https://data.worldbank.org/indicator.
[2] F. Carré. Ressources menacées de l'océan mondial, [Online]. Available: https://www.monde-diplomatique.fr/publications/l_atlas_geopolitique/a53308.
[3] The World Bank. Oceans, fisheries and coastal economies, [Online]. Available: http://www.worldbank.org/en/topic/environment/brief/oceans.
[4] The World Bank. Total fisheries production (metric tons), [Online]. Available: https://data.worldbank.org/indicator/ER.FSH.PROD.MT.
[5] OECD. DAC list of ODA recipients, [Online]. Available: http://www.oecd.org/dac/financing-sustainable-development/development-finance-standards/DAC_List_ODA_Recipients2014to2017_flows_En.pdf.
[6] The World Bank. Net official development assistance received, [Online]. Available: https://data.worldbank.org/indicator/DT.ODA.ODAT.CD?year_high_desc=true.
[7] The World Bank. Urbanization and growth, [Online]. Available: https://siteresources.worldbank.org/EXTPREMNET/Resources/489960-1338997241035/Growth_Commission_Vol1_Urbanization_Growth.pdf.
[8] I. Turok and G. McGranahan. Urbanization and economic growth: The arguments and evidence for Africa and Asia, [Online]. Available: http://journals.sagepub.com/doi/full/10.1177/0956247813490908.
[9] United Nations. Sustainable development goal 4, [Online]. Available: https://sustainabledevelopment.un.org/sdg4.
[10] Investopedia. Eagles, [Online]. Available: https://www.investopedia.com/terms/e/eagles.asp.
[11] BBC. The MINT countries: Next economic giants?, [Online]. Available: http://www.bbc.com/news/magazine-25548060.
[12] Goldman Sachs. Beyond the BRICs: A look at the Next 11, [Online]. Available: http://www.goldmansachs.com/our-thinking/archive/archive-pdfs/brics-book/brics-chap-13.pdf.
[13] E. Acar, D. M. Dunlavy, and T. G. Kolda. Fitting a tensor decomposition is a nonlinear optimization problem, [Online]. Available: http://www.cs.cornell.edu/cv/tenwork/Slides/Kolda.pdf.
  • 15. A Histogram Plot of Feature Completeness
Figure 7: Histogram of feature completeness
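Feature completeness, as plotted in Figure 7, is the share of non-missing observations for each indicator. The sketch below shows one way such scores and histogram bins could be computed; the indicator names, the values, and the `MIN_COMPLETENESS` cutoff are hypothetical and are not taken from the report's data or code.

```python
def completeness(values):
    """Return the fraction of non-missing entries; None marks a missing value."""
    if not values:
        return 0.0
    return sum(v is not None for v in values) / len(values)


def histogram(scores, n_bins=5):
    """Count how many features fall into each equal-width completeness bin."""
    counts = [0] * n_bins
    for score in scores.values():
        # A score of exactly 1.0 belongs in the last bin.
        index = min(int(score * n_bins), n_bins - 1)
        counts[index] += 1
    return counts


# Hypothetical indicator columns (one value per country-year observation).
indicators = {
    "indicator_a": [1.0, 2.0, None, 4.0],
    "indicator_b": [None, None, None, 3.5],
    "indicator_c": [0.2, 0.4, 0.6, 0.8],
}

scores = {name: completeness(col) for name, col in indicators.items()}
bins = histogram(scores)

# Hypothetical cutoff: keep only indicators that are at least half complete.
MIN_COMPLETENESS = 0.5
kept = [name for name, score in scores.items() if score >= MIN_COMPLETENESS]

print(scores, bins, kept)
```

Indicators that fall below the cutoff would be dropped, and the remaining gaps would then be imputed.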
  • 16. B Logistic Regression Results
Figure 8: Logistic Regression Model Results
  • 17. C ROC for Logistic Regression Model
Figure 9: ROC for Logistic Regression Model
  • 18. D Global Fisheries Production
Figure 10: Global fisheries production [3]
  • 19. E Full Unbalanced CART Model
Figure 11: Full Unbalanced CART Model
  • 20. F Full Balanced CART Model
Figure 12: Full Balanced CART Model
  • 21. G Predictions from Logistic Regression Models
G.1 Predictions from Logistic Regression Model with 0.5 threshold
Figure 13: Prediction from Logistic Regression model with 0.5 threshold
  • 22. G.2 Predictions from Logistic Regression Model with 0.3 threshold
Figure 14: Prediction from Logistic Regression model with 0.3 threshold
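The two maps in this appendix differ only in the probability cutoff applied to the logistic regression output. As a minimal sketch of why the lower cutoff flags more countries, the example below applies the 0.5 and 0.3 thresholds to a small set of made-up probabilities and labels; these numbers are hypothetical and are not the report's model output.

```python
def classify(probabilities, threshold):
    """Label a case 1 (high growth) when its probability meets the threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]


def recall(labels, predictions):
    """Share of true positives that the model recovers."""
    true_pos = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    actual_pos = sum(labels)
    return true_pos / actual_pos if actual_pos else 0.0


def precision(labels, predictions):
    """Share of predicted positives that are truly positive."""
    true_pos = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    predicted_pos = sum(predictions)
    return true_pos / predicted_pos if predicted_pos else 0.0


# Hypothetical predicted probabilities and true labels for eight countries.
probs = [0.62, 0.41, 0.35, 0.28, 0.55, 0.12, 0.33, 0.08]
labels = [1, 1, 1, 0, 0, 0, 0, 0]

for threshold in (0.5, 0.3):
    preds = classify(probs, threshold)
    print(
        f"threshold={threshold}: flagged={sum(preds)}, "
        f"recall={recall(labels, preds):.2f}, "
        f"precision={precision(labels, preds):.2f}"
    )
```

In this toy data, lowering the threshold recovers every true positive at the cost of more false positives, which is the usual trade-off when the positive class is rare.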
  • 23. H Predictions from CART Models
H.1 Predictions from unbalanced CART model
An animated version of the predictions is available here: https://goo.gl/7SbpD1
Figure 15: Prediction from Unbalanced CART model
  • 24. H.2 Predictions from balanced CART model
Figure 16: Prediction from Balanced CART model