University Students' Mobility under the ERASMUS Program in the
European Union, 2008-2014
Candidate number: 171846
Introduction
Part 1
Part 2
Conclusion
Appendix
Code
Introduction
The ERASMUS program (European Community Action Scheme for the Mobility of
University Students) is a student exchange program among universities located
in EU member states. It has operated since 1987, and its primary aim is to promote
collaboration and to lessen cultural differences through the experience of living abroad.
Even though other programs similar to ERASMUS operate in the EU, such as
ERASMUS+, we focus our analysis on university students, regardless of their level of
studies.
Our analysis consists of two parts. In the first part, we aggregate the
individual entries to gain insights into the characteristics of student mobility. For that purpose,
we use the data files on the EU Open Data Portal (data.europa.eu), which cover the period from the
second semester of 2008 to the first semester of 2014. These files contain an entry for each
student who participated in the program during that period, along with more specific
characteristics such as age and level of studies. The second part
focuses on the countries and investigates whether country-specific characteristics
affect mobility. In combination with our aggregate data from the first part,
we use data files retrieved from the Eurostat website
(ec.europa.eu/eurostat/data/database).
Part 1
• Countries.
During the period of interest, approximately 2 million students
participated in the ERASMUS program. The top five countries that attracted students
are Spain, France, Germany, the United Kingdom and Italy. These countries were chosen by
55.5% of all the students who studied abroad. In other words, more than 1 out of 2 students
preferred one of the top five destinations, with Spain in first position.
On the other side, the countries that send the most students are almost the
same as the top host countries. However, Poland takes fifth place instead of Italy,
and Italy follows closely in sixth. In addition, we observe the same concentration:
59.5% of all students who left their countries came from the top five sending
countries.
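The top-five shares quoted above follow from a simple ranking and ratio. The block below is a minimal base-R sketch of that calculation, not the code used for this paper: the data frame host_counts, its counts and the helper top_n_share are hypothetical and serve only as an illustration.

```r
# Hypothetical host-country counts (illustrative values, not the paper's data)
host_counts <- data.frame(
  COUNTRY  = c("ES", "FR", "DE", "UK", "IT", "PL", "SE", "NL"),
  GOING.TO = c(300, 250, 220, 200, 180, 120, 90, 80),
  stringsAsFactors = FALSE
)

# Rank countries by a count column and return the top n together with
# their combined share of the overall total
top_n_share <- function(counts, value_col, n = 5) {
  ordered <- counts[order(counts[[value_col]], decreasing = TRUE), ]
  top <- head(ordered, n)
  list(
    countries = top$COUNTRY,
    share     = sum(top[[value_col]]) / sum(counts[[value_col]])
  )
}

res <- top_n_share(host_counts, "GOING.TO")
res$countries              # the five largest destinations
round(100 * res$share, 1)  # their combined share in percent
```

Applied to the sending-country totals (the LEAVING.FROM column in the code appendix), the same helper would give the corresponding share for home countries.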
• Gender, age and study level.
According to our data, women participate in the exchange program more
than men: women account for around 60% of all participants each year.
In addition, the most common age of participants was 21, a figure that
differs only for the period 2008-2009 (23 years old).
Moreover, the majority of participants were in the first cycle of their studies when they
took part in the program.
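The yearly share of women and the most common age can be obtained by grouping the entries by year. The following base-R sketch shows one way to do this under stated assumptions: the data frame students and the column names YEAR, GENDER and AGE are hypothetical, and the values are invented for illustration.

```r
# Hypothetical participant records (illustrative values only)
students <- data.frame(
  YEAR   = c("2009", "2009", "2009", "2009", "2009",
             "2010", "2010", "2010", "2010", "2010"),
  GENDER = c("F", "F", "F", "M", "M", "F", "F", "M", "F", "M"),
  AGE    = c(21, 22, 21, 23, 21, 21, 20, 21, 22, 21),
  stringsAsFactors = FALSE
)

# Share of women per year: the mean of a logical vector is a proportion
female_share <- tapply(students$GENDER == "F", students$YEAR, mean)

# Most frequent age (the mode); ties resolve to the smallest age
modal_age <- function(x) as.numeric(names(which.max(table(x))))
mode_by_year <- tapply(students$AGE, students$YEAR, modal_age)

round(100 * female_share, 1)
mode_by_year
```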
• Subject area and taught language.
The educational background of the majority of the participants belongs to the study
areas of business and administration, humanities, and foreign languages. Notably, the latter
gathers a vast number of students and is significantly larger than the other categories. As a
result, we can conclude that a large fraction of students chose to participate in order to
gain a better understanding of a foreign language, as a supplement to their studies. This
conclusion is particularly interesting because it suggests that students use the program as a
means to learn a language better and thus to get to know a new culture. The latter is also one of the main
reasons the ERASMUS program was established among the European states.
Furthermore, we can see, unsurprisingly, that English is the dominant language
of the taught courses. The native languages of the most popular
countries complete the list of the top taught languages. However, the picture
gets interesting when we look across the years. The share of English
accounts for 45.4% in 2008 and reaches 58% by the end of the first semester of 2014.
This result sits somewhat at odds with our observations above. Firstly, the majority of
participants study in the foreign language area, a fraction that grows as
the years pass. At the same time, however, the percentage of courses taught in English
also rises considerably. Secondly, the top non-English-speaking countries show a
steady, slightly increasing number of students over the years. Although
their native languages appear in the top list, the percentage of English-taught courses remains large.
Given these points, one can conclude that the success of the ERASMUS program (attracting a
great number of students every year) gave universities in non-English-speaking
countries an incentive to add English-taught courses.
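The share of English among taught languages in each year is a row-wise proportion of a year-by-language table. A minimal base-R sketch follows; the data frame courses and the column names YEAR and LANGUAGE are assumptions made for illustration, and the values do not come from the ERASMUS files.

```r
# Hypothetical language-of-instruction records (illustrative values only)
courses <- data.frame(
  YEAR     = c(rep("2008", 5), rep("2013", 5)),
  LANGUAGE = c("EN", "ES", "FR", "EN", "DE",
               "EN", "EN", "ES", "EN", "FR"),
  stringsAsFactors = FALSE
)

# Cross-tabulate year by language, then convert each row to proportions
lang_table <- table(courses$YEAR, courses$LANGUAGE)
lang_share <- prop.table(lang_table, margin = 1)

# Share of English-taught entries per year
english_share <- lang_share[, "EN"]
round(100 * english_share, 1)
```

Comparing the first and last year of english_share gives the kind of change reported above.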
Part 2
In this part, we examine the reasons behind the participants' preferences
regarding the countries in which they chose to study. To that end, we compare
the total number of students received by each country with three different characteristics of the
country: its crime rates, its government's expenditure on tertiary education and its……….
• Criminality
Our data set contains the total number of suspected persons who were arrested in each
of the European states, regardless of their nationality. When we aggregate the data, we see
that the largest number of suspects belongs to Germany (Graph 3 in Appendix).
France and Italy also rank among the first five. Crime
rates are generally considered a negative factor for a country's popularity among
visitors. Together with our findings from the first part, it seems strange to see three of the
most popular countries for students also having high criminality. In addition, when we plot
the average number of students against the average number of criminals, we observe a
slightly positive relationship.
Despite this surprising result, the scatter plot of the average number of students against the
average number of criminals (see Figure 1 in Appendix) does not allow us to
establish any clear relationship between the variables. Moreover, when we look at the rates of
criminality per hundred thousand inhabitants, we see a slightly different picture. With the exception of
Finland, the rates are almost the same among the most popular destinations for students.
In addition, the correlation between incoming students and number of criminals for these countries
is low (lower than 0.5). Based on these two observations, we can state that students do
not appear to have been concerned about crime rates when choosing an ERASMUS
destination. In other words, the criminality of a country did not seem to affect the students' choice to
study there.
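One reason raw suspect counts can look related to student numbers is that both tend to grow with population, which is why the per-inhabitant rate is the more informative measure. The base-R sketch below illustrates the two calculations; the data frame crime, its column names and all values are hypothetical and are not taken from the Eurostat files.

```r
# Hypothetical country-level figures (illustrative values only)
crime <- data.frame(
  COUNTRY    = c("A", "B", "C", "D", "E"),
  SUSPECTS   = c(2000000, 1100000, 900000, 300000, 150000),
  POPULATION = c(80e6, 65e6, 60e6, 10e6, 5e6),
  STUDENTS   = c(30000, 28000, 20000, 9000, 7000),
  stringsAsFactors = FALSE
)

# Suspects per 100,000 inhabitants removes the effect of country size
crime$RATE_PER_100K <- crime$SUSPECTS / crime$POPULATION * 1e5

# Pearson correlation of incoming students with raw counts and with rates
r_raw  <- cor(crime$STUDENTS, crime$SUSPECTS)
r_rate <- cor(crime$STUDENTS, crime$RATE_PER_100K)

crime[, c("COUNTRY", "RATE_PER_100K")]
c(raw = r_raw, rate = r_rate)
```

With so few countries, either coefficient is only a rough indication and should be read alongside the scatter plot.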
• Public expenditure in tertiary education
For this section, our data represent each country's public expenditure on tertiary
education as a percentage of its gross domestic product (GDP). To begin with, we observe
that the Scandinavian countries spend the largest percentage on education. At the same time,
these countries are less popular than, or as popular as, countries that spend less or much
less. Overall, looking at the scatter plot (Figure 2), we are unable to establish any correlation
between the percentage of GDP spent on education and popularity as an ERASMUS
destination.
Furthermore, we can look closely at the five most popular destinations. Firstly, we
exclude the years 2008 and 2014, as we have data for only one semester in each. Secondly, we
observe a decline in expenditure in 2012 for all countries except the United Kingdom.
Notably, this reflects the economic depression and the governments' decisions in response to it. The
main result from the scatter plots and the linear regressions (see Figures 3, 4, 5, 6 and 7 in
Appendix) is a positive relationship. To be more specific, public expenditure on education
appears to be positively associated with the number of ERASMUS students in a
country. However, the relationship is only slightly positive, and on some occasions there
is no positive relationship at all.
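A per-country regression of this kind can be sketched as follows. This is a minimal base-R illustration under stated assumptions, not the regressions behind Figures 3 to 7: the data frame spend, the column names EXPENDITURE_PCT_GDP and STUDENTS, and all values are hypothetical.

```r
# Hypothetical yearly figures for one country (illustrative values only)
spend <- data.frame(
  YEAR                = 2009:2013,
  EXPENDITURE_PCT_GDP = c(1.0, 1.1, 1.2, 1.1, 1.3),
  STUDENTS            = c(100, 108, 121, 112, 128)
)

# Simple linear regression of incoming students on expenditure
fit <- lm(STUDENTS ~ EXPENDITURE_PCT_GDP, data = spend)

slope     <- coef(fit)[["EXPENDITURE_PCT_GDP"]]  # sign gives the direction
r_squared <- summary(fit)$r.squared              # share of variance explained
c(slope = slope, r_squared = r_squared)
```

With only five yearly observations per country, a positive slope is weak evidence on its own, which is consistent with the cautious reading above.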
Conclusion
In the final analysis, we saw that student mobility was characterized by some specific
patterns throughout the time frame of 2008-2014. For instance, the participants' preferences vary very
little between the years regarding both the home and the host countries. Notably, five
countries dominate in numbers in both categories and account for more than 50% of the total number
of participants. These countries are Spain, France, Germany, the United Kingdom and Italy (or Poland).
Moreover, we observed that the majority of participants studied towards a degree in foreign
languages. Hence, we can say that the program is a tool for those students to improve their studies. At the
same time, we saw that the percentage of English-taught courses increased rapidly from 2008.
Additionally, we tried to establish whether criminality or public expenditure on tertiary education is a
factor that affects the participants' choice of study destination. In general, we would
have expected the former to have a negative effect and the latter a positive one. However, we found
evidence suggesting that criminality is not a factor that students take into consideration
when they choose their destination. On the other hand, we found a slightly positive relationship
between the number of incoming students and the percentage of public expenditure on education.
Nonetheless, high expenditure on education does not imply a large number of students, as we saw in the
case of the Scandinavian countries.
As can be seen, our analysis covers a short time frame, which makes it difficult to draw strong
conclusions about ERASMUS mobility. Likewise, a further, more detailed analysis of
student mobility is needed in order to confirm the findings of this paper.
Appendix
Code
#extracting the students mobility by year
library(tidyverse)
library(dplyr)
library(readr)
library(lubridate)
library(ggplot2)
stdata0809 <- read_delim("M:/pc/Desktop/student_data_2008 (1).csv",
";", escape_double = FALSE, trim_ws = TRUE)
glimpse(stdata0809)
data0809<-stdata0809 %>%
select(-c(NATIONALITY, PLACEMENTSTARTDATE, STUDYGRANT, LENGTHWORKPLACEMENT,
TYPEWORKSECTOR, CONSORTIUMAGREEMENTNUMBER, SEVSUPPLEMENT,
COUNTRYOFWORKPLACEMENT, ECTSCREDITSWORK, WORKPLACEMENT, ENTERPRISESIZE,
PLACEMENTGRANT, SHORTDURATION, HOMEINSTITUTION, HOSTINSTITUTION,
QUALIFICATIONATHOST, ECTSCREDITSSTUDY, TOTALECTSCREDITS, LINGPREPARATION,
TAUGHTHOSTLANG)) %>%
filter(MOBILITYTYPE != "P") %>%
filter(!is.na(STUDYSTARTDATE)) %>%
separate(STUDYSTARTDATE,sep="-",into=c("MONTH","YEAR"))
glimpse(data0809)
rm(stdata0809)
sum(is.na(data0809$YEAR))
sum(data0809$COUNTRYOFHOSTINSTITUTION==0)
data0809<- data0809 %>%
filter(COUNTRYOFHOSTINSTITUTION !=0) %>%
filter(!is.na(YEAR)) %>%
select(-c(MONTH))
#separate to 2008 and 2009
#2008
data08<-data0809 %>%
filter(YEAR=="2008")
#2009
data09<-data0809 %>%
filter(YEAR=="2009")
#summary,host
#host08<-data08 %>%
#count(COUNTRYOFHOSTINSTITUTION,sort = TRUE, name="GOING.TO")
host08<-data08 %>%
count(COUNTRYOFHOSTINSTITUTION,sort = TRUE)
#rename to COUNTRY
names(host08)
names(host08)[names(host08)=="COUNTRYOFHOSTINSTITUTION"]<-"COUNTRY"
names(host08)[names(host08)=="n"]<-"GOING.TO"
#summary,home
#home08<- data08 %>%
#count(COUNTRYOFHOMEINSTITUTION,sort = TRUE, name="LEAVING.FROM" )
home08<- data08 %>%
count(COUNTRYOFHOMEINSTITUTION,sort = TRUE)
#rename to COUNTRY
names(home08)
names(home08)[names(home08)=="COUNTRYOFHOMEINSTITUTION"]<-"COUNTRY"
names(home08)[names(home08)=="n"]<-"LEAVING.FROM"
#dif no of obs
anti_join(host08,home08,by="COUNTRY")
home08<- home08 %>%
add_row(COUNTRY="FR",LEAVING.FROM=0) %>%
add_row(COUNTRY="TR",LEAVING.FROM=0) %>%
add_row(COUNTRY="CH",LEAVING.FROM=0)
anti_join(home08,host08,by="COUNTRY")
#TOTAL 2008 s2
#ADD YEAR COLUMN + row for HR
#PLOT TO SEE HOW IT LOOKS
#ggplot(to08,aes(x=YEAR,y=LEAVING.FROM,label=COUNTRY),group=COUNTRY) + geom_text()
total08<-inner_join(host08,home08,by="COUNTRY")
dt08<-as.Date("2008-08-01")
library(zoo)
total08<-total08 %>%
add_row(COUNTRY = "HR", LEAVING.FROM = 0,GOING.TO=0) %>%
add_column(DATE=dt08,.before = "COUNTRY")
#general helper: convert a date to a "YEAR S<semester>" label, e.g. "2008 S2"
Date2period <- function(x, period = 6, sep = " S") {
ym<- as.yearmon(x)
paste(as.integer(ym), (cycle(ym) - 1) %/% period + 1, sep = sep)
}
total08$DATE <- Date2period(total08$DATE)
#the second part
TOTAL08<-left_join(host08,home08,by="COUNTRY")
TOTAL08<-TOTAL08 %>%
add_row(COUNTRY = "HR", LEAVING.FROM = 0,GOING.TO=0) %>%
add_row(COUNTRY = "CH", LEAVING.FROM = 0,GOING.TO=0)
#2009a
#summary,host
host09<-data09 %>%
count(COUNTRYOFHOSTINSTITUTION,sort = TRUE)
#rename to COUNTRY
names(host09)
names(host09)[names(host09)=="COUNTRYOFHOSTINSTITUTION"]<-"COUNTRY"
names(host09)[names(host09)=="n"]<-"GOING.TO"
#summary,home
home09<- data09 %>%
count(COUNTRYOFHOMEINSTITUTION,sort = TRUE)
#rename to COUNTRY
names(home09)
names(home09)[names(home09)=="COUNTRYOFHOMEINSTITUTION"]<-"COUNTRY"
names(home09)[names(home09)=="n"]<-"LEAVING.FROM"
anti_join(host09,home09,by="COUNTRY")
home09<- home09 %>%
add_row(COUNTRY="FR",LEAVING.FROM=0) %>%
add_row(COUNTRY="TR",LEAVING.FROM=0)
#TOTAL 2009a
#ADD YEAR COLUMN
#MERGE
TOTAL09A<-inner_join(home09,host09,by="COUNTRY")
TOTAL09A<-add_row(TOTAL09A,COUNTRY = "HR", LEAVING.FROM = 0,GOING.TO=0)
dt09a<-as.Date("2009-02-01")
total09a<- TOTAL09A %>%
add_column(DATE=dt09a,.before = "COUNTRY")
Date2period <- function(dt09a, period = 6, sep = " S") {
ym<- as.yearmon(dt09a)
paste(as.integer(ym), (cycle(ym) - 1) %/% period + 1, sep = sep)
}
total09a$DATE <- Date2period(total09a$DATE)
#test how ggplot looks
#total<-bind_rows(total08,total09a)
#ggplot(total,aes(x=DATE,y=LEAVING.FROM,label=COUNTRY),group=COUNTRY) + geom_text()
#ggplot(total09,aes(x=DATE,y=LEAVING.FROM,label=COUNTRY),group=COUNTRY) + geom_text()
#keep it clean
rm(data08,data09,home08,home09,host09,host08)
#IMPORTING THE NEXT FILE 2009-2010
stdata0910 <- read_delim("M:/pc/Desktop/student_data_2009.csv",
";", escape_double = FALSE, trim_ws = TRUE)
glimpse(stdata0910)
data0910<-stdata0910 %>%
select(-c(NATIONALITY, PLACEMENTSTARTDATE, STUDYGRANT, LENGTHWORKPLACEMENT,
TYPEWORKSECTOR, CONSORTIUMAGREEMENTNUMBER, SEVSUPPLEMENT,
COUNTRYOFWORKPLACEMENT, ECTSCREDITSWORK, WORKPLACEMENT, ENTERPRISESIZE,
PLACEMENTGRANT, SHORTDURATION, HOMEINSTITUTION, HOSTINSTITUTION,
QUALIFICATIONATHOST, ECTSCREDITSSTUDY, TOTALECTSCREDITS, LINGPREPARATION,
TAUGHTHOSTLANG)) %>%
filter(MOBILITYTYPE != "P") %>%
filter(!is.na(STUDYSTARTDATE)) %>%
separate(STUDYSTARTDATE,sep="-",into=c("MONTH","YEAR"))
glimpse(data0910)
rm(stdata0910)
sum(is.na(data0910$YEAR))
sum(data0910$COUNTRYOFHOSTINSTITUTION==0)
data0910<- data0910 %>%
filter(COUNTRYOFHOSTINSTITUTION !=0) %>%
filter(!is.na(YEAR)) %>%
select(-c(MONTH))
#separate to 2010 and 2009
#2010
data10<-data0910 %>%
filter(YEAR=="2010")
#2009
data09<-data0910 %>%
filter(YEAR=="2009")
#summary,host
host10<-data10 %>%
count(COUNTRYOFHOSTINSTITUTION,sort = TRUE)
#rename to COUNTRY
names(host10)
names(host10)[names(host10)=="COUNTRYOFHOSTINSTITUTION"]<-"COUNTRY"
names(host10)[names(host10)=="n"]<-"GOING.TO"
#summary,home
home10<- data10 %>%
count(COUNTRYOFHOMEINSTITUTION,sort = TRUE)
#rename to COUNTRY
names(home10)
names(home10)[names(home10)=="COUNTRYOFHOMEINSTITUTION"]<-"COUNTRY"
names(home10)[names(home10)=="n"]<-"LEAVING.FROM"
anti_join(host10,home10,by="COUNTRY")
sum(data0910$COUNTRYOFHOMEINSTITUTION=="SE")
anti_join(home10,host10,by="COUNTRY")
home10<- home10 %>%
add_row(COUNTRY="SE",LEAVING.FROM=0) %>%
add_row(COUNTRY="TR",LEAVING.FROM=0) %>%
add_row(COUNTRY="EE",LEAVING.FROM=0)
host10<- host10%>%
add_row(COUNTRY="HR",GOING.TO=0)
#Note: HR only sends students and does not receive any
#home10$COUNTRY[!(home10$COUNTRY %in% host10$COUNTRY)]
#TOTAL 2010
#ADD YEAR COLUMN
#MERGE
TOTAL10A<-inner_join(home10,host10,by="COUNTRY")
dt10a<-as.Date("2010-02-01")
total10a<- TOTAL10A %>%
add_column(DATE=dt10a,.before = "COUNTRY")
Date2period <- function(dt10a, period = 6, sep = " S") {
ym<- as.yearmon(dt10a)
paste(as.integer(ym), (cycle(ym) - 1) %/% period + 1, sep = sep)
}
total10a$DATE <- Date2period(total10a$DATE)
str(total10a)
#2009 the same as before
#summary,host
host09<-data09 %>%
count(COUNTRYOFHOSTINSTITUTION,sort = TRUE)
#rename to COUNTRY
names(host09)
names(host09)[names(host09)=="COUNTRYOFHOSTINSTITUTION"]<-"COUNTRY"
names(host09)[names(host09)=="n"]<-"GOING.TO"
#summary,home
home09<- data09 %>%
count(COUNTRYOFHOMEINSTITUTION,sort = TRUE )
#rename to COUNTRY
names(home09)
names(home09)[names(home09)=="COUNTRYOFHOMEINSTITUTION"]<-"COUNTRY"
names(home09)[names(home09)=="n"]<-"LEAVING.FROM"
host09$COUNTRY[!(host09$COUNTRY %in% home09$COUNTRY)]
home09$COUNTRY[!(home09$COUNTRY %in% host09$COUNTRY)]
home09<- home09 %>%
add_row(COUNTRY="SE",LEAVING.FROM=0) %>%
add_row(COUNTRY="TR",LEAVING.FROM=0) %>%
add_row(COUNTRY="EE",LEAVING.FROM=0) %>%
add_row(COUNTRY="CH",LEAVING.FROM=0)
host09<- host09%>%
add_row(COUNTRY="HR",GOING.TO=0) %>%
add_row(COUNTRY="CH",GOING.TO=0)
#TOTAL 2009
#MERGE
TOTAL09B<-left_join(home09,host09,by="COUNTRY")
dt09b<-as.Date("2009-09-01")
total09b<- TOTAL09B %>%
add_column(DATE=dt09b,.before = "COUNTRY")
Date2period <- function(dt09b, period = 6, sep = " S") {
ym<- as.yearmon(dt09b)
paste(as.integer(ym), (cycle(ym) - 1) %/% period + 1, sep = sep)
}
total09b$DATE <- Date2period(total09b$DATE)
#part 2
TOTAL09<- left_join(TOTAL09B,TOTAL09A, by="COUNTRY")
TOTAL09[is.na(TOTAL09)]<-0
TOTAL09<- TOTAL09 %>%
group_by(COUNTRY) %>%
transmute(LEAVING.FROM=LEAVING.FROM.x+LEAVING.FROM.y,
GOING.TO=GOING.TO.x+GOING.TO.y)
#CLEAN
rm(home10,host10,home09,host09,data09,data10)
#import the next file 2010-2011
stdata1011 <- read_delim("M:/pc/Desktop/student_data_2010 (1).csv",
";", escape_double = FALSE, trim_ws = TRUE)
glimpse(stdata1011)
names(stdata1011)
sum(is.na(stdata1011$STUDYSTARTDATE))
data1011<-stdata1011 %>%
select(-c(NATIONALITY, PLACEMENTSTARTDATE, STUDYGRANT, TYPEWORKSECTOR,
CONSORTIUMAGREEMENTNUMBER, SEVSUPPLEMENT, COUNTRYOFWORKPLACEMENT,
ECTSCREDITSWORK, WORKPLACEMENT, ENTERPRISESIZE, PLACEMENTGRANT, SHORTDURATION,
HOMEINSTITUTION, HOSTINSTITUTION, QUALIFICATIONATHOST, ECTSCREDITSSTUDY,
TOTALECTSCREDITS, LINGPREPARATION, TAUGHTHOSTLANG)) %>%
filter(MOBILITYTYPE != "P") %>%
filter(!is.na(STUDYSTARTDATE)) %>%
separate(STUDYSTARTDATE,sep="-",into=c("MONTH","YEAR"))
sum(is.na(data1011$YEAR))
sum(data1011$COUNTRYCODEOFHOSTINSTITUTION==0)
#BEFR,BENL referring to BE
data1011$COUNTRYCODEOFHOSTINSTITUTION[data1011$COUNTRYCODEOFHOSTINSTITUTION=="BEFR"]<-"BE"
data1011$COUNTRYCODEOFHOSTINSTITUTION[data1011$COUNTRYCODEOFHOSTINSTITUTION=="BENL"]<-"BE"
data1011$COUNTRYCODEOFHOMEINSTITUTION[data1011$COUNTRYCODEOFHOMEINSTITUTION=="BEFR"]<-"BE"
data1011$COUNTRYCODEOFHOMEINSTITUTION[data1011$COUNTRYCODEOFHOMEINSTITUTION=="BENL"]<-"BE"
glimpse(data1011)
data1011<- data1011 %>%
filter(!is.na(YEAR)) %>%
select(-c(MONTH))
#separate to 2010 and 2011
#2010
data10<-data1011 %>%
filter(YEAR=="2010")
#2011
data11<-data1011 %>%
filter(YEAR=="2011")
#summary,host
host10<-data10 %>%
count(COUNTRYCODEOFHOSTINSTITUTION,sort = TRUE)
#rename to COUNTRY
names(host10)
names(host10)[names(host10)=="COUNTRYCODEOFHOSTINSTITUTION"]<-"COUNTRY"
names(host10)[names(host10)=="n"]<-"GOING.TO"
#summary,home
home10<- data10 %>%
count(COUNTRYCODEOFHOMEINSTITUTION,sort = TRUE)
#rename to COUNTRY
names(home10)
names(home10)[names(home10)=="COUNTRYCODEOFHOMEINSTITUTION"]<-"COUNTRY"
names(home10)[names(home10)=="n"]<-"LEAVING.FROM"
anti_join(host10,home10,by="COUNTRY")
anti_join(home10,host10,by="COUNTRY")
home10<- home10 %>%
add_row(COUNTRY="EE",LEAVING.FROM=0) %>%
add_row(COUNTRY="TR",LEAVING.FROM=0) %>%
add_row(COUNTRY="MT",LEAVING.FROM=0)%>%
add_row(COUNTRY="CH",LEAVING.FROM=0)
#TOTAL 2010
#ADD YEAR COLUMN
#MERGE
TOTAL10B<-inner_join(home10,host10,by="COUNTRY")
TOTAL10B$COUNTRY[!(TOTAL10B$COUNTRY %in% TOTAL10A$COUNTRY)]
TOTAL10A<- TOTAL10A %>%
add_row(COUNTRY="CH",GOING.TO=0,LEAVING.FROM=0)
dt10b<-as.Date("2010-09-01")
total10b<- TOTAL10B %>%
add_column(DATE=dt10b,.before = "COUNTRY")
Date2period <- function(dt10b, period = 6, sep = " S") {
ym<- as.yearmon(dt10b)
paste(as.integer(ym), (cycle(ym) - 1) %/% period + 1, sep = sep)
}
total10b$DATE <- Date2period(total10b$DATE)
#part 2
TOTAL10<- inner_join(TOTAL10A,TOTAL10B, by="COUNTRY")
TOTAL10<- TOTAL10 %>%
group_by(COUNTRY) %>%
transmute(LEAVING.FROM=LEAVING.FROM.x+LEAVING.FROM.y,
GOING.TO=GOING.TO.x+GOING.TO.y)
#2011
#summary,host
host11<-data11 %>%
count(COUNTRYCODEOFHOSTINSTITUTION,sort = TRUE)
#rename to COUNTRY
names(host11)
names(host11)[names(host11)=="COUNTRYCODEOFHOSTINSTITUTION"]<-"COUNTRY"
names(host11)[names(host11)=="n"]<-"GOING.TO"
#summary,home
home11<- data11 %>%
count(COUNTRYCODEOFHOMEINSTITUTION,sort = TRUE )
#rename to COUNTRY
names(home11)
names(home11)[names(home11)=="COUNTRYCODEOFHOMEINSTITUTION"]<-"COUNTRY"
names(home11)[names(home11)=="n"]<-"LEAVING.FROM"
host11$COUNTRY[!(host11$COUNTRY %in% home11$COUNTRY)]
home11$COUNTRY[!(home11$COUNTRY %in% host11$COUNTRY)]
home11<- home11 %>%
add_row(COUNTRY="CH",LEAVING.FROM=0) %>%
add_row(COUNTRY="TR",LEAVING.FROM=0) %>%
add_row(COUNTRY="EE",LEAVING.FROM=0) %>%
add_row(COUNTRY="MT",LEAVING.FROM=0)
#TOTAL 2011
#MERGE
TOTAL11A<-inner_join(home11,host11,by="COUNTRY")
dt11a<-as.Date("2011-02-01")
total11a<- TOTAL11A %>%
add_column(DATE=dt11a,.before = "COUNTRY")
Date2period <- function(dt11a, period = 6, sep = " S") {
ym<- as.yearmon(dt11a)
paste(as.integer(ym), (cycle(ym) - 1) %/% period + 1, sep = sep)
}
total11a$DATE <- Date2period(total11a$DATE)
#REMOVE
rm(data11,home11,host11,stdata1011)
rm(data10,home10,host10)
#2011-2012
stdata1112 <- read_delim("//hume/student-u02/marthoma/pc/Desktop/student_1112.csv",
";", escape_double = FALSE, trim_ws = TRUE)
glimpse(stdata1112)
names(stdata1112)
sum(is.na(stdata1112$STUDYSTARTDATE))
data1112<-stdata1112 %>%
select(-c(NATIONALITY, PLACEMENTSTARTDATE, STUDYGRANT, PLACEMENTENTERPRISE,
COUNTRYOFPLACEMENT, ECTSCREDITSPLACEMENT, LENGTHPLACEMENT, TYPEPLACEMENTSECTOR,
CONSORTIUMAGREEMENTNUMBER, ENTERPRISESIZE, PLACEMENTGRANT, SHORTDURATION,
HOMEINSTITUTION, HOSTINSTITUTION, QUALIFICATIONATHOST, ECTSCREDITSSTUDY,
TOTALECTSCREDITS, LINGPREPARATION, TAUGHTHOSTLANG, SNSUPPLEMENT)) %>%
filter(MOBILITYTYPE != "P") %>%
filter(!is.na(STUDYSTARTDATE)) %>%
separate(STUDYSTARTDATE,sep="-",into=c("MONTH","YEAR"))
sum(is.na(data1112$YEAR))
sum(data1112$COUNTRYCODEOFHOSTINSTITUTION==0)
#BEFR,BENL,BEDE referring to BE
data1112$COUNTRYCODEOFHOSTINSTITUTION[data1112$COUNTRYCODEOFHOSTINSTITUTION=="BEFR"]<-"BE"
data1112$COUNTRYCODEOFHOSTINSTITUTION[data1112$COUNTRYCODEOFHOSTINSTITUTION=="BENL"]<-"BE"
data1112$COUNTRYCODEOFHOSTINSTITUTION[data1112$COUNTRYCODEOFHOSTINSTITUTION=="BEDE"]<-"BE"
data1112$COUNTRYCODEOFHOMEINSTITUTION[data1112$COUNTRYCODEOFHOMEINSTITUTION=="BEFR"]<-"BE"
data1112$COUNTRYCODEOFHOMEINSTITUTION[data1112$COUNTRYCODEOFHOMEINSTITUTION=="BENL"]<-"BE"
glimpse(data1112)
data1112<- data1112 %>%
select(-c(MONTH))
#separate to 2011 and 2012
#2012
data12<-data1112 %>%
filter(YEAR=="12")
#2011
data11<-data1112 %>%
filter(YEAR=="11")
#summary,host
host12<-data12 %>%
count(COUNTRYCODEOFHOSTINSTITUTION,sort = TRUE)
#rename to COUNTRY
names(host12)
names(host12)[names(host12)=="COUNTRYCODEOFHOSTINSTITUTION"]<-"COUNTRY"
names(host12)[names(host12)=="n"]<-"GOING.TO"
#summary,home
home12<- data12 %>%
count(COUNTRYCODEOFHOMEINSTITUTION,sort = TRUE)
#rename to COUNTRY
names(home12)
names(home12)[names(home12)=="COUNTRYCODEOFHOMEINSTITUTION"]<-"COUNTRY"
names(home12)[names(home12)=="n"]<-"LEAVING.FROM"
anti_join(host12,home12,by="COUNTRY")
#TOTAL 2012A
#MERGE
TOTAL12A<-inner_join(home12,host12,by="COUNTRY")
dt12a<-as.Date("2012-02-01")
total12a<- TOTAL12A %>%
add_column(DATE=dt12a,.before = "COUNTRY")
Date2period <- function(dt12a, period = 6, sep = " S") {
ym<- as.yearmon(dt12a)
paste(as.integer(ym), (cycle(ym) - 1) %/% period + 1, sep = sep)
}
total12a$DATE <- Date2period(total12a$DATE)
#2011B
#summary,host
host11<-data11 %>%
count(COUNTRYCODEOFHOSTINSTITUTION,sort = TRUE)
#rename to COUNTRY
names(host11)
names(host11)[names(host11)=="COUNTRYCODEOFHOSTINSTITUTION"]<-"COUNTRY"
names(host11)[names(host11)=="n"]<-"GOING.TO"
#summary,home
home11<- data11 %>%
count(COUNTRYCODEOFHOMEINSTITUTION,sort = TRUE )
#rename to COUNTRY
names(home11)
names(home11)[names(home11)=="COUNTRYCODEOFHOMEINSTITUTION"]<-"COUNTRY"
names(home11)[names(home11)=="n"]<-"LEAVING.FROM"
#TOTAL 2011
#MERGE
TOTAL11B<-inner_join(home11,host11,by="COUNTRY")
dt11b<-as.Date("2011-09-01")
total11b<- TOTAL11B %>%
add_column(DATE=dt11b,.before = "COUNTRY")
Date2period <- function(dt11b, period = 6, sep = " S") {
ym<- as.yearmon(dt11b)
paste(as.integer(ym), (cycle(ym) - 1) %/% period + 1, sep = sep)
}
total11b$DATE <- Date2period(total11b$DATE)
#part2
TOTAL11<- inner_join(TOTAL11A,TOTAL11B, by="COUNTRY")
TOTAL11<- TOTAL11 %>%
group_by(COUNTRY) %>%
transmute(LEAVING.FROM=LEAVING.FROM.x+LEAVING.FROM.y,
GOING.TO=GOING.TO.x+GOING.TO.y)
#REMOVE
rm(data12,data11,home11,host11,home12,host12,stdata1112)
#2012-2013
stdata1213 <- read_delim("M:/pc/Desktop/SM_2012_13_20141103_01 (2).csv",
";", escape_double = FALSE, trim_ws = TRUE)
glimpse(stdata1213)
names(stdata1213)
sum(is.na(stdata1213$STUDY_START_DATE))
data1213<-stdata1213 %>%
select(-c(STUDENT_ID, TYPE_PLACEMENT_SECTOR_VALUE, ECTS_CREDITS_PLACEMENT_AMT,
NUMB_YRS_HIGHER_EDUCAT_VALUE, PLACEMENT_ENTERPRISE_VALUE, TOTAL_ECTS_CREDITS_AMT,
QUALIFICATION_AT_HOST_CDE, STUDENT_NATIONALITY_CDE, ID_MOBILITY_CDE,
PLACEMENT_ENTERPRISE_CTRY_CDE, PLACEMENT_ENTERPRISE_CTRY_CDE, LENGTH_PLACEMENT_VALUE,
CONSORTIUM_AGREEMENT_NUMBER, SPECIAL_NEEDS_SUPPLEMENT_VALUE, STUDY_GRANT_AMT,
HOST_INSTITUTION_CDE, HOME_INSTITUTION_CDE, PLACEMENT_ENTERPRISE_SIZE_CDE,
SHORT_DURATION_CDE, ECTS_CREDITS_STUDY_AMT, PLACEMENT_GRANT_AMT)) %>%
filter(MOBILITY_TYPE_CDE != "P") %>%
filter(!is.na(STUDY_START_DATE)) %>%
separate(STUDY_START_DATE,sep="-",into=c("MONTH","YEAR"))
sum(is.na(data1213$YEAR))
sum(data1213$HOST_INSTITUTION_COUNTRY_CDE==0)
#BEFR,BENL,BEDE referring to BE
data1213$HOST_INSTITUTION_COUNTRY_CDE[data1213$HOST_INSTITUTION_COUNTRY_CDE=="BEFR"]<-"BE"
data1213$HOST_INSTITUTION_COUNTRY_CDE[data1213$HOST_INSTITUTION_COUNTRY_CDE=="BENL"]<-"BE"
data1213$HOST_INSTITUTION_COUNTRY_CDE[data1213$HOST_INSTITUTION_COUNTRY_CDE=="BEDE"]<-"BE"
data1213$HOME_INSTITUTION_CTRY_CDE[data1213$HOME_INSTITUTION_CTRY_CDE=="BEFR"]<-"BE"
data1213$HOME_INSTITUTION_CTRY_CDE[data1213$HOME_INSTITUTION_CTRY_CDE=="BENL"]<-"BE"
data1213<- data1213 %>%
filter(!is.na(YEAR)) %>%
select(-c(MONTH))
#separate to 2013 and 2012
#2012
data12<-data1213 %>%
filter(YEAR=="12")
#2013
data13<-data1213 %>%
filter(YEAR=="13")
#summary host 2013
host13<-data13 %>%
count(HOST_INSTITUTION_COUNTRY_CDE,sort = TRUE)
#rename to COUNTRY
names(host13)
names(host13)[names(host13)=="HOST_INSTITUTION_COUNTRY_CDE"]<-"COUNTRY"
names(host13)[names(host13)=="n"]<-"GOING.TO"
#summary,home
home13<- data13 %>%
count(HOME_INSTITUTION_CTRY_CDE,sort = TRUE)
#rename to COUNTRY
names(home13)
names(home13)[names(home13)=="HOME_INSTITUTION_CTRY_CDE"]<-"COUNTRY"
names(home13)[names(home13)=="n"]<-"LEAVING.FROM"
#2013A
#MERGE
TOTAL13A<-inner_join(home13,host13,by="COUNTRY")
dt13a<-as.Date("2013-02-01")
total13a<- TOTAL13A %>%
add_column(DATE=dt13a,.before = "COUNTRY")
Date2period <- function(dt13a, period = 6, sep = " S") {
ym<- as.yearmon(dt13a)
paste(as.integer(ym), (cycle(ym) - 1) %/% period + 1, sep = sep)
}
total13a$DATE <- Date2period(total13a$DATE)
#summary,host
host12<-data12 %>%
count(HOST_INSTITUTION_COUNTRY_CDE,sort = TRUE)
#rename to COUNTRY
names(host12)
names(host12)[names(host12)=="HOST_INSTITUTION_COUNTRY_CDE"]<-"COUNTRY"
names(host12)[names(host12)=="n"]<-"GOING.TO"
#summary,home
home12<- data12 %>%
count(HOME_INSTITUTION_CTRY_CDE,sort = TRUE )
#rename to COUNTRY
names(home12)
names(home12)[names(home12)=="HOME_INSTITUTION_CTRY_CDE"]<-"COUNTRY"
names(home12)[names(home12)=="n"]<-"LEAVING.FROM"
#TOTAL 2012
#MERGE
TOTAL12B<-inner_join(home12,host12,by="COUNTRY")
dt12b<-as.Date("2012-09-01")
total12b<- TOTAL12B %>%
add_column(DATE=dt12b,.before = "COUNTRY")
Date2period <- function(dt12b, period = 6, sep = " S") {
ym<- as.yearmon(dt12b)
paste(as.integer(ym), (cycle(ym) - 1) %/% period + 1, sep = sep)
}
total12b$DATE <- Date2period(total12b$DATE)
#part2
TOTAL12<- inner_join(TOTAL12A,TOTAL12B, by="COUNTRY")
TOTAL12<- TOTAL12 %>%
group_by(COUNTRY) %>%
transmute(LEAVING.FROM=LEAVING.FROM.x+LEAVING.FROM.y,
GOING.TO=GOING.TO.x+GOING.TO.y)
#clean
rm(data13,data12,stdata1213,home12,home13,host12,host13)
#2013-2014
library(readxl)
stdata1314<-read_excel("M:/pc/Desktop/Student_Mobility_2013-14.xlsx")
#changing the date character into date format
stdata1314$StartDate<-dmy_hms(stdata1314$StartDate,tz=Sys.timezone())
glimpse(stdata1314)
names(stdata1314)
sum(is.na(stdata1314$StartDate))
sum(stdata1314$ReceivingCountry=="BEFR")
sum(stdata1314$ReceivingCountry=="BENL")
sum(stdata1314$ReceivingCountry=="BEDE")
sum(stdata1314$SendingCountry=="BEFR")
data1314<- stdata1314 %>%
filter(CombinedMobilityYesNo=="NO") %>%
filter(MobilityType == "Mob-SMS") %>%
separate(StartDate,sep="-",into=c("year","month","day")) %>%
select(-c(Action, CombinedMobilityYesNo, ProjectNumber, SpecialNeeds, EndDate,
ParticipantID, SendingPartnerName, MobiilityID, DurationInMonths,
HostingPartnerErasmusID, DurationInDays, ParticipantType, HostingPartnerName,
SubsistenceTravel, CallYear, SendingPartnerErasmusID, HostingPartnerCity))
data1314$ReceivingCountry[data1314$ReceivingCountry=="GB"]<-"UK"
data1314$SendingCountry[data1314$SendingCountry=="GB"]<-"UK"
data1314<- data1314 %>%
filter(!is.na(year)) %>%
select(-c(month,day))
#separate to 2013 and 2014
#2014
data14<-data1314 %>%
filter(year=="2014")
#2013
data13<-data1314 %>%
filter(year=="2013")
#summary host 2013B
host13<-data13 %>%
count(ReceivingCountry,sort = TRUE)
#rename to COUNTRY
names(host13)
names(host13)[names(host13)=="ReceivingCountry"]<-"COUNTRY"
names(host13)[names(host13)=="n"]<-"GOING.TO"
#summary,home
home13<- data13 %>%
count(SendingCountry,sort = TRUE)
#rename to COUNTRY
names(home13)
names(home13)[names(home13)=="SendingCountry"]<-"COUNTRY"
names(home13)[names(home13)=="n"]<-"LEAVING.FROM"
#2013B
#MERGE
TOTAL13B<-inner_join(home13,host13,by="COUNTRY")
dt13b<-as.Date("2013-09-01")
total13b<- TOTAL13B %>%
add_column(DATE=dt13b,.before = "COUNTRY")
Date2period <- function(dt13b, period = 6, sep = " S") {
ym<- as.yearmon(dt13b)
paste(as.integer(ym), (cycle(ym) - 1) %/% period + 1, sep = sep)
}
total13b$DATE <- Date2period(total13b$DATE)
#part 2
TOTAL13B$COUNTRY[TOTAL13B$COUNTRY=="GB"]<-"UK"
TOTAL13<- inner_join(TOTAL13A,TOTAL13B, by="COUNTRY")
TOTAL13<- TOTAL13 %>%
group_by(COUNTRY) %>%
transmute(LEAVING.FROM=LEAVING.FROM.x+LEAVING.FROM.y,
GOING.TO=GOING.TO.x+GOING.TO.y)
inner_join(TOTAL09,TOTAL13)
#2014
#summary,host
host14<-data14 %>%
count(ReceivingCountry,sort = TRUE)
#rename to COUNTRY
names(host14)
names(host14)[names(host14)=="ReceivingCountry"]<-"COUNTRY"
names(host14)[names(host14)=="n"]<-"GOING.TO"
#summary,home
home14<- data14 %>%
count(SendingCountry,sort = TRUE )
#rename to COUNTRY
names(home14)
names(home14)[names(home14)=="SendingCountry"]<-"COUNTRY"
names(home14)[names(home14)=="n"]<-"LEAVING.FROM"
#TOTAL 2014
#MERGE
TOTAL14<-inner_join(home14,host14,by="COUNTRY")
dt14<-as.Date("2014-02-01")
total14<- TOTAL14 %>%
add_column(DATE=dt14,.before = "COUNTRY")
Date2period <- function(dt14, period = 6, sep = " S") {
ym<- as.yearmon(dt14)
paste(as.integer(ym), (cycle(ym) - 1) %/% period + 1, sep = sep)
}
total14$DATE <- Date2period(total14$DATE)
#data1314 and TOTAL14 are kept: both are used again further down
rm(home13,home14,host13,host14,TOTAL13A,TOTAL13B,stdata1314,data13,data14)
#merge all
no_students_0814<-bind_rows(total08,total09a,total09b,total10a,total10b,total11a,
total11b,total12a,total12b,total13a,total13b,total14)
#for part two
#starting plot the variables
TOTAL08<-add_column(TOTAL08, DATE="2008",.before = "COUNTRY")
TOTAL09<-add_column(TOTAL09, DATE="2009",.before = "COUNTRY")
TOTAL10<-add_column(TOTAL10, DATE="2010",.before = "COUNTRY")
TOTAL11<-add_column(TOTAL11, DATE="2011",.before = "COUNTRY")
TOTAL12<-add_column(TOTAL12, DATE="2012",.before = "COUNTRY")
TOTAL13<-add_column(TOTAL13, DATE="2013",.before = "COUNTRY")
TOTAL14<-add_column(TOTAL14, DATE="2014",.before = "COUNTRY")
TOTAL<-bind_rows(TOTAL08,TOTAL09,TOTAL10,TOTAL11,TOTAL12,TOTAL13,TOTAL14)
ggplot(no_students_0814,aes(x=DATE,y=LEAVING.FROM,color=COUNTRY,label=COUNTRY,
size=LEAVING.FROM,group=COUNTRY)) + geom_point(aes(color=COUNTRY))
rm(total08,
total09a,total09b,total10a,total10b,total11a,total11b,total12a,total12b,total13a,total13b,total14)
rm(dt08,dt09a,dt09b,dt10a,dt10b,dt11a,dt11b,dt12a,dt12b,dt13a,dt13b,dt14,Date2period)
#PART 1
#the top countries who send students
#topleaving<- arrange(no_students_0814,desc(LEAVING.FROM))
#top receiving students
#topaccepting<- arrange(no_students_0814,desc(GOING.TO))
#top4
#top4<- filter(no_students_0814,COUNTRY %in% c("ES","DE","IT","FR"))
#sum
sumall<-no_students_0814%>%
group_by(COUNTRY) %>%
summarise_at(vars(LEAVING.FROM,GOING.TO),list(sum=sum,mean=mean))
totalno_all<-sum(sumall$LEAVING.FROM_sum,sumall$GOING.TO_sum)
#bar graph
topsumall<- sumall %>%
filter(GOING.TO_mean>2500)
ggplot(topsumall,aes(x=COUNTRY,y=GOING.TO_mean))+geom_col()+theme_classic() +
labs(title="Graph 1",x="Host Countries ",y="Average number of incoming students (greater than 2500)")
topsumall<- sumall %>%
filter(LEAVING.FROM_mean>1500)
ggplot(topsumall,aes(x=COUNTRY,y=LEAVING.FROM_mean))+geom_col()+theme_classic() +
labs(title="Graph 2",x="Home Countries ",y="Average number of leaving students (greater than 1500)")
glimpse(sumall)
#focus on going.to
sumgoing<- select(sumall,-c("LEAVING.FROM_sum","LEAVING.FROM_mean"))
sumgoing<-arrange(sumgoing,desc(GOING.TO_sum))
topsum_going<-filter(sumgoing,COUNTRY %in% c("ES","DE","IT","FR","UK"))
#bar graph (Graph 3) is drawn in PART 2, once avgcrimes_top has been defined
#finding fraction
totalno_going<-c(sum(sumgoing$GOING.TO_sum))
totalno_topgoing<-sum(topsum_going$GOING.TO_sum)
fraction_going<-(totalno_topgoing/totalno_going) * 100
#focus on leaving
sumleaving<-sumall%>%
group_by(COUNTRY) %>%
select(c("LEAVING.FROM_sum","LEAVING.FROM_mean")) %>%
arrange(desc(LEAVING.FROM_sum))
topsum_leaving<-filter(sumleaving,COUNTRY %in% c("ES","DE","IT","FR","PL"))
#FRACTION LEAVING
totalno_leaving<-sum(sumleaving$LEAVING.FROM_sum)
totalno_topleaving<-sum(topsum_leaving$LEAVING.FROM_sum)
fraction_leaving<-(totalno_topleaving/totalno_leaving)*100
#ggplot
#the same for the leaving
#focusing on years
#ggplot the top five with the TOTAL
#GENDER
gender0809<- data0809 %>%
count(SEX, name = "total_08-09")
gender0910<- data0910 %>%
count(GENDER,name = "total_09-10")
#fraction0910
sumsex0910<- gender0910[1,2]+gender0910[2,2]
fractionfemale0910<- (gender0910[1,2]/sumsex0910)*100
gender1011<- data1011 %>%
count(GENDER,name = "total_10-11")
gender1112<- data1112 %>%
count(GENDER,name = "total_11-12")
gender1213<- data1213 %>%
count(STUDENT_GENDER_CDE,name = "total_12-13")
gender1314<- data1314 %>%
count(ParticipantGender,name = "total_13-14")
sumsex1314<-gender1314[1,2]+gender1314[2,2]
fractionfemale1314<-(gender1314[1,2]/sumsex1314)*100
#find the mode function
calculate_mode<-function(x) {
uniqx<-unique(na.omit(x))
uniqx[which.max(tabulate(match(x,uniqx)))]
}
#mode_age
calculate_mode(data0809$AGE)
calculate_mode(data0910$AGE)
calculate_mode(data1011$AGE)
calculate_mode(data1112$AGE)
calculate_mode(data1213$STUDENT_AGE_VALUE)
#mode_studylevel
calculate_mode(data0809$LEVELSTUDY)
calculate_mode(data0910$LEVELSTUDY)
calculate_mode(data1011$LEVELSTUDY)
calculate_mode(data1112$LEVELSTUDY)
calculate_mode(data1213$STUDENT_STUDY_LEVEL_CDE)
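# calculate_mode returns a single value even when several values are tied for
# the highest count. A minimal sketch of a variant that returns every tied value;
# all_modes is a hypothetical name and the sample ages are made up.

```r
# Return all values that share the highest frequency (NA values are ignored)
all_modes <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) return(x)          # nothing to count
  ux <- unique(x)
  counts <- tabulate(match(x, ux))       # frequency of each unique value
  ux[counts == max(counts)]
}

all_modes(c(21, 22, 22, 23, 23, NA))
```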
#subject area
area0809<- data0809 %>%
count(SUBJECTAREA,name = "total_08-09")
area0910<- data0910 %>%
count(SUBJECTAREA,name = "total_09-10")
area1011<- data1011 %>%
count(SUBJECTAREA,name = "total_10-11")
area1112<- data1112 %>%
count(SUBJECTAREA,name = "total_11-12")
area1213<- data1213 %>%
count(STUDENT_SUBJECT_AREA_VALUE,name = "total_12-13")
area1314<- data1314 %>%
count(SubjectAreaCode,name = "total_13-14")
sub222<-filter(data1314,SubjectAreaCode=="222")
sub340<-filter(data1314,SubjectAreaCode=="340")
sub22<-filter(data1314,SubjectAreaCode=="22")
sub34<-filter(data1314,SubjectAreaCode=="34")
#language taught
lang0809<- data0809 %>%
count(LANGUAGETAUGHT,name = "total_08-09")
lang0910<- data0910 %>%
count(LANGUAGETAUGHT,name = "total_09-10")
lang1011<- data1011 %>%
count(LANGUAGETAUGHT,name = "total_10-11")
lang1112<- data1112 %>%
count(LANGUAGETAUGHT,name = "total_11-12")
lang1213<- data1213 %>%
count(LANGUAGE_TAUGHT_CDE,name = "total_12-13")
lang1314<- data1314 %>%
count(Language,name = "total_13-14")
#fraction
english0910<-lang0910[6,2]
totallang0910<-sum(lang0910$`total_09-10`)
fractionlang0910<- (english0910/totallang0910)*100
english1314<-lang1314[15,2]
totallang1314<-sum(lang1314$`total_13-14`)
fractionlang1314<- (english1314/totallang1314)*100
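# The fractions above pick a row by position (for example row 6 or row 15), which
# depends on how the counts happen to be ordered. A minimal sketch of selecting
# the row by its label instead; share_of is a hypothetical helper, and the
# labels and counts below are made up for illustration.

```r
# Toy counts table (invented values)
lang_counts <- data.frame(LANGUAGETAUGHT = c("DE", "EN", "ES"),
                          total = c(20, 60, 20))

# Percentage of the total that belongs to one label
share_of <- function(counts, key_col, n_col, key) {
  hit <- counts[[key_col]] == key & !is.na(counts[[key_col]])
  100 * sum(counts[[n_col]][hit]) / sum(counts[[n_col]])
}

share_of(lang_counts, "LANGUAGETAUGHT", "total", "EN")
```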
#PART 2
library(Hmisc)
#criminality
criminality <- read_csv("//hume/student-u02/marthoma/pc/Desktop/crimesbyreportingcountry08-14.csv")
glimpse(criminality)
crimes<-criminality %>%
select(-c(`Flag and Footnotes`,LEG_STAT)) %>%
filter(CITIZEN=="Total",UNIT=="Number")
crimes$Value[crimes$Value==":"]<-NA
crimes<-crimes %>%
select(-c(CITIZEN,UNIT)) %>%
rename(COUNTRY=GEO) %>%
rename(CRIMINALS=Value) %>%
rename(DATE=TIME)
crimes$COUNTRY[crimes$COUNTRY=="England and Wales"]<-"United Kingdom"
crimes$COUNTRY[crimes$COUNTRY=="Scotland"]<-"United Kingdom"
crimes$COUNTRY[crimes$COUNTRY=="Northern Ireland (UK)"]<-"United Kingdom"
crimes$COUNTRY[crimes$COUNTRY=="Germany (until 1990 former territory of the FRG)"]<-
"Germany"
crimes<- crimes%>%
filter(!is.na(CRIMINALS))
rm(criminality)
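# After the relabelling above, England and Wales, Scotland and Northern Ireland
# all appear as separate "United Kingdom" rows for the same year. A minimal
# sketch of collapsing them into one row per country and year, assuming the
# three figures should simply be added (the report does not state this). The
# toy values are invented.

```r
# Toy data with three UK rows for one year (invented figures)
crimes_toy <- data.frame(
  COUNTRY   = c("United Kingdom", "United Kingdom", "United Kingdom", "Spain"),
  DATE      = c("2010", "2010", "2010", "2010"),
  CRIMINALS = c(1000, 200, 50, 700)
)

# One row per COUNTRY and DATE, with the counts summed
crimes_sum <- aggregate(CRIMINALS ~ COUNTRY + DATE, data = crimes_toy, FUN = sum)
crimes_sum
```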
#fixing the variables
crimes$DATE<-as.character(crimes$DATE)
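class(crimes$CRIMINALS)
#remove literal dots (fixed=TRUE: a bare "." is a regex that matches every character)
crimes$CRIMINALS<-gsub(".","",crimes$CRIMINALS,fixed=TRUE)
#convert before sorting, otherwise the values are ordered as text
crimes$CRIMINALS<-as.numeric(crimes$CRIMINALS)
crimes<-arrange(crimes,desc(CRIMINALS))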
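# How gsub treats a dot depends on whether the pattern is read as a regular
# expression or as literal text. A small sketch of the difference; the sample
# value is made up.

```r
value <- "1.234.567"

as_regex   <- gsub(".", "", value)                # "." as a regex matches every character
as_literal <- gsub(".", "", value, fixed = TRUE)  # only literal dots are removed
as_escaped <- gsub("\\.", "", value)              # escaped dot, same effect as fixed = TRUE

c(as_regex, as_literal, as_escaped)
```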
TOTAL$COUNTRY[TOTAL$COUNTRY=="GR"]<- "Greece"
TOTAL$COUNTRY[TOTAL$COUNTRY=="AT"]<- "Austria"
TOTAL$COUNTRY[TOTAL$COUNTRY=="BE"]<- "Belgium"
TOTAL$COUNTRY[TOTAL$COUNTRY=="BG"]<- "Bulgaria"
TOTAL$COUNTRY[TOTAL$COUNTRY=="CZ"]<- "Czechia"
TOTAL$COUNTRY[TOTAL$COUNTRY=="CH"]<- "Switzerland"
TOTAL$COUNTRY[TOTAL$COUNTRY=="CY"]<- "Cyprus"
TOTAL$COUNTRY[TOTAL$COUNTRY=="DE"]<- "Germany"
TOTAL$COUNTRY[TOTAL$COUNTRY=="DK"]<- "Denmark"
TOTAL$COUNTRY[TOTAL$COUNTRY=="EE"]<- "Estonia"
TOTAL$COUNTRY[TOTAL$COUNTRY=="ES"]<- "Spain"
TOTAL$COUNTRY[TOTAL$COUNTRY=="FI"]<- "Finland"
TOTAL$COUNTRY[TOTAL$COUNTRY=="FR"]<- "France"
TOTAL$COUNTRY[TOTAL$COUNTRY=="GB"]<- "United Kingdom"
TOTAL$COUNTRY[TOTAL$COUNTRY=="HR"]<- "Croatia"
TOTAL$COUNTRY[TOTAL$COUNTRY=="HU"]<- "Hungary"
TOTAL$COUNTRY[TOTAL$COUNTRY=="IE"]<- "Ireland"
TOTAL$COUNTRY[TOTAL$COUNTRY=="IS"]<- "Iceland"
TOTAL$COUNTRY[TOTAL$COUNTRY=="IT"]<- "Italy"
TOTAL$COUNTRY[TOTAL$COUNTRY=="LT"]<- "Lithuania"
TOTAL$COUNTRY[TOTAL$COUNTRY=="LI"]<- "Liechtenstein"
TOTAL$COUNTRY[TOTAL$COUNTRY=="LU"]<- "Luxembourg"
TOTAL$COUNTRY[TOTAL$COUNTRY=="LV"]<- "Latvia"
TOTAL$COUNTRY[TOTAL$COUNTRY=="MT"]<- "Malta"
TOTAL$COUNTRY[TOTAL$COUNTRY=="NL"]<- "Netherlands"
TOTAL$COUNTRY[TOTAL$COUNTRY=="NO"]<- "Norway"
TOTAL$COUNTRY[TOTAL$COUNTRY=="PL"]<- "Poland"
TOTAL$COUNTRY[TOTAL$COUNTRY=="PT"]<- "Portugal"
TOTAL$COUNTRY[TOTAL$COUNTRY=="RO"]<- "Romania"
TOTAL$COUNTRY[TOTAL$COUNTRY=="SE"]<- "Sweden"
TOTAL$COUNTRY[TOTAL$COUNTRY=="SI"]<- "Slovenia"
TOTAL$COUNTRY[TOTAL$COUNTRY=="SK"]<- "Slovakia"
TOTAL$COUNTRY[TOTAL$COUNTRY=="TR"]<- "Turkey"
TOTAL$COUNTRY[TOTAL$COUNTRY=="UK"]<- "United Kingdom"
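# The recoding above can also be written as a single lookup. A minimal sketch,
# assuming a named character vector of codes; recode_country and country_names
# are hypothetical names, and only a few codes are listed here.

```r
# Partial lookup for illustration; the remaining codes would be added the same way
country_names <- c(GR = "Greece", DE = "Germany", ES = "Spain",
                   GB = "United Kingdom", UK = "United Kingdom")

# Replace known codes with full names and leave unknown codes untouched
recode_country <- function(codes, lookup) {
  out <- unname(lookup[codes])
  out[is.na(out)] <- codes[is.na(out)]
  out
}

recode_country(c("GR", "GB", "XX"), country_names)
```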
TOTAL$DATE<-as.character(TOTAL$DATE)
glimpse(TOTAL)
glimpse(crimes)
st_cri<-right_join(crimes,TOTAL, by=c("COUNTRY","DATE"),copy=FALSE)
glimpse(st_cri)
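# A join on COUNTRY only matches rows whose names are spelled identically in
# both tables. A minimal sketch of listing the names that would fail to match
# before joining; the toy tables are invented for illustration.

```r
# Toy tables: one country is named differently in each source
students_toy <- data.frame(COUNTRY = c("Spain", "Czechia", "Italy"))
crimes_toy   <- data.frame(COUNTRY = c("Spain", "Czech Republic", "Italy"))

# Names present in one table but absent from the other
only_in_students <- setdiff(students_toy$COUNTRY, crimes_toy$COUNTRY)
only_in_crimes   <- setdiff(crimes_toy$COUNTRY, students_toy$COUNTRY)

only_in_students
only_in_crimes
```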
#PLOT TO SEE ANY POTENTIAL CONNECTION
ggplot(st_cri,aes(x=CRIMINALS,y=GOING.TO,label=COUNTRY)) + geom_point()
ggplot(st_cri,aes(x=CRIMINALS,y=GOING.TO,label=COUNTRY)) + geom_text()
#focus on top 5
st_cri_top5<- st_cri%>%
filter(COUNTRY %in% c("Spain","Germany","Italy","France","United Kingdom"))
ggplot(st_cri_top5,aes(x=CRIMINALS,y=GOING.TO,label=COUNTRY)) + geom_text()
#average of values
glimpse(st_cri)
st_cri$CRIMINALS<-as.numeric(st_cri$CRIMINALS)
avgcrimes<- st_cri%>%
filter(!is.na(CRIMINALS))%>%
group_by(COUNTRY) %>%
summarise(avg.criminals=mean(CRIMINALS))
#avgcrimes$avg.criminals<-format(round(avgcrimes$avg.criminals,2),nsmall = 2)
class(avgcrimes$avg.criminals)
avgcrimes_top<- avgcrimes%>%
filter(avg.criminals>180000)
ggplot(avgcrimes_top,aes(x=COUNTRY,y=avg.criminals))+geom_col()+theme_classic() +
labs(title="Graph 3",x="Countries (top 14)",y="Average number of suspected")
avgst<-st_cri%>%
group_by(COUNTRY) %>%
summarise(avg.students=mean(GOING.TO))
class(avgst$avg.students)
#round only; format() would turn the column into text
avgst$avg.students<-round(avgst$avg.students,2)
avg_cri.st<- inner_join(avgcrimes,avgst,by="COUNTRY")
#scatter plot
theme_set(theme_bw())
ggplot(avg_cri.st,aes(x=avg.criminals,y=avg.students,label=COUNTRY)) +geom_text() +
labs(title="Figure 1", x="Average number of suspected people",y="Average number of receiving students") +
theme_classic()
#add limits: + xlim(c(0, 0.1)) + ylim(c(0, 500000)), line: + geom_smooth(method="loess", se=F)
#ols: geom_smooth(method="lm", se=FALSE) +
#geom_smooth(method="lm", se=FALSE) +
#theme_bw()
#correlations
#spain
st_spain<-TOTAL%>%
filter(COUNTRY=="Spain") %>%
select(-c(LEAVING.FROM))
cri_spain<-crimes%>%
filter(COUNTRY=="Spain")
cri_spain$CRIMINALS<-as.numeric(cri_spain$CRIMINALS)
class(cri_spain$CRIMINALS)
rcorr(st_spain$GOING.TO,cri_spain$CRIMINALS,type="pearson")
#france
st_fr<-TOTAL%>%
filter(COUNTRY=="France") %>%
select(-c(LEAVING.FROM))
cri_france<-crimes%>%
filter(COUNTRY=="France")
cri_france$CRIMINALS<-as.numeric(cri_france$CRIMINALS)
class(cri_france$CRIMINALS)
rcorr(st_fr$GOING.TO,cri_france$CRIMINALS,type="pearson")
#germany
st_de<-TOTAL%>%
filter(COUNTRY=="Germany") %>%
select(-c(LEAVING.FROM))
cri_de<-crimes%>%
filter(COUNTRY=="Germany")
cri_de$CRIMINALS<-as.numeric(cri_de$CRIMINALS)
class(cri_de$CRIMINALS)
rcorr(st_de$GOING.TO,cri_de$CRIMINALS,type="pearson")
#italy
st_it<-TOTAL%>%
filter(COUNTRY=="Italy") %>%
select(-c(LEAVING.FROM))
cri_it<-crimes%>%
filter(COUNTRY=="Italy")
cri_it$CRIMINALS<-as.numeric(cri_it$CRIMINALS)
class(cri_it$CRIMINALS)
rcorr(st_it$GOING.TO,cri_it$CRIMINALS,type="pearson")
#very weak correlation for all the countries
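# The correlations above pair two vectors by position. Because crimes was sorted
# by CRIMINALS earlier, its rows need not be in the same year order as TOTAL. A
# minimal sketch of aligning both series on DATE before correlating, using
# base R merge() and cor(); the toy values are invented.

```r
# Toy yearly series for one country (invented values)
students_toy <- data.frame(DATE = c("2008", "2009", "2010"),
                           GOING.TO = c(10, 20, 30))
crimes_toy   <- data.frame(DATE = c("2010", "2008", "2009"),
                           CRIMINALS = c(100, 300, 200))

# Pairing by position mixes up the years
r_by_position <- cor(students_toy$GOING.TO, crimes_toy$CRIMINALS)

# Pairing by year: merge on DATE first, then correlate
aligned <- merge(students_toy, crimes_toy, by = "DATE")
r_by_year <- cor(aligned$GOING.TO, aligned$CRIMINALS)

c(by_position = r_by_position, by_year = r_by_year)
```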
#continue to the next file: education
#07-11
eduexp07_11 <- read_csv("m:/pc/desktop/eduexp07-11.csv")
eduexp0711<- eduexp07_11%>%
select(-c(`Flag and Footnotes`,UNIT))%>%
filter(INDIC_ED=="Total public expenditure on education as % of GDP, at tertiary level of education (ISCED 5-6)") %>%
filter(TIME !="2007")
eduexp0711$GEO[eduexp0711$GEO=="Germany (until 1990 former territory of the FRG)"]<-
"Germany"
rm(eduexp07_11)
eduexp0711$Value[eduexp0711$Value==":"]<-NA
eduexp0711<-eduexp0711%>%
select(-c(INDIC_ED)) %>%
filter(!is.na(Value))
glimpse(eduexp0711)
#changing to numeric
class(eduexp0711$Value)
eduexp0711$Value<-gsub(",","",eduexp0711$Value)
eduexp0711$Value<-as.numeric(eduexp0711$Value)
eduexp0711$Value<-(eduexp0711$Value)/100
class(eduexp0711$Value)
eduexp0711$Value<-as.numeric(eduexp0711$Value)
#12-14
educ_uoe_fine06_1_Data <- read_csv("M:/pc/Desktop/educ_uoe_fine06_1_Data.csv",
col_types = cols(TIME = col_character()))
eduexp1214<- educ_uoe_fine06_1_Data%>%
select(-c(`Flag and Footnotes`,UNIT,ISCED11)) %>%
filter(GEO !="European Union - 28 countries")
eduexp1214$GEO[eduexp1214$GEO=="Germany (until 1990 former territory of the FRG)"]<-
"Germany"
eduexp1214$Value[eduexp1214$Value==":"]<-NA
rm(educ_uoe_fine06_1_Data)
eduexp1214<-eduexp1214%>%
filter(!is.na(Value))
eduexp1214$Value<-as.numeric(eduexp1214$Value)
class(eduexp1214$Value)
glimpse(eduexp1214)
eduexp1214$TIME<-as.numeric(eduexp1214$TIME)
class(eduexp1214$TIME)
#merge
eduexp<-bind_rows(eduexp0711,eduexp1214)
glimpse(eduexp)
eduexp<-eduexp%>%
rename(DATE=TIME)%>%
rename(COUNTRY=GEO) %>%
rename(public.exp=Value)
#DATE exists only after the rename; TOTAL$DATE is character, so match its type for the join
eduexp$DATE<-as.character(eduexp$DATE)
#merge
st_exp<-inner_join(eduexp,TOTAL, by=c("COUNTRY","DATE"),copy=FALSE)
glimpse(st_exp)
st_exp<-st_exp%>%
select(-c(LEAVING.FROM)) %>%
filter(!is.na(public.exp))
#find avg
avgexp<-st_exp%>%
group_by(COUNTRY)%>%
summarise(avgpublic.exp=mean(public.exp))
class(avgexp$avgpublic.exp)
avgexp$avgpublic.exp<-format(round(avgexp$avgpublic.exp,3),nsmall = 3)
glimpse(avgst)
#merge
avg_exp.st<- inner_join(avgexp,avgst,by="COUNTRY")
class(avg_exp.st$avgpublic.exp)
avg_exp.st$avgpublic.exp<-as.numeric(avg_exp.st$avgpublic.exp)
#plot
ggplot(avg_exp.st,aes(x=avgpublic.exp,y=avg.students,label=COUNTRY)) +geom_text() +
labs(title="Figure 2", x="Average percentage of public expenditure",y="Average number of receiving students") +
theme_classic()
#per-country plots for the top destinations, to see whether the relationship is clearer
spain_exp<-st_exp%>%
filter(COUNTRY=="Spain")
ggplot(spain_exp,aes(x=public.exp,y=GOING.TO)) +geom_point()+ geom_smooth(method="lm",se=FALSE) +
labs(title="Figure 3: Spain 2008-2014", x="Percentage of public expenditure",y="Average number of receiving students") +
theme_classic()
#germany
de_exp<-st_exp%>%
filter(COUNTRY=="Germany")
ggplot(de_exp,aes(x=public.exp,y=GOING.TO)) +geom_point()+ geom_smooth(method="lm",se=FALSE) +
labs(title="Figure 5: Germany 2008-2014", x="Percentage of public expenditure",y="Average number of receiving students") +
theme_classic()
#france
fr_exp<-st_exp%>%
filter(COUNTRY=="France")
ggplot(fr_exp,aes(x=public.exp,y=GOING.TO)) +geom_point()+ geom_smooth(method="lm",se=FALSE) +
labs(title="Figure 4: France 2008-2014", x="Percentage of public expenditure",y="Average number of receiving students") +
theme_classic()
#italy
it_exp<-st_exp%>%
filter(COUNTRY=="Italy")
ggplot(it_exp,aes(x=public.exp,y=GOING.TO)) +geom_point()+ geom_smooth(method="lm",se=FALSE) +
labs(title="Figure 6: Italy 2008-2014", x="Percentage of public expenditure",y="Average number of receiving students") +
theme_classic()
#uk
uk_exp<-st_exp%>%
filter(COUNTRY=="United Kingdom")
ggplot(uk_exp,aes(x=public.exp,y=GOING.TO)) +geom_point()+ geom_smooth(method="lm",se=FALSE) +
labs(title="Figure 7: United Kingdom 2008-2014", x="Percentage of public expenditure",y="Average number of receiving students") +
theme_classic()

University Student’s mobility under the ERASMUS program in European Union, 2008-2014.

  • 1. University Student’s mobility under the ERASMUS program in European Union, 2008-2014. Candidate number: 171846 Introduction _______________________________________________________________2 Part 1_____________________________________________________________________3 Part 2_____________________________________________________________________5 Conclusion_________________________________________________________________7 Appendix__________________________________________________________________8 Code ____________________________________________________________________10
  • 2. Introduction The ERASMUS program (European Community Action Scheme for the Mobility of University Students) is a student exchange program among universities located in EU member states. It has operated since 1987, and its primary aim is to promote collaboration and to lessen cultural differences by letting students experience life abroad. Even though other programs similar to ERASMUS operate in the EU, such as ERASMUS+, we focus our analysis on university students, regardless of their level of studies. Our analysis consists of two parts. In the first part, we aggregate the individual entries to find insights about the characteristics of student mobility. For that purpose, we use the data files on the EU Open Data Portal (data.europa.eu), which cover the period from the second semester of 2008 to the first semester of 2014. These files contain entries for the students who participated in the program during that period, along with more specific characteristics such as their age and their level of studies. The second part focuses on the countries and investigates the possible effect of country-specific characteristics on mobility. In combination with our aggregate data from the first part, we use data files retrieved from the Eurostat website (ec.europa.eu/eurostat/data/database).
  • 3. Part 1 Countries. Over the period of interest, approximately 2 million students participated in the ERASMUS program. The five countries that attracted the most students are Spain, France, Germany, the United Kingdom and Italy. These countries were chosen by 55.5% of all the students who studied abroad. In other words, more than 1 in 2 students preferred one of the top five destinations, with Spain in first position. On the sending side, the countries that students left¹ are almost the same as the most popular host countries. However, Poland takes fifth place instead of Italy, which follows closely in sixth. We observe the same concentration here: 59.5% of all the students who left their country came from the top five sending countries.
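The share held by the top five destinations is a simple ratio of sums. A minimal base R sketch of that calculation follows; the data frame `incoming` and its numbers are made up for illustration and are not the ERASMUS figures, and `top_share` is a hypothetical helper name. Only the column name `GOING.TO` is borrowed from the code appendix.

```r
# Hypothetical per-country totals of incoming students (illustrative numbers only).
incoming <- data.frame(
  COUNTRY  = c("ES", "FR", "DE", "UK", "IT", "PL", "SE", "NL"),
  GOING.TO = c(300, 250, 240, 220, 180, 90, 80, 70)
)

# Share (in percent) of all incoming students hosted by the n most popular countries.
top_share <- function(df, n = 5) {
  ordered <- df[order(df$GOING.TO, decreasing = TRUE), ]
  100 * sum(head(ordered$GOING.TO, n)) / sum(df$GOING.TO)
}

share_top5 <- top_share(incoming)
print(round(share_top5, 1))
```

The same helper can be applied to the sending side by passing a data frame whose count column holds the number of leaving students under the name `GOING.TO`, or by generalising the column name into an argument.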
  • 4. Gender, age and study level. According to our data, women participate in the exchange program more than men: the share of female participants is around 60% each year². In addition, most students participated at the age of 21; the only exception is the period 2008-2009, when the most common age was 23. Moreover, the majority of participants were in the first cycle of their studies when they took part in the program. Subject area and taught language. Most participants come from the study areas of business and administration, humanities and foreign languages. Notably, foreign languages gathers a vast number of students and is significantly larger than the other categories. As a result, we can conclude that a large fraction of the students participated in order to understand a foreign language better, as a supplement to their studies. This conclusion is particularly interesting, as it shows that students chose the program as a means to learn a language better and, through it, a new culture. This is also one of the main reasons the ERASMUS program was established among the European states. Furthermore, we can see, unsurprisingly, that the language which dominates among the taught courses is English. The native languages of the most popular countries complete the list of top taught languages. However, the picture gets interesting when we look across the years. English accounts for 45.4% of taught courses in 2008 and reaches 58% by the end of the first semester of 2014. This result sits uneasily with the observations above. Firstly, the majority of ongoing university studies belong to the foreign language area, a fraction that grows over the years.
At the same time, however, the percentage of courses taught in English also rises considerably. Secondly, the top non-English-speaking countries receive a steady, slightly increasing number of students throughout the years. Although their native languages appear on the list of top taught languages, the percentage of English-taught courses remains large. Given these points, one can conclude that the success of the ERASMUS program in attracting a great number of students every year gave universities in non-English-speaking countries an incentive to add English-taught courses.
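The yearly share of English-taught courses can be computed as a per-year mean of a logical indicator. The base R sketch below assumes one row per student with hypothetical columns `YEAR` and `LANGUAGE`; both names and all values are invented for illustration and do not reproduce the 45.4% and 58% figures.

```r
# Hypothetical mobility records: one row per student (values made up).
records <- data.frame(
  YEAR     = c(2008, 2008, 2008, 2008, 2014, 2014, 2014, 2014),
  LANGUAGE = c("EN", "ES", "FR", "EN", "EN", "EN", "DE", "EN")
)

# Percentage of records per year whose teaching language is English:
# the mean of a TRUE/FALSE vector is the fraction of TRUE values.
english_share <- tapply(records$LANGUAGE == "EN", records$YEAR, mean) * 100
print(english_share)
```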
  • 5. Part 2 In this part, we examine the reasons behind the participants' preferences regarding the countries in which they chose to study. To this end, we compare the total number of incoming students for each country with three different characteristics of the country: its crime rates, its government's expenditure on tertiary education and its………. Criminality. Our data set contains the total number of suspected persons who were arrested in each of the European states, regardless of their nationality. When we aggregate the data, we see that the largest number of suspected persons belongs to Germany (Graph 3 in Appendix). France and Italy are also among the first five positions. Crime rates are generally considered a negative factor for a country's popularity among visitors. Together with our findings from the first part, it seems strange to see three of the most popular countries for students also having high criminality. In addition, when we plot the average number of students against the average number of criminals (see Figure 1 in Appendix), we observe a slightly positive relationship. Despite this surprising result, the scatter plot does not allow us to establish any clear relationship between the variables. Moreover, when we look at the crime rates per hundred thousand inhabitants, we see a slightly different picture. Except for Finland, the rates are almost the same among the most popular destinations for the students. Likewise, the correlation between incoming students and number of criminals for these countries is low (below 0.5). From these two observations, we can state that the students were not concerned about crime rates when choosing an ERASMUS destination. In other words, the criminality of a country did not affect the students' choice to study there.
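The two comparisons described above, raw counts and rates per hundred thousand inhabitants, can be sketched in base R as follows. The data frame `crime` and all of its values are made up for illustration; the column `avg.criminals` follows the naming used in the code appendix, while `avg.students`, `population` and `rate_per_100k` are assumed names.

```r
# Hypothetical country averages (made-up values for illustration).
crime <- data.frame(
  COUNTRY       = c("ES", "FR", "DE", "IT", "FI", "PL"),
  avg.students  = c(30000, 25000, 24000, 18000, 6000, 9000),
  avg.criminals = c(400000, 1100000, 2100000, 900000, 300000, 500000),
  population    = c(46e6, 65e6, 81e6, 60e6, 5.4e6, 38e6)
)

# Suspects per 100,000 inhabitants, so that large and small countries are comparable.
crime$rate_per_100k <- crime$avg.criminals / crime$population * 1e5

# Pearson correlation between incoming students and each crime measure.
r_counts <- cor(crime$avg.students, crime$avg.criminals)
r_rates  <- cor(crime$avg.students, crime$rate_per_100k)
print(c(counts = r_counts, rates = r_rates))
```

Comparing the two coefficients shows how much of any apparent relationship comes from country size alone, since both student numbers and suspect counts grow with population.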
 Public expenditure in tertiary education. For this section, our data represent each country's public expenditure on tertiary education as a percentage of its gross domestic product (GDP). To begin with, we observe that the Scandinavian countries spend the largest percentage on education. At the same time,
  • 6. these countries are no more popular, and sometimes less popular, than countries which spend less or much less. Overall, looking at the scatter plot (Figure 2), we are unable to establish any correlation between the percentage of GDP spent on education and popularity as an ERASMUS destination. Furthermore, we can look more closely at the five most popular destinations. Firstly, we exclude the years 2008 and 2014, as we have data for only one semester in each. Secondly, we observe a decline in expenditure in 2012 for all countries except the United Kingdom. Notably, this reflects the economic depression and the governments' decisions in response to it. The main result from the scatter plots and the linear regressions (see Figures 3, 4, 5, 6 and 7 in Appendix) is a positive relationship. More specifically, public expenditure on education appears to be positively associated with the number of ERASMUS students in a country. However, the relationship is only slightly positive, and on some occasions there is no positive relationship at all.
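A per-country linear regression of incoming students on expenditure can be sketched with base R's `lm`. The data frame `spend`, the column `pct_gdp` and all values are hypothetical; the fit below only illustrates how a slope and an R-squared would be read off, and does not reproduce the results in the Appendix.

```r
# Hypothetical yearly observations for one country (made-up values).
spend <- data.frame(
  YEAR     = 2009:2013,
  pct_gdp  = c(1.10, 1.15, 1.20, 1.05, 1.18),  # public tertiary spending, % of GDP
  GOING.TO = c(30000, 31500, 33000, 29500, 32500)
)

# Ordinary least squares: incoming students explained by expenditure.
fit <- lm(GOING.TO ~ pct_gdp, data = spend)

slope <- unname(coef(fit)["pct_gdp"])  # change in students per percentage point of GDP
r2    <- summary(fit)$r.squared        # share of variance explained
print(c(slope = slope, r.squared = r2))
```

With only five yearly points per country, a positive slope is weak evidence on its own, which is consistent with the cautious reading of the relationship given above.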
  • 7. Conclusion In the final analysis, we saw that student mobility was characterized by some specific patterns throughout the 2008-2014 time frame. For instance, the participants' preferences vary very little between the years regarding both the home and the host countries. Notably, five countries dominate in numbers in both categories and account for more than 50% of the total number of participants. These countries are Spain, France, Germany, the United Kingdom and Italy (or Poland). Moreover, we observed that the majority of participants were studying towards a degree in foreign languages. Hence, we can say that the program is a tool for those students to improve their studies. At the same time, we saw that the percentage of English-taught courses increased rapidly from 2008. Additionally, we tried to determine whether criminality or public expenditure on tertiary education affects the participants' choice of a country as their study destination. In general, we would have expected the former to have a negative effect and the latter a positive one. However, we found strong evidence that criminality is not a factor that students take into consideration when they choose their destination. On the other hand, we found a slightly positive relationship between the number of incoming students and the percentage of public expenditure on education. Nonetheless, high expenditure on education does not imply a large number of students, as we saw in the case of the Scandinavian countries. Our analysis covers a short time frame, from which it is difficult to draw strong conclusions about ERASMUS mobility. Accordingly, a further, more detailed analysis of student mobility is needed to corroborate the findings of this paper.
  • 10. Code
    # extracting the students' mobility by year
    library(tidyverse)
    library(dplyr)
    library(readr)
    library(lubridate)
    library(ggplot2)
    stdata0809 <- read_delim("M:/pc/Desktop/student_data_2008 (1).csv", ";",
                             escape_double = FALSE, trim_ws = TRUE)
    glimpse(stdata0809)
    data0809 <- stdata0809 %>%
      select(-c(NATIONALITY, PLACEMENTSTARTDATE, STUDYGRANT, LENGTHWORKPLACEMENT,
                TYPEWORKSECTOR, CONSORTIUMAGREEMENTNUMBER, SEVSUPPLEMENT,
                COUNTRYOFWORKPLACEMENT, ECTSCREDITSWORK, WORKPLACEMENT, ENTERPRISESIZE,
                PLACEMENTGRANT, SHORTDURATION, HOMEINSTITUTION, HOSTINSTITUTION,
                QUALIFICATIONATHOST, ECTSCREDITSSTUDY, TOTALECTSCREDITS,
                LINGPREPARATION, TAUGHTHOSTLANG)) %>%
      filter(MOBILITYTYPE != "P") %>%
      filter(!is.na(STUDYSTARTDATE)) %>%
      separate(STUDYSTARTDATE, sep = "-", into = c("MONTH", "YEAR"))
    glimpse(data0809)
    rm(stdata0809)
    sum(is.na(data0809$YEAR))
    sum(data0809$COUNTRYOFHOSTINSTITUTION == 0)
    data0809 <- data0809 %>%
  • 11.
      filter(COUNTRYOFHOSTINSTITUTION != 0) %>%
      filter(!is.na(YEAR)) %>%
      select(-c(MONTH))
    # separate to 2008 and 2009
    # 2008
    data08 <- data0809 %>%
      filter(YEAR == "2008")
    # 2009
    data09 <- data0809 %>%
      filter(YEAR == "2009")
    # summary, host
    # host08 <- data08 %>%
    #   count(COUNTRYOFHOSTINSTITUTION, sort = TRUE, name = "GOING.TO")
    host08 <- data08 %>%
      count(COUNTRYOFHOSTINSTITUTION, sort = TRUE)
    # rename to COUNTRY
    names(host08)
    names(host08)[names(host08) == "COUNTRYOFHOSTINSTITUTION"] <- "COUNTRY"
    names(host08)[names(host08) == "n"] <- "GOING.TO"
    # summary, home
    # home08 <- data08 %>%
    #   count(COUNTRYOFHOMEINSTITUTION, sort = TRUE, name = "LEAVING.FROM")
    home08 <- data08 %>%
      count(COUNTRYOFHOMEINSTITUTION, sort = TRUE)
  • 12.
    # rename to COUNTRY
    names(home08)
    names(home08)[names(home08) == "COUNTRYOFHOMEINSTITUTION"] <- "COUNTRY"
    names(home08)[names(home08) == "n"] <- "LEAVING.FROM"
    # different number of observations: countries that host but do not send
    anti_join(host08, home08, by = "COUNTRY")
    home08 <- home08 %>%
      add_row(COUNTRY = "FR", LEAVING.FROM = 0) %>%
      add_row(COUNTRY = "TR", LEAVING.FROM = 0) %>%
      add_row(COUNTRY = "CH", LEAVING.FROM = 0)
    # countries that send but do not host
    anti_join(home08, host08, by = "COUNTRY")
    # TOTAL 2008 s2
    # ADD YEAR COLUMN + row for HR
    # PLOT TO SEE HOW IT LOOKS
    # ggplot(to08, aes(x = YEAR, y = LEAVING.FROM, label = COUNTRY), group = COUNTRY) + geom_text()
    total08 <- inner_join(host08, home08, by = "COUNTRY")
    dt08 <- as.Date("2008-08-01")
    library(zoo)
  • 13.
    total08 <- total08 %>%
      add_row(COUNTRY = "HR", LEAVING.FROM = 0, GOING.TO = 0) %>%
      add_column(DATE = dt08, .before = "COUNTRY")
    # convert a date into a "year S<semester>" label
    Date2period <- function(dt08, period = 6, sep = " S") {
      ym <- as.yearmon(dt08)
      paste(as.integer(ym), (cycle(ym) - 1) %/% period + 1, sep = sep)
    }
    # general function
    # Date2period <- function(x, period = 6, sep = " S") {
    #   ym <- as.yearmon(x)
    #   paste(as.integer(ym), (cycle(ym) - 1) %/% period + 1, sep = sep)
    # }
    total08$DATE <- Date2period(total08$DATE)
    # the second part
    TOTAL08 <- left_join(host08, home08, by = "COUNTRY")
    TOTAL08 <- TOTAL08 %>%
      add_row(COUNTRY = "HR", LEAVING.FROM = 0, GOING.TO = 0) %>%
      add_row(COUNTRY = "CH", LEAVING.FROM = 0, GOING.TO = 0)
    # 2009a
    # summary, host
    host09 <- data09 %>%
  • 14.
      count(COUNTRYOFHOSTINSTITUTION, sort = TRUE)
    # rename to COUNTRY
    names(host09)
    names(host09)[names(host09) == "COUNTRYOFHOSTINSTITUTION"] <- "COUNTRY"
    names(host09)[names(host09) == "n"] <- "GOING.TO"
    # summary, home
    home09 <- data09 %>%
      count(COUNTRYOFHOMEINSTITUTION, sort = TRUE)
    # rename to COUNTRY
    names(home09)
    names(home09)[names(home09) == "COUNTRYOFHOMEINSTITUTION"] <- "COUNTRY"
    names(home09)[names(home09) == "n"] <- "LEAVING.FROM"
    anti_join(host09, home09, by = "COUNTRY")
    home09 <- home09 %>%
      add_row(COUNTRY = "FR", LEAVING.FROM = 0) %>%
      add_row(COUNTRY = "TR", LEAVING.FROM = 0)
    # TOTAL 2009a
    # ADD YEAR COLUMN
    # MERGE
    TOTAL09A <- inner_join(home09, host09, by = "COUNTRY")
    TOTAL09A <- add_row(TOTAL09A, COUNTRY = "HR", LEAVING.FROM = 0, GOING.TO = 0)
    dt09a <- as.Date("2009-02-01")
  • 15.
    total09a <- TOTAL09A %>%
      add_column(DATE = dt09a, .before = "COUNTRY")
    Date2period <- function(dt09a, period = 6, sep = " S") {
      ym <- as.yearmon(dt09a)
      paste(as.integer(ym), (cycle(ym) - 1) %/% period + 1, sep = sep)
    }
    total09a$DATE <- Date2period(total09a$DATE)
    # test how ggplot looks
    # total <- bind_rows(total08, total09a)
    # ggplot(total, aes(x = DATE, y = LEAVING.FROM, label = COUNTRY), group = COUNTRY) + geom_text()
    # ggplot(total09, aes(x = DATE, y = LEAVING.FROM, label = COUNTRY), group = COUNTRY) + geom_text()
    # keep it clean
    rm(data08, data09, home08, home09, host09, host08)
    # IMPORTING THE NEXT FILE 2009-2010
  • 16.
    stdata0910 <- read_delim("M:/pc/Desktop/student_data_2009.csv", ";",
                             escape_double = FALSE, trim_ws = TRUE)
    glimpse(stdata0910)
    data0910 <- stdata0910 %>%
      select(-c(NATIONALITY, PLACEMENTSTARTDATE, STUDYGRANT, LENGTHWORKPLACEMENT,
                TYPEWORKSECTOR, CONSORTIUMAGREEMENTNUMBER, SEVSUPPLEMENT,
                COUNTRYOFWORKPLACEMENT, ECTSCREDITSWORK, WORKPLACEMENT, ENTERPRISESIZE,
                PLACEMENTGRANT, SHORTDURATION, HOMEINSTITUTION, HOSTINSTITUTION,
                QUALIFICATIONATHOST, ECTSCREDITSSTUDY, TOTALECTSCREDITS,
                LINGPREPARATION, TAUGHTHOSTLANG)) %>%
      filter(MOBILITYTYPE != "P") %>%
      filter(!is.na(STUDYSTARTDATE)) %>%
      separate(STUDYSTARTDATE, sep = "-", into = c("MONTH", "YEAR"))
    glimpse(data0910)
    rm(stdata0910)
    sum(is.na(data0910$YEAR))
    sum(data0910$COUNTRYOFHOSTINSTITUTION == 0)
    data0910 <- data0910 %>%
      filter(COUNTRYOFHOSTINSTITUTION != 0) %>%
      filter(!is.na(YEAR)) %>%
      select(-c(MONTH))
    # separate to 2010 and 2009
    # 2010
    data10 <- data0910 %>%
  • 17.
      filter(YEAR == "2010")
    # 2009
    data09 <- data0910 %>%
      filter(YEAR == "2009")
    # summary, host
    host10 <- data10 %>%
      count(COUNTRYOFHOSTINSTITUTION, sort = TRUE)
    # rename to COUNTRY
    names(host10)
    names(host10)[names(host10) == "COUNTRYOFHOSTINSTITUTION"] <- "COUNTRY"
    names(host10)[names(host10) == "n"] <- "GOING.TO"
    # summary, home
    home10 <- data10 %>%
      count(COUNTRYOFHOMEINSTITUTION, sort = TRUE)
    # rename to COUNTRY
    names(home10)
    names(home10)[names(home10) == "COUNTRYOFHOMEINSTITUTION"] <- "COUNTRY"
    names(home10)[names(home10) == "n"] <- "LEAVING.FROM"
    anti_join(host10, home10, by = "COUNTRY")
    sum(data0910$COUNTRYOFHOMEINSTITUTION == "SE")
    anti_join(home10, host10, by = "COUNTRY")
    home10 <- home10 %>%
      add_row(COUNTRY = "SE", LEAVING.FROM = 0) %>%
      add_row(COUNTRY = "TR", LEAVING.FROM = 0) %>%
  • 18.
      add_row(COUNTRY = "EE", LEAVING.FROM = 0)
    host10 <- host10 %>%
      add_row(COUNTRY = "HR", GOING.TO = 0)
    # s o s
    # We see that HR only sends students and does not receive any
    # home10$COUNTRY[!(home10$COUNTRY %in% host10$COUNTRY)]
    # TOTAL 2010
    # ADD YEAR COLUMN
    # MERGE
    TOTAL10A <- inner_join(home10, host10, by = "COUNTRY")
    dt10a <- as.Date("2010-02-01")
    total10a <- TOTAL10A %>%
      add_column(DATE = dt10a, .before = "COUNTRY")
    Date2period <- function(dt10a, period = 6, sep = " S") {
      ym <- as.yearmon(dt10a)
      paste(as.integer(ym), (cycle(ym) - 1) %/% period + 1, sep = sep)
    }
    total10a$DATE <- Date2period(total10a$DATE)
    str(total10a)
    # 2009 the same as before
  • 19.
    # summary, host
    host09 <- data09 %>%
      count(COUNTRYOFHOSTINSTITUTION, sort = TRUE)
    # rename to COUNTRY
    names(host09)
    names(host09)[names(host09) == "COUNTRYOFHOSTINSTITUTION"] <- "COUNTRY"
    names(host09)[names(host09) == "n"] <- "GOING.TO"
    # summary, home
    home09 <- data09 %>%
      count(COUNTRYOFHOMEINSTITUTION, sort = TRUE)
    # rename to COUNTRY
    names(home09)
    names(home09)[names(home09) == "COUNTRYOFHOMEINSTITUTION"] <- "COUNTRY"
    names(home09)[names(home09) == "n"] <- "LEAVING.FROM"
    host09$COUNTRY[!(host09$COUNTRY %in% home09$COUNTRY)]
    home09$COUNTRY[!(home09$COUNTRY %in% host09$COUNTRY)]
    home09 <- home09 %>%
      add_row(COUNTRY = "SE", LEAVING.FROM = 0) %>%
      add_row(COUNTRY = "TR", LEAVING.FROM = 0) %>%
      add_row(COUNTRY = "EE", LEAVING.FROM = 0) %>%
      add_row(COUNTRY = "CH", LEAVING.FROM = 0)
    # "CH" in upper case so that it matches the code used in home09
    host09 <- host09 %>%
      add_row(COUNTRY = "HR", GOING.TO = 0) %>%
      add_row(COUNTRY = "CH", GOING.TO = 0)
  • 20.
    # TOTAL 2009
    # MERGE
    TOTAL09B <- left_join(home09, host09, by = "COUNTRY")
    dt09b <- as.Date("2009-09-01")
    total09b <- TOTAL09B %>%
      add_column(DATE = dt09b, .before = "COUNTRY")
    Date2period <- function(dt09b, period = 6, sep = " S") {
      ym <- as.yearmon(dt09b)
      paste(as.integer(ym), (cycle(ym) - 1) %/% period + 1, sep = sep)
    }
    total09b$DATE <- Date2period(total09b$DATE)
    # part 2
    TOTAL09 <- left_join(TOTAL09B, TOTAL09A, by = "COUNTRY")
    TOTAL09[is.na(TOTAL09)] <- 0
    TOTAL09 <- TOTAL09 %>%
      group_by(COUNTRY) %>%
      transmute(LEAVING.FROM = LEAVING.FROM.x + LEAVING.FROM.y,
                GOING.TO = GOING.TO.x + GOING.TO.y)
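The steps above pad missing countries by hand with `add_row` before joining the host and home counts. One possible alternative, shown here only as a sketch and not as the method used in this analysis, is a full outer join followed by a zero fill, so that a country missing from either table appears automatically. The example uses base R's `merge` on two small made-up tables; dplyr's `full_join` would play the same role.

```r
# Hypothetical host and home counts for one semester (made-up values).
host <- data.frame(COUNTRY = c("ES", "FR", "TR"), GOING.TO = c(120, 90, 15))
home <- data.frame(COUNTRY = c("ES", "HR"), LEAVING.FROM = c(100, 7))

# Full outer join: keeps every country present in either table.
flows <- merge(host, home, by = "COUNTRY", all = TRUE)

# A country absent from one side has sent or received no students: replace NA by 0.
for (col in c("GOING.TO", "LEAVING.FROM")) {
  flows[[col]][is.na(flows[[col]])] <- 0
}
print(flows)
```

This avoids having to inspect `anti_join` output for each semester and removes the risk of a mistyped country code in a manual row.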
  • 21.
    # CLEAN
    rm(home10, host10, home09, host09, data09, data10)
    # import the next file 2010-2011
    stdata1011 <- read_delim("M:/pc/Desktop/student_data_2010 (1).csv", ";",
                             escape_double = FALSE, trim_ws = TRUE)
    glimpse(stdata1011)
    names(stdata1011)
    sum(is.na(stdata1011$STUDYSTARTDATE))
    data1011 <- stdata1011 %>%
      select(-c(NATIONALITY, PLACEMENTSTARTDATE, STUDYGRANT, TYPEWORKSECTOR,
                CONSORTIUMAGREEMENTNUMBER, SEVSUPPLEMENT, COUNTRYOFWORKPLACEMENT,
                ECTSCREDITSWORK, WORKPLACEMENT, ENTERPRISESIZE, PLACEMENTGRANT,
                SHORTDURATION, HOMEINSTITUTION, HOSTINSTITUTION, QUALIFICATIONATHOST,
                ECTSCREDITSSTUDY, TOTALECTSCREDITS, LINGPREPARATION, TAUGHTHOSTLANG)) %>%
      filter(MOBILITYTYPE != "P") %>%
      filter(!is.na(STUDYSTARTDATE)) %>%
      separate(STUDYSTARTDATE, sep = "-", into = c("MONTH", "YEAR"))
  • 22.
    sum(is.na(data1011$YEAR))
    sum(data1011$COUNTRYCODEOFHOSTINSTITUTION == 0)
    # BEFR, BENL refer to BE
    data1011$COUNTRYCODEOFHOSTINSTITUTION[data1011$COUNTRYCODEOFHOSTINSTITUTION == "BEFR"] <- "BE"
    data1011$COUNTRYCODEOFHOSTINSTITUTION[data1011$COUNTRYCODEOFHOSTINSTITUTION == "BENL"] <- "BE"
    data1011$COUNTRYCODEOFHOMEINSTITUTION[data1011$COUNTRYCODEOFHOMEINSTITUTION == "BEFR"] <- "BE"
    data1011$COUNTRYCODEOFHOMEINSTITUTION[data1011$COUNTRYCODEOFHOMEINSTITUTION == "BENL"] <- "BE"
    glimpse(data1011)
    data1011 <- data1011 %>%
      filter(!is.na(YEAR)) %>%
      select(-c(MONTH))
    # separate to 2010 and 2011
    # 2010
    data10 <- data1011 %>%
      filter(YEAR == "2010")
    # 2011
    data11 <- data1011 %>%
      filter(YEAR == "2011")
  • 23.
    # summary, host
    host10 <- data10 %>%
      count(COUNTRYCODEOFHOSTINSTITUTION, sort = TRUE)
    # rename to COUNTRY
    names(host10)
    names(host10)[names(host10) == "COUNTRYCODEOFHOSTINSTITUTION"] <- "COUNTRY"
    names(host10)[names(host10) == "n"] <- "GOING.TO"
    # summary, home
    home10 <- data10 %>%
      count(COUNTRYCODEOFHOMEINSTITUTION, sort = TRUE)
    # rename to COUNTRY
    names(home10)
    names(home10)[names(home10) == "COUNTRYCODEOFHOMEINSTITUTION"] <- "COUNTRY"
    names(home10)[names(home10) == "n"] <- "LEAVING.FROM"
    anti_join(host10, home10, by = "COUNTRY")
    anti_join(home10, host10, by = "COUNTRY")
    home10 <- home10 %>%
      add_row(COUNTRY = "EE", LEAVING.FROM = 0) %>%
      add_row(COUNTRY = "TR", LEAVING.FROM = 0) %>%
      add_row(COUNTRY = "MT", LEAVING.FROM = 0) %>%
      add_row(COUNTRY = "CH", LEAVING.FROM = 0)
  • 24.
    # TOTAL 2010
    # ADD YEAR COLUMN
    # MERGE
    TOTAL10B <- inner_join(home10, host10, by = "COUNTRY")
    TOTAL10B$COUNTRY[!(TOTAL10B$COUNTRY %in% TOTAL10A$COUNTRY)]
    TOTAL10A <- TOTAL10A %>%
      add_row(COUNTRY = "CH", GOING.TO = 0, LEAVING.FROM = 0)
    dt10b <- as.Date("2010-09-01")
    total10b <- TOTAL10B %>%
      add_column(DATE = dt10b, .before = "COUNTRY")
    Date2period <- function(dt10b, period = 6, sep = " S") {
      ym <- as.yearmon(dt10b)
      paste(as.integer(ym), (cycle(ym) - 1) %/% period + 1, sep = sep)
    }
    total10b$DATE <- Date2period(total10b$DATE)
    # part 2
    TOTAL10 <- inner_join(TOTAL10A, TOTAL10B, by = "COUNTRY")
    TOTAL10 <- TOTAL10 %>%
      group_by(COUNTRY) %>%
      transmute(LEAVING.FROM = LEAVING.FROM.x + LEAVING.FROM.y,
  • 25.
                GOING.TO = GOING.TO.x + GOING.TO.y)
    # 2011
    # summary, host
    host11 <- data11 %>%
      count(COUNTRYCODEOFHOSTINSTITUTION, sort = TRUE)
    # rename to COUNTRY
    names(host11)
    names(host11)[names(host11) == "COUNTRYCODEOFHOSTINSTITUTION"] <- "COUNTRY"
    names(host11)[names(host11) == "n"] <- "GOING.TO"
    # summary, home
    home11 <- data11 %>%
      count(COUNTRYCODEOFHOMEINSTITUTION, sort = TRUE)
    # rename to COUNTRY
    names(home11)
    names(home11)[names(home11) == "COUNTRYCODEOFHOMEINSTITUTION"] <- "COUNTRY"
    names(home11)[names(home11) == "n"] <- "LEAVING.FROM"
    host11$COUNTRY[!(host11$COUNTRY %in% home11$COUNTRY)]
    home11$COUNTRY[!(home11$COUNTRY %in% host11$COUNTRY)]
    home11 <- home11 %>%
      add_row(COUNTRY = "CH", LEAVING.FROM = 0) %>%
      add_row(COUNTRY = "TR", LEAVING.FROM = 0) %>%
      add_row(COUNTRY = "EE", LEAVING.FROM = 0) %>%
  • 26.
      add_row(COUNTRY = "MT", LEAVING.FROM = 0)
    # TOTAL 2011
    # MERGE
    TOTAL11A <- inner_join(home11, host11, by = "COUNTRY")
    dt11a <- as.Date("2011-02-01")
    total11a <- TOTAL11A %>%
      add_column(DATE = dt11a, .before = "COUNTRY")
    Date2period <- function(dt11a, period = 6, sep = " S") {
      ym <- as.yearmon(dt11a)
      paste(as.integer(ym), (cycle(ym) - 1) %/% period + 1, sep = sep)
    }
    total11a$DATE <- Date2period(total11a$DATE)
    # REMOVE
    rm(data11, home11, host11, stdata1011)
    rm(data10, home10, host10)
    # 2011-2012
  • 27.
    stdata1112 <- read_delim("//hume/student-u02/marthoma/pc/Desktop/student_1112.csv", ";",
                             escape_double = FALSE, trim_ws = TRUE)
    glimpse(stdata1112)
    names(stdata1112)
    sum(is.na(stdata1112$STUDYSTARTDATE))
    data1112 <- stdata1112 %>%
      select(-c(NATIONALITY, PLACEMENTSTARTDATE, STUDYGRANT, PLACEMENTENTERPRISE,
                COUNTRYOFPLACEMENT, ECTSCREDITSPLACEMENT, LENGTHPLACEMENT,
                TYPEPLACEMENTSECTOR, CONSORTIUMAGREEMENTNUMBER, ENTERPRISESIZE,
                PLACEMENTGRANT, SHORTDURATION, HOMEINSTITUTION, HOSTINSTITUTION,
                QUALIFICATIONATHOST, ECTSCREDITSSTUDY, TOTALECTSCREDITS,
                LINGPREPARATION, TAUGHTHOSTLANG, SNSUPPLEMENT)) %>%
      filter(MOBILITYTYPE != "P") %>%
      filter(!is.na(STUDYSTARTDATE)) %>%
      separate(STUDYSTARTDATE, sep = "-", into = c("MONTH", "YEAR"))
    sum(is.na(data1112$YEAR))
    sum(data1112$COUNTRYCODEOFHOSTINSTITUTION == 0)
    # BEFR, BENL, BEDE refer to BE
    data1112$COUNTRYCODEOFHOSTINSTITUTION[data1112$COUNTRYCODEOFHOSTINSTITUTION == "BEFR"] <- "BE"
    data1112$COUNTRYCODEOFHOSTINSTITUTION[data1112$COUNTRYCODEOFHOSTINSTITUTION == "BENL"] <- "BE"
    data1112$COUNTRYCODEOFHOSTINSTITUTION[data1112$COUNTRYCODEOFHOSTINSTITUTION == "BEDE"] <- "BE"
  • 28.
    data1112$COUNTRYCODEOFHOMEINSTITUTION[data1112$COUNTRYCODEOFHOMEINSTITUTION == "BEFR"] <- "BE"
    data1112$COUNTRYCODEOFHOMEINSTITUTION[data1112$COUNTRYCODEOFHOMEINSTITUTION == "BENL"] <- "BE"
    glimpse(data1112)
    data1112 <- data1112 %>%
      select(-c(MONTH))
    # separate to 2011 and 2012 (this file stores two-digit years)
    # 2012
    data12 <- data1112 %>%
      filter(YEAR == "12")
    # 2011
    data11 <- data1112 %>%
      filter(YEAR == "11")
    # summary, host
    host12 <- data12 %>%
      count(COUNTRYCODEOFHOSTINSTITUTION, sort = TRUE)
    # rename to COUNTRY
    names(host12)
    names(host12)[names(host12) == "COUNTRYCODEOFHOSTINSTITUTION"] <- "COUNTRY"
    names(host12)[names(host12) == "n"] <- "GOING.TO"
  • 29.
    # summary, home
    home12 <- data12 %>%
      count(COUNTRYCODEOFHOMEINSTITUTION, sort = TRUE)
    # rename to COUNTRY
    names(home12)
    names(home12)[names(home12) == "COUNTRYCODEOFHOMEINSTITUTION"] <- "COUNTRY"
    names(home12)[names(home12) == "n"] <- "LEAVING.FROM"
    anti_join(host12, home12, by = "COUNTRY")
    # TOTAL 2012A
    # MERGE
    TOTAL12A <- inner_join(home12, host12, by = "COUNTRY")
    dt12a <- as.Date("2012-02-01")
    total12a <- TOTAL12A %>%
      add_column(DATE = dt12a, .before = "COUNTRY")
    Date2period <- function(dt12a, period = 6, sep = " S") {
      ym <- as.yearmon(dt12a)
      paste(as.integer(ym), (cycle(ym) - 1) %/% period + 1, sep = sep)
    }
    total12a$DATE <- Date2period(total12a$DATE)
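The `Date2period` helper is redefined for every semester with a differently named argument, although its body never changes. A single definition would be enough. The sketch below is a base R equivalent that does not need the zoo package; the function name `date_to_semester` is hypothetical, and the labels follow the same "year S<semester>" pattern.

```r
# Label a date with its year and half-year, e.g. "2008 S2".
# period = 6 groups months 1-6 into semester 1 and months 7-12 into semester 2.
date_to_semester <- function(x, period = 6, sep = " S") {
  year  <- as.integer(format(x, "%Y"))
  month <- as.integer(format(x, "%m"))
  paste(year, (month - 1) %/% period + 1, sep = sep)
}

labels <- date_to_semester(as.Date(c("2008-08-01", "2009-02-01", "2009-09-01")))
print(labels)
```

Because the function is vectorised over its input, it can be applied directly to a whole `DATE` column, as the original helper is.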
  • 30.
    # 2011B
    # summary, host
    host11 <- data11 %>%
      count(COUNTRYCODEOFHOSTINSTITUTION, sort = TRUE)
    # rename to COUNTRY
    names(host11)
    names(host11)[names(host11) == "COUNTRYCODEOFHOSTINSTITUTION"] <- "COUNTRY"
    names(host11)[names(host11) == "n"] <- "GOING.TO"
    # summary, home
    home11 <- data11 %>%
      count(COUNTRYCODEOFHOMEINSTITUTION, sort = TRUE)
    # rename to COUNTRY
    names(home11)
    names(home11)[names(home11) == "COUNTRYCODEOFHOMEINSTITUTION"] <- "COUNTRY"
    names(home11)[names(home11) == "n"] <- "LEAVING.FROM"
    # TOTAL 2011
    # MERGE
    TOTAL11B <- inner_join(home11, host11, by = "COUNTRY")
    dt11b <- as.Date("2011-09-01")
  • 31.
    total11b <- TOTAL11B %>%
      add_column(DATE = dt11b, .before = "COUNTRY")
    Date2period <- function(dt11b, period = 6, sep = " S") {
      ym <- as.yearmon(dt11b)
      paste(as.integer(ym), (cycle(ym) - 1) %/% period + 1, sep = sep)
    }
    total11b$DATE <- Date2period(total11b$DATE)
    # part 2
    TOTAL11 <- inner_join(TOTAL11A, TOTAL11B, by = "COUNTRY")
    TOTAL11 <- TOTAL11 %>%
      group_by(COUNTRY) %>%
      transmute(LEAVING.FROM = LEAVING.FROM.x + LEAVING.FROM.y,
                GOING.TO = GOING.TO.x + GOING.TO.y)
    # REMOVE
    rm(data12, data11, home11, host11, home12, host12, stdata1112)
    # 2012-2013
    stdata1213 <- read_delim("M:/pc/Desktop/SM_2012_13_20141103_01 (2).csv", ";",
                             escape_double = FALSE, trim_ws = TRUE)
  • 32.
    glimpse(stdata1213)
    names(stdata1213)
    sum(is.na(stdata1213$STUDY_START_DATE))
    data1213 <- stdata1213 %>%
      select(-c(STUDENT_ID, TYPE_PLACEMENT_SECTOR_VALUE, ECTS_CREDITS_PLACEMENT_AMT,
                NUMB_YRS_HIGHER_EDUCAT_VALUE, PLACEMENT_ENTERPRISE_VALUE,
                TOTAL_ECTS_CREDITS_AMT, QUALIFICATION_AT_HOST_CDE,
                STUDENT_NATIONALITY_CDE, ID_MOBILITY_CDE, PLACEMENT_ENTERPRISE_CTRY_CDE,
                LENGTH_PLACEMENT_VALUE, CONSORTIUM_AGREEMENT_NUMBER,
                SPECIAL_NEEDS_SUPPLEMENT_VALUE, STUDY_GRANT_AMT, HOST_INSTITUTION_CDE,
                HOME_INSTITUTION_CDE, PLACEMENT_ENTERPRISE_SIZE_CDE, SHORT_DURATION_CDE,
                ECTS_CREDITS_STUDY_AMT, PLACEMENT_GRANT_AMT)) %>%
      filter(MOBILITY_TYPE_CDE != "P") %>%
      filter(!is.na(STUDY_START_DATE)) %>%
      separate(STUDY_START_DATE, sep = "-", into = c("MONTH", "YEAR"))
    sum(is.na(data1213$YEAR))
    sum(data1213$HOST_INSTITUTION_COUNTRY_CDE == 0)
    # BEFR, BENL, BEDE refer to BE
    data1213$HOST_INSTITUTION_COUNTRY_CDE[data1213$HOST_INSTITUTION_COUNTRY_CDE == "BEFR"] <- "BE"
    data1213$HOST_INSTITUTION_COUNTRY_CDE[data1213$HOST_INSTITUTION_COUNTRY_CDE == "BENL"] <- "BE"
    data1213$HOST_INSTITUTION_COUNTRY_CDE[data1213$HOST_INSTITUTION_COUNTRY_CDE == "BEDE"] <- "BE"
  • 33.
    data1213$HOME_INSTITUTION_CTRY_CDE[data1213$HOME_INSTITUTION_CTRY_CDE == "BEFR"] <- "BE"
    data1213$HOME_INSTITUTION_CTRY_CDE[data1213$HOME_INSTITUTION_CTRY_CDE == "BENL"] <- "BE"
    data1213 <- data1213 %>%
      filter(!is.na(YEAR)) %>%
      select(-c(MONTH))
    # separate to 2013 and 2012
    # 2012
    data12 <- data1213 %>%
      filter(YEAR == "12")
    # 2013
    data13 <- data1213 %>%
      filter(YEAR == "13")
    # summary host 2013
    host13 <- data13 %>%
      count(HOST_INSTITUTION_COUNTRY_CDE, sort = TRUE)
    # rename to COUNTRY
    names(host13)
    names(host13)[names(host13) == "HOST_INSTITUTION_COUNTRY_CDE"] <- "COUNTRY"
    names(host13)[names(host13) == "n"] <- "GOING.TO"
    # summary, home
    home13 <- data13 %>%
  • 34.
      count(HOME_INSTITUTION_CTRY_CDE, sort = TRUE)
    # rename to COUNTRY
    names(home13)
    names(home13)[names(home13) == "HOME_INSTITUTION_CTRY_CDE"] <- "COUNTRY"
    names(home13)[names(home13) == "n"] <- "LEAVING.FROM"
    # 2013A
    # MERGE
    TOTAL13A <- inner_join(home13, host13, by = "COUNTRY")
    dt13a <- as.Date("2013-02-01")
    total13a <- TOTAL13A %>%
      add_column(DATE = dt13a, .before = "COUNTRY")
    Date2period <- function(dt13a, period = 6, sep = " S") {
      ym <- as.yearmon(dt13a)
      paste(as.integer(ym), (cycle(ym) - 1) %/% period + 1, sep = sep)
    }
    total13a$DATE <- Date2period(total13a$DATE)
    # summary, host
    host12 <- data12 %>%
      count(HOST_INSTITUTION_COUNTRY_CDE, sort = TRUE)
    # rename to COUNTRY
  • 35.
    names(host12)
    names(host12)[names(host12) == "HOST_INSTITUTION_COUNTRY_CDE"] <- "COUNTRY"
    names(host12)[names(host12) == "n"] <- "GOING.TO"
    # summary, home
    home12 <- data12 %>%
      count(HOME_INSTITUTION_CTRY_CDE, sort = TRUE)
    # rename to COUNTRY
    names(home12)
    names(home12)[names(home12) == "HOME_INSTITUTION_CTRY_CDE"] <- "COUNTRY"
    names(home12)[names(home12) == "n"] <- "LEAVING.FROM"
    # TOTAL 2012
    # MERGE
    TOTAL12B <- inner_join(home12, host12, by = "COUNTRY")
    dt12b <- as.Date("2012-09-01")
    total12b <- TOTAL12B %>%
      add_column(DATE = dt12b, .before = "COUNTRY")
    Date2period <- function(dt12b, period = 6, sep = " S") {
      ym <- as.yearmon(dt12b)
      paste(as.integer(ym), (cycle(ym) - 1) %/% period + 1, sep = sep)
    }
    total12b$DATE <- Date2period(total12b$DATE)
  • 36.
    # part 2
    TOTAL12 <- inner_join(TOTAL12A, TOTAL12B, by = "COUNTRY")
    TOTAL12 <- TOTAL12 %>%
      group_by(COUNTRY) %>%
      transmute(LEAVING.FROM = LEAVING.FROM.x + LEAVING.FROM.y,
                GOING.TO = GOING.TO.x + GOING.TO.y)
    # clean
    rm(data13, data12, stdata1213, home12, home13, host12, host13)
    # 2013-2014
    library(readxl)
    stdata1314 <- read_excel("M:/pc/Desktop/Student_Mobility_2013-14.xlsx")
    # changing the date character into date format
    stdata1314$StartDate <- dmy_hms(stdata1314$StartDate, tz = Sys.timezone())
    glimpse(stdata1314)
    names(stdata1314)
    sum(is.na(stdata1314$StartDate))
    sum(stdata1314$ReceivingCountry == "BEFR")
  • 37.
    sum(stdata1314$ReceivingCountry == "BENL")
    sum(stdata1314$ReceivingCountry == "BEDE")
    sum(stdata1314$SendingCountry == "BEFR")
    data1314 <- stdata1314 %>%
      filter(CombinedMobilityYesNo == "NO") %>%
      filter(MobilityType == "Mob-SMS") %>%
      separate(StartDate, sep = "-", into = c("year", "month", "day")) %>%
      select(-c(Action, CombinedMobilityYesNo, ProjectNumber, SpecialNeeds, EndDate,
                ParticipantID, SendingPartnerName, MobiilityID, DurationInMonths,
                HostingPartnerErasmusID, DurationInDays, ParticipantType,
                HostingPartnerName, SubsistenceTravel, CallYear,
                SendingPartnerErasmusID, HostingPartnerCity))
    # this file uses GB; earlier files use UK
    data1314$ReceivingCountry[data1314$ReceivingCountry == "GB"] <- "UK"
    data1314$SendingCountry[data1314$SendingCountry == "GB"] <- "UK"
    data1314 <- data1314 %>%
      filter(!is.na(year)) %>%
      select(-c(month, day))
    # separate to 2013 and 2014
    # 2014
    data14 <- data1314 %>%
      filter(year == "2014")
    # 2013
    data13 <- data1314 %>%
      filter(year == "2013")
  • 38.
    # summary host 2013B
    host13 <- data13 %>%
      count(ReceivingCountry, sort = TRUE)
    # rename to COUNTRY
    names(host13)
    names(host13)[names(host13) == "ReceivingCountry"] <- "COUNTRY"
    names(host13)[names(host13) == "n"] <- "GOING.TO"
    # summary, home
    home13 <- data13 %>%
      count(SendingCountry, sort = TRUE)
    # rename to COUNTRY
    names(home13)
    names(home13)[names(home13) == "SendingCountry"] <- "COUNTRY"
    names(home13)[names(home13) == "n"] <- "LEAVING.FROM"
    # 2013B
    # MERGE
    TOTAL13B <- inner_join(home13, host13, by = "COUNTRY")
    dt13b <- as.Date("2013-09-01")
    total13b <- TOTAL13B %>%
      add_column(DATE = dt13b, .before = "COUNTRY")
  • 39.
    Date2period <- function(dt13b, period = 6, sep = " S") {
      ym <- as.yearmon(dt13b)
      paste(as.integer(ym), (cycle(ym) - 1) %/% period + 1, sep = sep)
    }
    total13b$DATE <- Date2period(total13b$DATE)
    # part 2
    TOTAL13B$COUNTRY[TOTAL13B$COUNTRY == "GB"] <- "UK"
    TOTAL13 <- inner_join(TOTAL13A, TOTAL13B, by = "COUNTRY")
    TOTAL13 <- TOTAL13 %>%
      group_by(COUNTRY) %>%
      transmute(LEAVING.FROM = LEAVING.FROM.x + LEAVING.FROM.y,
                GOING.TO = GOING.TO.x + GOING.TO.y)
    inner_join(TOTAL09, TOTAL13)
    # 2014
    # summary, host
    host14 <- data14 %>%
      count(ReceivingCountry, sort = TRUE)
    # rename to COUNTRY
    names(host14)
    names(host14)[names(host14) == "ReceivingCountry"] <- "COUNTRY"
    names(host14)[names(host14) == "n"] <- "GOING.TO"
  • 40.
    # summary, home
    home14 <- data14 %>%
      count(SendingCountry, sort = TRUE)
    # rename to COUNTRY
    names(home14)
    names(home14)[names(home14) == "SendingCountry"] <- "COUNTRY"
    names(home14)[names(home14) == "n"] <- "LEAVING.FROM"
    # TOTAL 2014
    # MERGE
    TOTAL14 <- inner_join(home14, host14, by = "COUNTRY")
    dt14 <- as.Date("2014-02-01")
    total14 <- TOTAL14 %>%
      add_column(DATE = dt14, .before = "COUNTRY")
    Date2period <- function(dt14, period = 6, sep = " S") {
      ym <- as.yearmon(dt14)
      paste(as.integer(ym), (cycle(ym) - 1) %/% period + 1, sep = sep)
    }
    total14$DATE <- Date2period(total14$DATE)
    # TOTAL14 is kept: it is still needed when the yearly totals are merged
    rm(home13, home14, host13, host14, TOTAL13A, TOTAL13B, stdata1314, data13, data14, data1314)
  • 41.
    #merge all
    no_students_0814 <- bind_rows(total08, total09a, total09b, total10a, total10b, total11a,
                                  total11b, total12a, total12b, total13a, total13b, total14)
    #for part two
    #label each yearly total with its year before stacking them
    TOTAL08 <- add_column(TOTAL08, DATE="2008", .before = "COUNTRY")
    TOTAL09 <- add_column(TOTAL09, DATE="2009", .before = "COUNTRY")
    TOTAL10 <- add_column(TOTAL10, DATE="2010", .before = "COUNTRY")
    TOTAL11 <- add_column(TOTAL11, DATE="2011", .before = "COUNTRY")
    TOTAL12 <- add_column(TOTAL12, DATE="2012", .before = "COUNTRY")
    TOTAL13 <- add_column(TOTAL13, DATE="2013", .before = "COUNTRY")
    TOTAL14 <- add_column(TOTAL14, DATE="2014", .before = "COUNTRY")
    TOTAL <- bind_rows(TOTAL08, TOTAL09, TOTAL10, TOTAL11, TOTAL12, TOTAL13, TOTAL14)
  • 42.
    # group belongs inside aes(); outside it is not used as an aesthetic
    ggplot(no_students_0814,
           aes(x=DATE, y=LEAVING.FROM, color=COUNTRY, label=COUNTRY,
               size=LEAVING.FROM, group=COUNTRY)) +
      geom_point(aes(color=COUNTRY))
    rm(total08, total09a, total09b, total10a, total10b, total11a, total11b,
       total12a, total12b, total13a, total13b, total14)
    rm(dt08, dt09a, dt09b, dt10a, dt10b, dt11a, dt11b, dt12a, dt12b, dt13a, dt13b, dt14, Date2period)
    #PART 1
    #the top countries who send students
    #topleaving<- arrange(no_students_0814,desc(LEAVING.FROM))
    #top receiving students
    #topaccepting<- arrange(no_students_0814,desc(GOING.TO))
    #top4
    #top4<- filter(no_students_0814,COUNTRY %in% c("ES","DE","IT","FR"))
    #sum
    # funs() is deprecated in recent dplyr; a named list produces the same *_sum / *_mean columns
    sumall <- no_students_0814 %>%
      group_by(COUNTRY) %>%
      summarise_at(vars(LEAVING.FROM, GOING.TO), list(sum = sum, mean = mean))
    totalno_all <- sum(sumall$LEAVING.FROM_sum, sumall$GOING.TO_sum)
    #bar graph
    topsumall <- sumall %>%
      filter(GOING.TO_mean>2500)
    ggplot(topsumall, aes(x=COUNTRY, y=GOING.TO_mean)) +
      geom_col() +
      theme_classic() +
      labs(title="Graph 1", x="Host Countries",
           y="Average number of incoming students (greater than 2500)")
  • 43.
    topsumall <- sumall %>%
      filter(LEAVING.FROM_mean>1500)
    ggplot(topsumall, aes(x=COUNTRY, y=LEAVING.FROM_mean)) +
      geom_col() +
      theme_classic() +
      labs(title="Graph 2", x="Home Countries",
           y="Average number of leaving students (greater than 1500)")
    glimpse(sumall)
    #focus on going.to
    sumgoing <- select(sumall, -c("LEAVING.FROM_sum","LEAVING.FROM_mean"))
    sumgoing <- arrange(sumgoing, desc(GOING.TO_sum))
    topsum_going <- filter(sumgoing, COUNTRY %in% c("ES","DE","IT","FR","UK"))
    #bar graph
    # avgcrimes_top is only created in Part 2, so this call fails if run at this point;
    # it is kept here as a comment
    #ggplot(avgcrimes_top,aes(x=COUNTRY,y=avg.criminals))+geom_col()+theme_classic() +
    #  labs(title ="Graph 3",x="Country (top 14)",y="Average number of suspected")
    #finding fraction
    totalno_going <- c(sum(sumgoing$GOING.TO_sum))
    totalno_topgoing <- sum(topsum_going$GOING.TO_sum)
    fraction_going <- (totalno_topgoing/totalno_going) * 100
    #focus on leaving
    sumleaving <- sumall %>%
      group_by(COUNTRY) %>%
      select(c("LEAVING.FROM_sum","LEAVING.FROM_mean")) %>%
      arrange(desc(LEAVING.FROM_sum))
    topsum_leaving <- filter(sumleaving, COUNTRY %in% c("ES","DE","IT","FR","PL"))
    #FRACTION LEAVING
  • 44.
    totalno_leaving <- sum(sumleaving$LEAVING.FROM_sum)
    totalno_topleaving <- sum(topsum_leaving$LEAVING.FROM_sum)
    fraction_leaving <- (totalno_topleaving/totalno_leaving)*100
    #ggplot
    #the same for the leaving
    #focusing on years
    #ggplot the top five with the TOTAL
    #GENDER
    gender0809 <- data0809 %>%
      count(SEX, name = "total_08-09")
    gender0910 <- data0910 %>%
      count(GENDER, name = "total_09-10")
    #fraction0910
    # rows 1 and 2 of the count table are the two gender categories
    sumsex0910 <- gender0910[1,2]+gender0910[2,2]
    fractionfemale0910 <- (gender0910[1,2]/sumsex0910)*100
    gender1011 <- data1011 %>%
      count(GENDER, name = "total_10-11")
    gender1112 <- data1112 %>%
      count(GENDER, name = "total_11-12")
    gender1213 <- data1213 %>%
      count(STUDENT_GENDER_CDE, name = "total_12-13")
    gender1314 <- data1314 %>%
      count(ParticipantGender, name = "total_13-14")
  • 45.
    sumsex1314 <- gender1314[1,2]+gender1314[2,2]
    fractionfemale1314 <- (gender1314[1,2]/sumsex1314)*100
    #find the mode function
    # returns the most frequent non-missing value of x
    calculate_mode <- function(x) {
      uniqx <- unique(na.omit(x))
      uniqx[which.max(tabulate(match(x, uniqx)))]
    }
    #mode_age
    calculate_mode(data0809$AGE)
    calculate_mode(data0910$AGE)
    calculate_mode(data1011$AGE)
    calculate_mode(data1112$AGE)
    calculate_mode(data1213$STUDENT_AGE_VALUE)
    #mode_studylevel
    calculate_mode(data0809$LEVELSTUDY)
    calculate_mode(data0910$LEVELSTUDY)
    calculate_mode(data1011$LEVELSTUDY)
    calculate_mode(data1112$LEVELSTUDY)
    calculate_mode(data1213$STUDENT_STUDY_LEVEL_CDE)
    #subject area
    area0809 <- data0809 %>%
      count(SUBJECTAREA, name = "total_08-09")
    area0910 <- data0910 %>%
      count(SUBJECTAREA, name = "total_09-10")
    area1011 <- data1011 %>%
  • 46.
      count(SUBJECTAREA, name = "total_10-11")
    # 2011-12 counts must come from data1112 (the original reused data1011 here)
    area1112 <- data1112 %>%
      count(SUBJECTAREA, name = "total_11-12")
    area1213 <- data1213 %>%
      count(STUDENT_SUBJECT_AREA_VALUE, name = "total_12-13")
    area1314 <- data1314 %>%
      count(SubjectAreaCode, name = "total_13-14")
    sub222 <- filter(data1314, SubjectAreaCode=="222")
    sub340 <- filter(data1314, SubjectAreaCode=="340")
    sub22 <- filter(data1314, SubjectAreaCode=="22")
    sub34 <- filter(data1314, SubjectAreaCode=="34")
    #language taught
    lang0809 <- data0809 %>%
      count(LANGUAGETAUGHT, name = "total_08-09")
    lang0910 <- data0910 %>%
      count(LANGUAGETAUGHT, name = "total_09-10")
    lang1011 <- data1011 %>%
      count(LANGUAGETAUGHT, name = "total_10-11")
    # 2011-12 counts must come from data1112 (the original reused data1011 here)
    lang1112 <- data1112 %>%
      count(LANGUAGETAUGHT, name = "total_11-12")
    lang1213 <- data1213 %>%
      count(LANGUAGE_TAUGHT_CDE, name = "total_12-13")
    lang1314 <- data1314 %>%
      count(Language, name = "total_13-14")
    #fraction
    # row 6 of lang0910 is taken to be the English row
    english0910 <- lang0910[6,2]
    totallang0910 <- sum(lang0910$`total_09-10`)
    fractionlang0910 <- (english0910/totallang0910)*100
  • 47.
    # row 15 of lang1314 is taken to be the English row
    english1314 <- lang1314[15,2]
    totallang1314 <- sum(lang1314$`total_13-14`)
    fractionlang1314 <- (english1314/totallang1314)*100
    #PART 2
    library(Hmisc)
    #criminality
    criminality <- read_csv("//hume/student-u02/marthoma/pc/Desktop/crimesbyreportingcountry08-14.csv")
    # inspect the raw file (crimes does not exist yet at this point)
    glimpse(criminality)
    crimes <- criminality %>%
      select(-c(`Flag and Footnotes`, LEG_STAT)) %>%
      filter(CITIZEN=="Total", UNIT=="Number")
    # Eurostat marks missing values with ":"
    crimes$Value[crimes$Value==":"] <- NA
    crimes <- crimes %>%
      select(-c(CITIZEN, UNIT)) %>%
      rename(COUNTRY=GEO) %>%
      rename(CRIMINALS=Value) %>%
      rename(DATE=TIME)
    # the UK is reported per jurisdiction; all three are relabelled as "United Kingdom"
    crimes$COUNTRY[crimes$COUNTRY=="England and Wales"] <- "United Kingdom"
    crimes$COUNTRY[crimes$COUNTRY=="Scotland"] <- "United Kingdom"
    crimes$COUNTRY[crimes$COUNTRY=="Northern Ireland (UK)"] <- "United Kingdom"
    crimes$COUNTRY[crimes$COUNTRY=="Germany (until 1990 former territory of the FRG)"] <- "Germany"
  • 48.
    crimes <- crimes %>%
      filter(!is.na(CRIMINALS))
    rm(criminality)
    #fixing the variables
    crimes$DATE <- as.character(crimes$DATE)
    class(crimes$CRIMINALS)
    # fixed = TRUE removes literal dots; as a regular expression "." would match every character
    crimes$CRIMINALS <- gsub(".", "", crimes$CRIMINALS, fixed = TRUE)
    crimes <- arrange(crimes, desc(CRIMINALS))
    # replace the two-letter country codes with full names so they match the Eurostat data
    TOTAL$COUNTRY[TOTAL$COUNTRY=="GR"] <- "Greece"
    TOTAL$COUNTRY[TOTAL$COUNTRY=="AT"] <- "Austria"
    TOTAL$COUNTRY[TOTAL$COUNTRY=="BE"] <- "Belgium"
    TOTAL$COUNTRY[TOTAL$COUNTRY=="BG"] <- "Bulgaria"
    TOTAL$COUNTRY[TOTAL$COUNTRY=="CZ"] <- "Czechia"
    TOTAL$COUNTRY[TOTAL$COUNTRY=="CH"] <- "Switzerland"
    TOTAL$COUNTRY[TOTAL$COUNTRY=="CY"] <- "Cyprus"
    TOTAL$COUNTRY[TOTAL$COUNTRY=="DE"] <- "Germany"
    TOTAL$COUNTRY[TOTAL$COUNTRY=="DK"] <- "Denmark"
    TOTAL$COUNTRY[TOTAL$COUNTRY=="EE"] <- "Estonia"
    TOTAL$COUNTRY[TOTAL$COUNTRY=="ES"] <- "Spain"
    TOTAL$COUNTRY[TOTAL$COUNTRY=="FI"] <- "Finland"
    TOTAL$COUNTRY[TOTAL$COUNTRY=="FR"] <- "France"
    TOTAL$COUNTRY[TOTAL$COUNTRY=="GB"] <- "United Kingdom"
    TOTAL$COUNTRY[TOTAL$COUNTRY=="HR"] <- "Croatia"
    TOTAL$COUNTRY[TOTAL$COUNTRY=="HU"] <- "Hungary"
    TOTAL$COUNTRY[TOTAL$COUNTRY=="IE"] <- "Ireland"
  • 49.
    TOTAL$COUNTRY[TOTAL$COUNTRY=="IS"] <- "Iceland"
    TOTAL$COUNTRY[TOTAL$COUNTRY=="IT"] <- "Italy"
    TOTAL$COUNTRY[TOTAL$COUNTRY=="LT"] <- "Lithuania"
    TOTAL$COUNTRY[TOTAL$COUNTRY=="LI"] <- "Liechtenstein"
    TOTAL$COUNTRY[TOTAL$COUNTRY=="LU"] <- "Luxembourg"
    TOTAL$COUNTRY[TOTAL$COUNTRY=="LV"] <- "Latvia"
    TOTAL$COUNTRY[TOTAL$COUNTRY=="MT"] <- "Malta"
    TOTAL$COUNTRY[TOTAL$COUNTRY=="NL"] <- "Netherlands"
    TOTAL$COUNTRY[TOTAL$COUNTRY=="NO"] <- "Norway"
    TOTAL$COUNTRY[TOTAL$COUNTRY=="PL"] <- "Poland"
    TOTAL$COUNTRY[TOTAL$COUNTRY=="PT"] <- "Portugal"
    TOTAL$COUNTRY[TOTAL$COUNTRY=="RO"] <- "Romania"
    TOTAL$COUNTRY[TOTAL$COUNTRY=="SE"] <- "Sweden"
    TOTAL$COUNTRY[TOTAL$COUNTRY=="SI"] <- "Slovenia"
    TOTAL$COUNTRY[TOTAL$COUNTRY=="SK"] <- "Slovakia"
    TOTAL$COUNTRY[TOTAL$COUNTRY=="TR"] <- "Turkey"
    TOTAL$COUNTRY[TOTAL$COUNTRY=="UK"] <- "United Kingdom"
    TOTAL$DATE <- as.character(TOTAL$DATE)
    glimpse(TOTAL)
    glimpse(crimes)
    # keep every country-year from TOTAL, with crime figures where available
    st_cri <- right_join(crimes, TOTAL, by=c("COUNTRY","DATE"), copy=FALSE)
    glimpse(st_cri)
    #PLOT TO SEE ANY POTENTIAL CONNECTION
    ggplot(st_cri, aes(x=CRIMINALS, y=GOING.TO, label=COUNTRY)) +
      geom_point()
    ggplot(st_cri, aes(x=CRIMINALS, y=GOING.TO, label=COUNTRY)) +
      geom_text()
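The country recoding on slides 48 and 49 repeats one assignment per country code. A possible alternative, shown here only as a sketch, is a named lookup vector applied in a single step. The names `country_lookup` and `recode_country` are hypothetical, and the vector lists only a handful of the codes from the slides for brevity.

```r
# Hypothetical lookup table: two-letter code -> full country name
# (only a subset of the codes used on the slides is listed here).
country_lookup <- c(ES = "Spain", DE = "Germany", IT = "Italy",
                    FR = "France", GB = "United Kingdom", UK = "United Kingdom")

# Replace every code found in the lookup; leave anything else untouched.
recode_country <- function(codes, lookup = country_lookup) {
  hit <- codes %in% names(lookup)
  codes[hit] <- unname(lookup[codes[hit]])
  codes
}

recode_country(c("ES", "XX", "UK"))
```

Codes that are missing from the lookup pass through unchanged, so unmatched countries remain visible after the join instead of being silently dropped. Under these assumptions the call would be `TOTAL$COUNTRY <- recode_country(TOTAL$COUNTRY)`.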
  • 50.
    #focus on top 5
    st_cri_top5 <- st_cri %>%
      filter(COUNTRY %in% c("Spain","Germany","Italy","France","United Kingdom"))
    # plot the top-5 subset created above (the original referred to an undefined st_cri_going)
    ggplot(st_cri_top5, aes(x=CRIMINALS, y=GOING.TO, label=COUNTRY)) +
      geom_text()
    #average of values
    glimpse(st_cri)
    st_cri$CRIMINALS <- as.numeric(st_cri$CRIMINALS)
    avgcrimes <- st_cri %>%
      filter(!is.na(CRIMINALS)) %>%
      group_by(COUNTRY) %>%
      summarise(avg.criminals=mean(CRIMINALS))
    #avgcrimes$avg.criminals<-format(round(avgcrimes$avg.criminals,2),nsmall = 2)
    class(avgcrimes$avg.criminals)
    avgcrimes_top <- avgcrimes %>%
      filter(avg.criminals>180000)
    ggplot(avgcrimes_top, aes(x=COUNTRY, y=avg.criminals)) +
      geom_col() +
      theme_classic() +
      labs(title="Graph 3", x="Countries (top 14)", y="Average number of suspected")
    avgst <- st_cri %>%
      group_by(COUNTRY) %>%
      summarise(avg.students=mean(GOING.TO))
    class(avgst$avg.students)
    # round in place and keep the column numeric for the scatter plot
    # (the original assigned a formatted string to an undefined avgst_crimes)
    avgst$avg.students <- round(avgst$avg.students, 2)
    avg_cri.st <- inner_join(avgcrimes, avgst, by="COUNTRY")
  • 51.
    #scatter plot
    theme_set(theme_bw())
    ggplot(avg_cri.st, aes(x=avg.criminals, y=avg.students, label=COUNTRY)) +
      geom_text() +
      labs(title="Figure 1", x="Average number of suspected people",
           y="Average number of receiving students") +
      theme_classic()
    #optional additions to the plot:
    #  limits: + xlim(c(0, 0.1)) + ylim(c(0, 500000))
    #  line: + geom_smooth(method="loess", se=F)
    #  ols: + geom_smooth(method="lm", se=FALSE)
    #  + theme_bw()
    #correlations
    #spain
    st_spain <- TOTAL %>%
      filter(COUNTRY=="Spain") %>%
      select(-c(LEAVING.FROM))
    cri_spain <- crimes %>%
      filter(COUNTRY=="Spain")
    cri_spain$CRIMINALS <- as.numeric(cri_spain$CRIMINALS)
    class(cri_spain$CRIMINALS)
    # Pearson correlation between incoming students and suspects (rcorr is from Hmisc)
    rcorr(st_spain$GOING.TO, cri_spain$CRIMINALS, type="pearson")
    #france
    st_fr <- TOTAL %>%
      filter(COUNTRY=="France") %>%
      select(-c(LEAVING.FROM))
    cri_france <- crimes %>%
  • 53.
    #continue to the next file: education
    #07-11
    eduexp07_11 <- read_csv("m:/pc/desktop/eduexp07-11.csv")
    eduexp0711 <- eduexp07_11 %>%
      select(-c(`Flag and Footnotes`, UNIT)) %>%
      filter(INDIC_ED=="Total public expenditure on education as % of GDP, at tertiary level of education (ISCED 5-6)") %>%
      filter(TIME !="2007")
    eduexp0711$GEO[eduexp0711$GEO=="Germany (until 1990 former territory of the FRG)"] <- "Germany"
    rm(eduexp07_11)
    # Eurostat marks missing values with ":"
    eduexp0711$Value[eduexp0711$Value==":"] <- NA
    eduexp0711 <- eduexp0711 %>%
      select(-c(INDIC_ED)) %>%
      filter(!is.na(Value))
    glimpse(eduexp0711)
    #changing to numeric
    class(eduexp0711$Value)
    eduexp0711$Value <- gsub(",", "", eduexp0711$Value)
    eduexp0711$Value <- as.numeric(eduexp0711$Value)
    eduexp0711$Value <- (eduexp0711$Value)/100
    class(eduexp0711$Value)
  • 54.
    eduexp0711$Value <- as.numeric(eduexp0711$Value)
    #12-14
    educ_uoe_fine06_1_Data <- read_csv("M:/pc/Desktop/educ_uoe_fine06_1_Data.csv",
                                       col_types = cols(TIME = col_character()))
    eduexp1214 <- educ_uoe_fine06_1_Data %>%
      select(-c(`Flag and Footnotes`, UNIT, ISCED11)) %>%
      filter(GEO !="European Union - 28 countries")
    eduexp1214$GEO[eduexp1214$GEO=="Germany (until 1990 former territory of the FRG)"] <- "Germany"
    eduexp1214$Value[eduexp1214$Value==":"] <- NA
    rm(educ_uoe_fine06_1_Data)
    eduexp1214 <- eduexp1214 %>%
      filter(!is.na(Value))
    eduexp1214$Value <- as.numeric(eduexp1214$Value)
    class(eduexp1214$Value)
    glimpse(eduexp1214)
    eduexp1214$TIME <- as.numeric(eduexp1214$TIME)
    class(eduexp1214$TIME)
    #merge
  • 55.
    eduexp <- bind_rows(eduexp0711, eduexp1214)
    glimpse(eduexp)
    # rename first: the DATE column only exists after TIME has been renamed
    eduexp <- eduexp %>%
      rename(DATE=TIME) %>%
      rename(COUNTRY=GEO) %>%
      rename(public.exp=Value)
    eduexp$DATE <- as.character(eduexp$DATE)
    #merge
    st_exp <- inner_join(eduexp, TOTAL, by=c("COUNTRY","DATE"), copy=FALSE)
    glimpse(st_exp)
    st_exp <- st_exp %>%
      select(-c(LEAVING.FROM)) %>%
      filter(!is.na(public.exp))
    #find avg
    avgexp <- st_exp %>%
      group_by(COUNTRY) %>%
      summarise(avgpublic.exp=mean(public.exp))
    class(avgexp$avgpublic.exp)
    # format() returns character; the column is converted back to numeric after the merge
    avgexp$avgpublic.exp <- format(round(avgexp$avgpublic.exp, 3), nsmall = 3)
    glimpse(avgst)
    #merge
    avg_exp.st <- inner_join(avgexp, avgst, by="COUNTRY")
    class(avg_exp.st$avgpublic.exp)
  • 56.
    avg_exp.st$avgpublic.exp <- as.numeric(avg_exp.st$avgpublic.exp)
    #plot
    ggplot(avg_exp.st, aes(x=avgpublic.exp, y=avg.students, label=COUNTRY)) +
      geom_text() +
      labs(title="Figure 2", x="Average percentage of public expenditure",
           y="Average number of receiving students") +
      theme_classic()
    #per-country plots for the top countries, to see whether the relationship is clearer
    spain_exp <- st_exp %>%
      filter(COUNTRY=="Spain")
    ggplot(spain_exp, aes(x=public.exp, y=GOING.TO)) +
      geom_point() +
      geom_smooth(method="lm", se=FALSE) +
      labs(title="Figure 3: Spain 2008-2014", x="Percentage of public expenditure",
           y="Average number of receiving students") +
      theme_classic()
    #germany
    de_exp <- st_exp %>%
      filter(COUNTRY=="Germany")
    ggplot(de_exp, aes(x=public.exp, y=GOING.TO)) +
      geom_point() +
      geom_smooth(method="lm", se=FALSE) +
      labs(title="Figure 5: Germany 2008-2014", x="Percentage of public expenditure",
           y="Average number of receiving students") +
      theme_classic()
    #france
    fr_exp <- st_exp %>%
      filter(COUNTRY=="France")
    ggplot(fr_exp, aes(x=public.exp, y=GOING.TO)) +
      geom_point() +
      geom_smooth(method="lm", se=FALSE) +
      labs(title="Figure 4: France 2008-2014", x="Percentage of public expenditure",
           y="Average number of receiving students") +
      theme_classic()
    #italy
  • 57.
    it_exp <- st_exp %>%
      filter(COUNTRY=="Italy")
    ggplot(it_exp, aes(x=public.exp, y=GOING.TO)) +
      geom_point() +
      geom_smooth(method="lm", se=FALSE) +
      labs(title="Figure 6: Italy 2008-2014", x="Percentage of public expenditure",
           y="Average number of receiving students") +
      theme_classic()
    #uk
    # the data are filtered on the whole United Kingdom, so the title names it rather than England
    uk_exp <- st_exp %>%
      filter(COUNTRY=="United Kingdom")
    ggplot(uk_exp, aes(x=public.exp, y=GOING.TO)) +
      geom_point() +
      geom_smooth(method="lm", se=FALSE) +
      labs(title="Figure 7: United Kingdom 2008-2014", x="Percentage of public expenditure",
           y="Average number of receiving students") +
      theme_classic()