University Students’ mobility under the ERASMUS program in the European Union, 2008-2014.

Candidate number: 171846
Introduction
Part 1
Part 2
Conclusion
Appendix
Code

Introduction
The ERASMUS program (European Community Action Scheme for the Mobility of
University Students) is a student exchange program among universities located in the EU
member states and other participating countries. It has operated since 1987, and its primary
aims are to promote collaboration and to reduce cultural differences through the experience
of living abroad. Although other programs similar to ERASMUS operate in the EU, such as
ERASMUS+, we focus our analysis on ERASMUS university students, regardless of their level
of studies.
Our analysis consists of two parts. First, we aggregate the individual records to identify
the characteristics of student mobility. For this purpose, we use the data files published on
the EU Open Data Portal (data.europa.eu), which cover the period from the second semester
of 2008 to the first semester of 2014. These files contain the records of students who
participated in the program during this period, together with more specific characteristics
such as their age and level of studies. The second part focuses on the host countries and
investigates whether country-specific characteristics affect mobility. There, we combine the
aggregated data from the first part with data retrieved from the Eurostat website
(ec.europa.eu/eurostat/data/database).

Part 1
Countries.
Over the period of interest, approximately 2 million students participated in the
ERASMUS program. The five countries that attracted the most students are Spain, France,
Germany, the United Kingdom and Italy. Together, these countries hosted 55.5% of all the
students who studied abroad; in other words, more than one in two students chose one of
these five destinations, with Spain in first place.
Turning to the countries that students leave¹, the top sending countries are almost the
same as the top host countries. However, Poland takes fifth place instead of Italy, which
follows closely in sixth. The concentration is also similar: the top five sending countries
account for 59.5% of all the students who left their country.
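The 55.5% and 59.5% figures above are the top five countries' student counts divided by the total over all countries. A minimal base-R sketch of that calculation follows; the function name `top_k_share` and the toy counts are hypothetical, for illustration only, and are not the study's data.

```r
# Share (in %) of all students accounted for by the k countries with the most students.
# `counts` is a named numeric vector: names are country codes, values are student counts.
top_k_share <- function(counts, k = 5) {
  top <- sort(counts, decreasing = TRUE)[seq_len(min(k, length(counts)))]
  list(countries = names(top), share = 100 * sum(top) / sum(counts))
}

# Hypothetical toy counts, for illustration only (not the study's data)
toy_incoming <- c(ES = 40, FR = 30, DE = 20, UK = 5, IT = 3, PL = 2)
res <- top_k_share(toy_incoming, k = 2)
res$countries  # "ES" "FR"
res$share      # 70
```

The same function applies to incoming and outgoing counts alike; only the input vector changes.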

Gender, age and study level.
According to our data, women participate in the exchange program more than men: each
year, the share of women among all participants is around 60%². In addition, the most
common age of participants is 21 in every period except 2008-2009, when it is 23.
Moreover, the majority of participants were in the first cycle (bachelor level) of their studies
when they took part in the program.
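Both the share of women and the most common age can be computed directly from the individual records. The sketch below assumes that gender is coded "F"/"M" (an assumption; the files use different column names and may use different codes), introduces a hypothetical helper `share_of`, and repeats the mode function from the Code section; the toy records are invented for illustration.

```r
# Percentage of non-missing entries equal to `value` (e.g. the share of women).
share_of <- function(x, value) {
  x <- x[!is.na(x)]
  100 * mean(x == value)
}

# Most frequent value (mode), ignoring missing values; same logic as
# calculate_mode() in the Code section.
calculate_mode <- function(x) {
  uniqx <- unique(na.omit(x))
  uniqx[which.max(tabulate(match(x, uniqx)))]
}

# Hypothetical toy records, for illustration only (not the study's data)
toy_gender <- c("F", "F", "F", "M", NA)
toy_age <- c(20, 22, 20, 24, NA)
share_of(toy_gender, "F")  # 75
calculate_mode(toy_age)    # 20
```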
Subject area and taught language.
Most participants come from three study areas: business and administration, humanities,
and foreign languages. Notably, foreign languages gathers a very large number of students,
significantly more than any other category. This suggests that a large fraction of students
participate in order to improve their command of a foreign language as a supplement to their
studies. This finding is particularly interesting because it shows students using the program
as a means to learn a language better and, through it, a new culture. This is also one of the
main reasons the ERASMUS program was established among the European states.
Furthermore, and unsurprisingly, English is the dominant language of instruction. The
native languages of the most popular host countries complete the list of the most common
languages of instruction. However, the picture becomes more interesting when we look across
the years. The share of courses taught in English is 45.4% in 2008 and reaches 58% by the
end of the first semester of 2014. This result seems to contradict our earlier observations in
two respects. First, the largest share of participants study foreign languages, and this share
grows over the years; at the same time, however, the share of courses taught in English also
rises considerably. Second, the number of students going to the top non-English-speaking
countries increases steadily, if slightly, over the years; although their native languages
appear on the list of top languages of instruction, the share of courses taught in English
remains large. Given these points, one possible explanation is that the success of the
ERASMUS program in attracting a large number of students every year gave universities in
non-English-speaking countries an incentive to add courses taught in English.
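The English shares above come from counting records by year and language of instruction. A minimal sketch, assuming a long table with hypothetical columns `year`, `language` and `n` and assuming English is coded "EN"; the function name and toy counts are illustrative only.

```r
# Share (in %) of records taught in English, per year.
# `courses`: one row per (year, language) with the number of records in column `n`.
english_share_by_year <- function(courses, english_code = "EN") {
  is_english <- courses$language == english_code
  totals <- tapply(courses$n, courses$year, sum)
  english <- tapply(courses$n * is_english, courses$year, sum)  # FALSE counts as 0
  round(100 * english / totals, 1)
}

# Hypothetical toy counts, for illustration only (not the study's data)
toy_courses <- data.frame(
  year = c(2009, 2009, 2009, 2013, 2013),
  language = c("EN", "FR", "DE", "EN", "FR"),
  n = c(40, 35, 25, 60, 40)
)
shares <- english_share_by_year(toy_courses)
shares[["2009"]]  # 40
shares[["2013"]]  # 60
```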

Part 2
In this part, we examine the possible reasons behind participants' choice of the country
in which to travel and study. To do so, we compare the total number of incoming students in
each country with three characteristics of that country: its crime rate, its government's
expenditure on tertiary education and its……….
Criminality
Our data set contains the total number of suspected persons arrested in each of the
European states, regardless of their nationality. Aggregating the data, we find that Germany
has the largest total number of suspects (Graph 3 in Appendix), while France and Italy are
also among the top five. Crime rates are generally considered to reduce a country's
attractiveness to visitors. In light of our findings in Part 1, it is therefore surprising that three
of the most popular destinations for students also have high criminality. In addition, when we
plot the number of incoming students against the number of suspects, we observe a slightly
positive relationship.
Despite this surprising result, when we plot the average number of students against the
average number of suspects (see Figure 1 in Appendix), the scatter plot does not reveal any
relationship between the variables. Moreover, the rates of criminality per hundred thousand
inhabitants give a slightly different picture: with the exception of Finland, the rates are almost
the same across the most popular student destinations. In addition, the correlation between
the number of incoming students and the number of suspects in these countries is weak
(lower than 0.5). Taken together, these two findings suggest that students were not
concerned about crime rates when choosing an ERASMUS destination; in other words, we
find no evidence that the criminality of a country affected students' choice to study there.
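The per-country correlations above are only meaningful if each year's student count is paired with the same year's suspect count, since the two series come from different files. A minimal base-R sketch, assuming tables with the COUNTRY, DATE, GOING.TO and CRIMINALS columns used in the Code section; the function name `corr_by_country` and the toy values are hypothetical.

```r
# Pearson correlation between incoming students and suspects, per country,
# after matching the two series on COUNTRY and DATE (year).
corr_by_country <- function(students, crimes) {
  both <- merge(students, crimes, by = c("COUNTRY", "DATE"))
  both <- both[!is.na(both$CRIMINALS), ]
  sapply(split(both, both$COUNTRY),
         function(d) cor(d$GOING.TO, d$CRIMINALS, method = "pearson"))
}

# Hypothetical toy data, for illustration only (not the study's data)
students <- data.frame(COUNTRY = "A", DATE = c("2009", "2010", "2011", "2012"),
                       GOING.TO = c(100, 110, 120, 130))
# Rows deliberately out of year order: merge() matches them by year
crimes <- data.frame(COUNTRY = "A", DATE = c("2012", "2010", "2011", "2009"),
                     CRIMINALS = c(40, 20, 30, 10))
r <- corr_by_country(students, crimes)
r[["A"]]  # 1
```

Correlating the two columns in their original file order would instead pair, for example, 2009 students with 2012 suspects and give a misleading coefficient.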
Public expenditure in tertiary education
For this section, our data represent each country's public expenditure on tertiary
education as a percentage of its gross domestic product (GDP). To begin with, we observe
that the Scandinavian countries spend the largest percentage on education. At the same time,
these countries are equally or less popular than countries that spend less or much less.
Overall, the scatter plot (Figure 2) does not allow us to establish any correlation between the
percentage of GDP spent on education and a country's popularity as an ERASMUS
destination.
Furthermore, we can look more closely at the five most popular destinations. First, we
exclude the years 2008 and 2014, as we have data for only one semester in each. Second, we
observe a decline in expenditure in 2012 for all of these countries except the United
Kingdom. This decline most likely reflects the economic recession and the governments'
policy responses to it. The main result from the scatter plots and the linear regressions (see
Figures 3-7 in Appendix) is a positive relationship: higher public expenditure on education is
associated with a larger number of incoming ERASMUS students. However, the relationship
is only slightly positive, and on some occasions there is no positive relationship at all.
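Each per-country regression described above can be sketched as a simple lm() fit of incoming students on expenditure, after dropping the two half years. The column name EXPENDITURE, the function name `fit_spending_model` and the toy values are hypothetical; GOING.TO and DATE follow the Code section.

```r
# Slope of a simple linear regression of incoming students on tertiary-education
# spending (% of GDP) for one country, dropping years with only one semester of data.
fit_spending_model <- function(d, partial_years = c("2008", "2014")) {
  d <- d[!(d$DATE %in% partial_years), ]
  fit <- lm(GOING.TO ~ EXPENDITURE, data = d)
  coef(fit)[["EXPENDITURE"]]  # change in incoming students per 1 percentage point of GDP
}

# Hypothetical toy data, for illustration only (not the study's data)
toy <- data.frame(DATE = c("2008", "2009", "2010", "2011", "2014"),
                  EXPENDITURE = c(1.1, 1.0, 1.2, 1.4, 1.3),
                  GOING.TO = c(300, 1000, 1100, 1200, 400))
fit_spending_model(toy)  # 500
```

With so few yearly observations per country, the slope's standard error from `summary(lm(...))` is worth checking before reading much into its sign.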

Conclusion
In summary, student mobility followed some specific patterns throughout the 2008-2014 period. For instance,
participants' preferences regarding both home and host countries varied very little from year to year.
Notably, five countries dominate both categories and account for more than 50% of all participants:
Spain, France, Germany, the United Kingdom and Italy (or Poland).
Moreover, we observed that the largest group of participants studied towards a degree in foreign
languages, which suggests that these students use the program as a tool to enhance their studies. At the
same time, the share of courses taught in English increased rapidly from 2008.
Additionally, we examined whether criminality or public expenditure on tertiary education affects
participants' choice of study destination. In general, we would have expected the former to have a
negative effect and the latter a positive one. However, we found evidence suggesting that criminality is
not a factor that students take into consideration when choosing their destination. On the other hand,
we found a slightly positive relationship between the number of incoming students and public
expenditure on education as a percentage of GDP. Nonetheless, high expenditure on education does not
necessarily mean a large number of students, as the case of the Scandinavian countries shows.
Our analysis covers a short time frame, which makes it difficult to draw strong conclusions about
ERASMUS mobility. A more detailed analysis of student mobility is needed to confirm the findings of
this paper.

Code
# Extract student mobility by semester
library(tidyverse)   # attaches dplyr, readr, tidyr, tibble and ggplot2
library(lubridate)
stdata0809 <- read_delim("M:/pc/Desktop/student_data_2008 (1).csv",
                         ";", escape_double = FALSE, trim_ws = TRUE)
glimpse(stdata0809)
# Drop placement-related and institution-level columns, keep study mobility only
data0809 <- stdata0809 %>%
  select(-c(NATIONALITY, PLACEMENTSTARTDATE, STUDYGRANT, LENGTHWORKPLACEMENT,
            TYPEWORKSECTOR, CONSORTIUMAGREEMENTNUMBER, SEVSUPPLEMENT,
            COUNTRYOFWORKPLACEMENT, ECTSCREDITSWORK, WORKPLACEMENT, ENTERPRISESIZE,
            PLACEMENTGRANT, SHORTDURATION, HOMEINSTITUTION, HOSTINSTITUTION,
            QUALIFICATIONATHOST, ECTSCREDITSSTUDY, TOTALECTSCREDITS,
            LINGPREPARATION, TAUGHTHOSTLANG)) %>%
  filter(MOBILITYTYPE != "P") %>%          # drop work placements
  filter(!is.na(STUDYSTARTDATE)) %>%
  separate(STUDYSTARTDATE, sep = "-", into = c("MONTH", "YEAR"))
glimpse(data0809)
rm(stdata0809)
sum(is.na(data0809$YEAR))
sum(data0809$COUNTRYOFHOSTINSTITUTION == 0)
# Drop records without a host country or a start year
data0809 <- data0809 %>%
  filter(COUNTRYOFHOSTINSTITUTION != 0) %>%
  filter(!is.na(YEAR)) %>%
  select(-MONTH)
# Separate the two semesters: 2008 (second semester) and 2009 (first semester)
# 2008
data08 <- data0809 %>%
  filter(YEAR == "2008")
# 2009
data09 <- data0809 %>%
  filter(YEAR == "2009")
# Summary of host countries (incoming students)
host08 <- data08 %>%
  count(COUNTRYOFHOSTINSTITUTION, sort = TRUE)
# Rename to COUNTRY and GOING.TO
names(host08)
names(host08)[names(host08) == "COUNTRYOFHOSTINSTITUTION"] <- "COUNTRY"
names(host08)[names(host08) == "n"] <- "GOING.TO"
# Summary of home countries (outgoing students)
home08 <- data08 %>%
  count(COUNTRYOFHOMEINSTITUTION, sort = TRUE)
# Rename to COUNTRY and LEAVING.FROM
names(home08)
names(home08)[names(home08) == "COUNTRYOFHOMEINSTITUTION"] <- "COUNTRY"
names(home08)[names(home08) == "n"] <- "LEAVING.FROM"
# Countries that only receive students: add them to home08 with 0 leaving students
anti_join(host08, home08, by = "COUNTRY")
home08 <- home08 %>%
  add_row(COUNTRY = "FR", LEAVING.FROM = 0) %>%
  add_row(COUNTRY = "TR", LEAVING.FROM = 0) %>%
  add_row(COUNTRY = "CH", LEAVING.FROM = 0)
anti_join(home08, host08, by = "COUNTRY")
# TOTAL for 2008 (second semester)
# Merge, add a zero row for HR and a DATE column
total08 <- inner_join(host08, home08, by = "COUNTRY")
dt08 <- as.Date("2008-08-01")
library(zoo)   # as.yearmon(), used by Date2period()
total08 <- total08 %>%
  add_row(COUNTRY = "HR", LEAVING.FROM = 0, GOING.TO = 0) %>%
  add_column(DATE = dt08, .before = "COUNTRY")
# Convert a date into a "YYYY Sn" semester label, e.g. 2008-08-01 -> "2008 S2"
Date2period <- function(x, period = 6, sep = " S") {
  ym <- as.yearmon(x)
  paste(as.integer(ym), (cycle(ym) - 1) %/% period + 1, sep = sep)
}
total08$DATE <- Date2period(total08$DATE)
# Yearly totals for Part 2
TOTAL08 <- left_join(host08, home08, by = "COUNTRY")
TOTAL08 <- TOTAL08 %>%
  add_row(COUNTRY = "HR", LEAVING.FROM = 0, GOING.TO = 0) %>%
  add_row(COUNTRY = "CH", LEAVING.FROM = 0, GOING.TO = 0)
# 2009 (first semester)
# Summary of host countries
host09 <- data09 %>%
  count(COUNTRYOFHOSTINSTITUTION, sort = TRUE)
names(host09)[names(host09) == "COUNTRYOFHOSTINSTITUTION"] <- "COUNTRY"
names(host09)[names(host09) == "n"] <- "GOING.TO"
# Summary of home countries
home09 <- data09 %>%
  count(COUNTRYOFHOMEINSTITUTION, sort = TRUE)
names(home09)[names(home09) == "COUNTRYOFHOMEINSTITUTION"] <- "COUNTRY"
names(home09)[names(home09) == "n"] <- "LEAVING.FROM"
home09 <- home09 %>%
  add_row(COUNTRY = "FR", LEAVING.FROM = 0) %>%
  add_row(COUNTRY = "TR", LEAVING.FROM = 0)
# TOTAL for 2009 (first semester): merge, add HR and the DATE column
TOTAL09A <- inner_join(home09, host09, by = "COUNTRY")
TOTAL09A <- add_row(TOTAL09A, COUNTRY = "HR", LEAVING.FROM = 0, GOING.TO = 0)
dt09a <- as.Date("2009-02-01")
total09a <- TOTAL09A %>%
  add_column(DATE = dt09a, .before = "COUNTRY")
# Date2period() is defined in the 2008 block
total09a$DATE <- Date2period(total09a$DATE)
# Keep the workspace clean
rm(data08, data09, home08, home09, host09, host08)
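# Import the next file: 2009-2010
stdata0910 <- read_delim("M:/pc/Desktop/student_data_2009.csv",
                         ";", escape_double = FALSE, trim_ws = TRUE)
glimpse(stdata0910)
data0910 <- stdata0910 %>%
  select(-c(NATIONALITY, PLACEMENTSTARTDATE, STUDYGRANT, LENGTHWORKPLACEMENT,
            TYPEWORKSECTOR, CONSORTIUMAGREEMENTNUMBER, SEVSUPPLEMENT,
            COUNTRYOFWORKPLACEMENT, ECTSCREDITSWORK, WORKPLACEMENT, ENTERPRISESIZE,
            PLACEMENTGRANT, SHORTDURATION, HOMEINSTITUTION, HOSTINSTITUTION,
            QUALIFICATIONATHOST, ECTSCREDITSSTUDY, TOTALECTSCREDITS,
            LINGPREPARATION, TAUGHTHOSTLANG)) %>%
  filter(MOBILITYTYPE != "P") %>%
  filter(!is.na(STUDYSTARTDATE)) %>%
  separate(STUDYSTARTDATE, sep = "-", into = c("MONTH", "YEAR"))
glimpse(data0910)
rm(stdata0910)
sum(data0910$COUNTRYOFHOSTINSTITUTION == 0)
data0910 <- data0910 %>%
  filter(COUNTRYOFHOSTINSTITUTION != 0) %>%
  filter(!is.na(YEAR)) %>%
  select(-MONTH)
# 2010 (first semester)
data10 <- data0910 %>%
  filter(YEAR == "2010")
# 2009 (second semester)
data09 <- data0910 %>%
  filter(YEAR == "2009")
# Summary of host countries, 2010
host10 <- data10 %>%
  count(COUNTRYOFHOSTINSTITUTION, sort = TRUE)
names(host10)[names(host10) == "COUNTRYOFHOSTINSTITUTION"] <- "COUNTRY"
names(host10)[names(host10) == "n"] <- "GOING.TO"
# Summary of home countries, 2010
home10 <- data10 %>%
  count(COUNTRYOFHOMEINSTITUTION, sort = TRUE)
names(home10)[names(home10) == "COUNTRYOFHOMEINSTITUTION"] <- "COUNTRY"
names(home10)[names(home10) == "n"] <- "LEAVING.FROM"
sum(data0910$COUNTRYOFHOMEINSTITUTION == "SE")
home10 <- home10 %>%
  add_row(COUNTRY = "SE", LEAVING.FROM = 0) %>%
  add_row(COUNTRY = "EE", LEAVING.FROM = 0)
# HR only sends students in this semester and receives none
host10 <- host10 %>%
  add_row(COUNTRY = "HR", GOING.TO = 0)
# TOTAL for 2010 (first semester): merge and add the DATE column
TOTAL10A <- inner_join(home10, host10, by = "COUNTRY")
dt10a <- as.Date("2010-02-01")
total10a <- TOTAL10A %>%
  add_column(DATE = dt10a, .before = "COUNTRY")
total10a$DATE <- Date2period(total10a$DATE)
str(total10a)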
# 2009 (second semester): same steps as before
# Summary of host countries
host09 <- data09 %>%
  count(COUNTRYOFHOSTINSTITUTION, sort = TRUE)
names(host09)[names(host09) == "COUNTRYOFHOSTINSTITUTION"] <- "COUNTRY"
names(host09)[names(host09) == "n"] <- "GOING.TO"
# Summary of home countries
home09 <- data09 %>%
  count(COUNTRYOFHOMEINSTITUTION, sort = TRUE)
names(home09)[names(home09) == "COUNTRYOFHOMEINSTITUTION"] <- "COUNTRY"
names(home09)[names(home09) == "n"] <- "LEAVING.FROM"
# Countries that appear only as hosts, or only as senders
host09$COUNTRY[!(host09$COUNTRY %in% home09$COUNTRY)]
home09$COUNTRY[!(home09$COUNTRY %in% host09$COUNTRY)]
home09 <- home09 %>%
  add_row(COUNTRY = "SE", LEAVING.FROM = 0) %>%
  add_row(COUNTRY = "EE", LEAVING.FROM = 0)
host09 <- host09 %>%
  add_row(COUNTRY = "HR", GOING.TO = 0) %>%
  add_row(COUNTRY = "CH", GOING.TO = 0)
# TOTAL for 2009 (second semester)
TOTAL09B <- left_join(home09, host09, by = "COUNTRY")
dt09b <- as.Date("2009-09-01")
total09b <- TOTAL09B %>%
  add_column(DATE = dt09b, .before = "COUNTRY")
total09b$DATE <- Date2period(total09b$DATE)
# Part 2: yearly total for 2009 = second semester (TOTAL09B) + first semester (TOTAL09A)
TOTAL09 <- left_join(TOTAL09B, TOTAL09A, by = "COUNTRY")
TOTAL09[is.na(TOTAL09)] <- 0
TOTAL09 <- TOTAL09 %>%
  transmute(COUNTRY, DATE = "2009",
            LEAVING.FROM = LEAVING.FROM.x + LEAVING.FROM.y,
            GOING.TO = GOING.TO.x + GOING.TO.y)
# Clean up
rm(home10, host10, home09, host09, data09, data10)
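# Import the next file: 2010-2011
stdata1011 <- read_delim("M:/pc/Desktop/student_data_2010 (1).csv",
                         ";", escape_double = FALSE, trim_ws = TRUE)
glimpse(stdata1011)
names(stdata1011)
sum(is.na(stdata1011$STUDYSTARTDATE))
data1011 <- stdata1011 %>%
  select(-c(NATIONALITY, PLACEMENTSTARTDATE, STUDYGRANT, TYPEWORKSECTOR,
            CONSORTIUMAGREEMENTNUMBER, SEVSUPPLEMENT, COUNTRYOFWORKPLACEMENT,
            ECTSCREDITSWORK, WORKPLACEMENT, ENTERPRISESIZE, PLACEMENTGRANT,
            SHORTDURATION, HOMEINSTITUTION, HOSTINSTITUTION, QUALIFICATIONATHOST,
            ECTSCREDITSSTUDY, TOTALECTSCREDITS, LINGPREPARATION, TAUGHTHOSTLANG)) %>%
  filter(MOBILITYTYPE != "P") %>%
  filter(!is.na(STUDYSTARTDATE)) %>%
  separate(STUDYSTARTDATE, sep = "-", into = c("MONTH", "YEAR"))
sum(data1011$COUNTRYCODEOFHOSTINSTITUTION == 0)
# BEFR and BENL both refer to Belgium (BE)
data1011$COUNTRYCODEOFHOSTINSTITUTION[data1011$COUNTRYCODEOFHOSTINSTITUTION %in% c("BEFR", "BENL")] <- "BE"
data1011$COUNTRYCODEOFHOMEINSTITUTION[data1011$COUNTRYCODEOFHOMEINSTITUTION %in% c("BEFR", "BENL")] <- "BE"
glimpse(data1011)
data1011 <- data1011 %>%
  filter(!is.na(YEAR)) %>%
  select(-MONTH)
# 2010 (second semester)
data10 <- data1011 %>%
  filter(YEAR == "2010")
# 2011 (first semester)
data11 <- data1011 %>%
  filter(YEAR == "2011")
# Summary of host countries, 2010
host10 <- data10 %>%
  count(COUNTRYCODEOFHOSTINSTITUTION, sort = TRUE)
names(host10)[names(host10) == "COUNTRYCODEOFHOSTINSTITUTION"] <- "COUNTRY"
names(host10)[names(host10) == "n"] <- "GOING.TO"
# Summary of home countries, 2010
home10 <- data10 %>%
  count(COUNTRYCODEOFHOMEINSTITUTION, sort = TRUE)
names(home10)[names(home10) == "COUNTRYCODEOFHOMEINSTITUTION"] <- "COUNTRY"
names(home10)[names(home10) == "n"] <- "LEAVING.FROM"
home10 <- home10 %>%
  add_row(COUNTRY = "MT", LEAVING.FROM = 0)
# TOTAL for 2010 (second semester)
TOTAL10B <- inner_join(home10, host10, by = "COUNTRY")
TOTAL10B$COUNTRY[!(TOTAL10B$COUNTRY %in% TOTAL10A$COUNTRY)]
TOTAL10A <- TOTAL10A %>%
  add_row(COUNTRY = "CH", GOING.TO = 0, LEAVING.FROM = 0)
dt10b <- as.Date("2010-09-01")
total10b <- TOTAL10B %>%
  add_column(DATE = dt10b, .before = "COUNTRY")
total10b$DATE <- Date2period(total10b$DATE)
# Part 2: yearly total for 2010 = first semester (TOTAL10A) + second semester (TOTAL10B)
TOTAL10 <- inner_join(TOTAL10A, TOTAL10B, by = "COUNTRY") %>%
  transmute(COUNTRY, DATE = "2010",
            LEAVING.FROM = LEAVING.FROM.x + LEAVING.FROM.y,
            GOING.TO = GOING.TO.x + GOING.TO.y)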
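# 2011 (first semester)
# Summary of host countries
host11 <- data11 %>%
  count(COUNTRYCODEOFHOSTINSTITUTION, sort = TRUE)
names(host11)[names(host11) == "COUNTRYCODEOFHOSTINSTITUTION"] <- "COUNTRY"
names(host11)[names(host11) == "n"] <- "GOING.TO"
# Summary of home countries
home11 <- data11 %>%
  count(COUNTRYCODEOFHOMEINSTITUTION, sort = TRUE)
names(home11)[names(home11) == "COUNTRYCODEOFHOMEINSTITUTION"] <- "COUNTRY"
names(home11)[names(home11) == "n"] <- "LEAVING.FROM"
# Countries that appear only as hosts, or only as senders
host11$COUNTRY[!(host11$COUNTRY %in% home11$COUNTRY)]
home11$COUNTRY[!(home11$COUNTRY %in% host11$COUNTRY)]
home11 <- home11 %>%
  add_row(COUNTRY = "CH", LEAVING.FROM = 0) %>%
  add_row(COUNTRY = "MT", LEAVING.FROM = 0)
# TOTAL for 2011 (first semester)
TOTAL11A <- inner_join(home11, host11, by = "COUNTRY")
dt11a <- as.Date("2011-02-01")
total11a <- TOTAL11A %>%
  add_column(DATE = dt11a, .before = "COUNTRY")
total11a$DATE <- Date2period(total11a$DATE)
# Remove objects that are no longer needed
rm(data11, home11, host11, stdata1011)
rm(data10, home10, host10)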
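# Import the next file: 2011-2012
stdata1112 <- read_delim("//hume/student-u02/marthoma/pc/Desktop/student_1112.csv",
                         ";", escape_double = FALSE, trim_ws = TRUE)
glimpse(stdata1112)
names(stdata1112)
sum(is.na(stdata1112$STUDYSTARTDATE))
data1112 <- stdata1112 %>%
  select(-c(NATIONALITY, PLACEMENTSTARTDATE, STUDYGRANT, PLACEMENTENTERPRISE,
            COUNTRYOFPLACEMENT, ECTSCREDITSPLACEMENT, LENGTHPLACEMENT,
            TYPEPLACEMENTSECTOR, CONSORTIUMAGREEMENTNUMBER, ENTERPRISESIZE,
            PLACEMENTGRANT, SHORTDURATION, HOMEINSTITUTION, HOSTINSTITUTION,
            QUALIFICATIONATHOST, ECTSCREDITSSTUDY, TOTALECTSCREDITS,
            LINGPREPARATION, TAUGHTHOSTLANG, SNSUPPLEMENT)) %>%
  filter(MOBILITYTYPE != "P") %>%
  filter(!is.na(STUDYSTARTDATE)) %>%
  separate(STUDYSTARTDATE, sep = "-", into = c("MONTH", "YEAR"))
sum(data1112$COUNTRYCODEOFHOSTINSTITUTION == 0)
# BEFR, BENL and BEDE all refer to Belgium (BE)
data1112$COUNTRYCODEOFHOSTINSTITUTION[data1112$COUNTRYCODEOFHOSTINSTITUTION %in% c("BEFR", "BENL", "BEDE")] <- "BE"
data1112$COUNTRYCODEOFHOMEINSTITUTION[data1112$COUNTRYCODEOFHOMEINSTITUTION %in% c("BEFR", "BENL")] <- "BE"
glimpse(data1112)
data1112 <- data1112 %>%
  filter(!is.na(YEAR)) %>%
  select(-MONTH)
# In this file YEAR has two digits
# 2012 (first semester)
data12 <- data1112 %>%
  filter(YEAR == "12")
# 2011 (second semester)
data11 <- data1112 %>%
  filter(YEAR == "11")
# Summary of host and home countries, 2012
host12 <- data12 %>%
  count(COUNTRYCODEOFHOSTINSTITUTION, sort = TRUE)
names(host12)[names(host12) == "COUNTRYCODEOFHOSTINSTITUTION"] <- "COUNTRY"
names(host12)[names(host12) == "n"] <- "GOING.TO"
home12 <- data12 %>%
  count(COUNTRYCODEOFHOMEINSTITUTION, sort = TRUE)
names(home12)[names(home12) == "COUNTRYCODEOFHOMEINSTITUTION"] <- "COUNTRY"
names(home12)[names(home12) == "n"] <- "LEAVING.FROM"
# TOTAL for 2012 (first semester)
TOTAL12A <- inner_join(home12, host12, by = "COUNTRY")
dt12a <- as.Date("2012-02-01")
total12a <- TOTAL12A %>%
  add_column(DATE = dt12a, .before = "COUNTRY")
total12a$DATE <- Date2period(total12a$DATE)
# Summary of host and home countries, 2011 (second semester)
host11 <- data11 %>%
  count(COUNTRYCODEOFHOSTINSTITUTION, sort = TRUE)
names(host11)[names(host11) == "COUNTRYCODEOFHOSTINSTITUTION"] <- "COUNTRY"
names(host11)[names(host11) == "n"] <- "GOING.TO"
home11 <- data11 %>%
  count(COUNTRYCODEOFHOMEINSTITUTION, sort = TRUE)
names(home11)[names(home11) == "COUNTRYCODEOFHOMEINSTITUTION"] <- "COUNTRY"
names(home11)[names(home11) == "n"] <- "LEAVING.FROM"
# TOTAL for 2011 (second semester)
TOTAL11B <- inner_join(home11, host11, by = "COUNTRY")
dt11b <- as.Date("2011-09-01")
total11b <- TOTAL11B %>%
  add_column(DATE = dt11b, .before = "COUNTRY")
total11b$DATE <- Date2period(total11b$DATE)
# Part 2: yearly total for 2011 = first semester (TOTAL11A) + second semester (TOTAL11B)
TOTAL11 <- inner_join(TOTAL11A, TOTAL11B, by = "COUNTRY") %>%
  transmute(COUNTRY, DATE = "2011",
            LEAVING.FROM = LEAVING.FROM.x + LEAVING.FROM.y,
            GOING.TO = GOING.TO.x + GOING.TO.y)
# Remove objects that are no longer needed
rm(data12, data11, home11, host11, home12, host12, stdata1112)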
# Import the next file: 2012-2013 (new column names from this file on)
stdata1213 <- read_delim("M:/pc/Desktop/SM_2012_13_20141103_01 (2).csv",
                         ";", escape_double = FALSE, trim_ws = TRUE)
glimpse(stdata1213)
names(stdata1213)
sum(is.na(stdata1213$STUDY_START_DATE))
data1213 <- stdata1213 %>%
  select(-c(STUDENT_ID, TYPE_PLACEMENT_SECTOR_VALUE, ECTS_CREDITS_PLACEMENT_AMT,
            NUMB_YRS_HIGHER_EDUCAT_VALUE, PLACEMENT_ENTERPRISE_VALUE,
            TOTAL_ECTS_CREDITS_AMT, QUALIFICATION_AT_HOST_CDE,
            STUDENT_NATIONALITY_CDE, ID_MOBILITY_CDE, PLACEMENT_ENTERPRISE_CTRY_CDE,
            LENGTH_PLACEMENT_VALUE, CONSORTIUM_AGREEMENT_NUMBER,
            SPECIAL_NEEDS_SUPPLEMENT_VALUE, STUDY_GRANT_AMT, HOST_INSTITUTION_CDE,
            HOME_INSTITUTION_CDE, PLACEMENT_ENTERPRISE_SIZE_CDE, SHORT_DURATION_CDE,
            ECTS_CREDITS_STUDY_AMT, PLACEMENT_GRANT_AMT)) %>%
  filter(MOBILITY_TYPE_CDE != "P") %>%
  filter(!is.na(STUDY_START_DATE)) %>%
  separate(STUDY_START_DATE, sep = "-", into = c("MONTH", "YEAR"))
sum(data1213$HOST_INSTITUTION_COUNTRY_CDE == 0)
# BEFR, BENL and BEDE all refer to Belgium (BE)
data1213$HOST_INSTITUTION_COUNTRY_CDE[data1213$HOST_INSTITUTION_COUNTRY_CDE %in% c("BEFR", "BENL", "BEDE")] <- "BE"
data1213$HOME_INSTITUTION_CTRY_CDE[data1213$HOME_INSTITUTION_CTRY_CDE %in% c("BEFR", "BENL")] <- "BE"
data1213 <- data1213 %>%
  filter(!is.na(YEAR)) %>%
  select(-MONTH)
# 2012 (second semester)
data12 <- data1213 %>%
  filter(YEAR == "12")
# 2013 (first semester)
data13 <- data1213 %>%
  filter(YEAR == "13")
# Summary of host and home countries, 2013
host13 <- data13 %>%
  count(HOST_INSTITUTION_COUNTRY_CDE, sort = TRUE)
names(host13)[names(host13) == "HOST_INSTITUTION_COUNTRY_CDE"] <- "COUNTRY"
names(host13)[names(host13) == "n"] <- "GOING.TO"
home13 <- data13 %>%
  count(HOME_INSTITUTION_CTRY_CDE, sort = TRUE)
names(home13)[names(home13) == "HOME_INSTITUTION_CTRY_CDE"] <- "COUNTRY"
names(home13)[names(home13) == "n"] <- "LEAVING.FROM"
# TOTAL for 2013 (first semester)
TOTAL13A <- inner_join(home13, host13, by = "COUNTRY")
dt13a <- as.Date("2013-02-01")
total13a <- TOTAL13A %>%
  add_column(DATE = dt13a, .before = "COUNTRY")
total13a$DATE <- Date2period(total13a$DATE)
# Summary of host and home countries, 2012 (second semester)
host12 <- data12 %>%
  count(HOST_INSTITUTION_COUNTRY_CDE, sort = TRUE)
names(host12)[names(host12) == "HOST_INSTITUTION_COUNTRY_CDE"] <- "COUNTRY"
names(host12)[names(host12) == "n"] <- "GOING.TO"
home12 <- data12 %>%
  count(HOME_INSTITUTION_CTRY_CDE, sort = TRUE)
names(home12)[names(home12) == "HOME_INSTITUTION_CTRY_CDE"] <- "COUNTRY"
names(home12)[names(home12) == "n"] <- "LEAVING.FROM"
# TOTAL for 2012 (second semester)
TOTAL12B <- inner_join(home12, host12, by = "COUNTRY")
dt12b <- as.Date("2012-09-01")
total12b <- TOTAL12B %>%
  add_column(DATE = dt12b, .before = "COUNTRY")
total12b$DATE <- Date2period(total12b$DATE)
# Part 2: yearly total for 2012 = first semester (TOTAL12A) + second semester (TOTAL12B)
TOTAL12 <- inner_join(TOTAL12A, TOTAL12B, by = "COUNTRY") %>%
  transmute(COUNTRY, DATE = "2012",
            LEAVING.FROM = LEAVING.FROM.x + LEAVING.FROM.y,
            GOING.TO = GOING.TO.x + GOING.TO.y)
# Clean up
rm(data13, data12, stdata1213, home12, home13, host12, host13)
# Import the last file: 2013-2014 (Excel format, new column names)
library(readxl)
stdata1314 <- read_excel("M:/pc/Desktop/Student_Mobility_2013-14.xlsx")
# Convert the start date from character to date-time
stdata1314$StartDate <- dmy_hms(stdata1314$StartDate, tz = Sys.timezone())
glimpse(stdata1314)
names(stdata1314)
sum(is.na(stdata1314$StartDate))
sum(stdata1314$ReceivingCountry == "BEFR")
sum(stdata1314$ReceivingCountry == "BENL")
sum(stdata1314$ReceivingCountry == "BEDE")
sum(stdata1314$SendingCountry == "BEFR")
data1314 <- stdata1314 %>%
  filter(CombinedMobilityYesNo == "NO") %>%
  filter(MobilityType == "Mob-SMS") %>%
  separate(StartDate, sep = "-", into = c("year", "month", "day")) %>%
  select(-c(Action, CombinedMobilityYesNo, ProjectNumber, SpecialNeeds, EndDate,
            ParticipantID, SendingPartnerName, MobiilityID, DurationInMonths,
            HostingPartnerErasmusID, DurationInDays, ParticipantType,
            HostingPartnerName, SubsistenceTravel, CallYear,
            SendingPartnerErasmusID, HostingPartnerCity))
# This file uses GB for the United Kingdom; the earlier files use UK
data1314$ReceivingCountry[data1314$ReceivingCountry == "GB"] <- "UK"
data1314$SendingCountry[data1314$SendingCountry == "GB"] <- "UK"
data1314 <- data1314 %>%
  filter(!is.na(year)) %>%
  select(-c(month, day))
# 2014 (first semester)
data14 <- data1314 %>%
  filter(year == "2014")
# 2013 (second semester)
data13 <- data1314 %>%
  filter(year == "2013")
# Summary of host and home countries, 2013 (second semester)
host13 <- data13 %>%
  count(ReceivingCountry, sort = TRUE)
names(host13)[names(host13) == "ReceivingCountry"] <- "COUNTRY"
names(host13)[names(host13) == "n"] <- "GOING.TO"
home13 <- data13 %>%
  count(SendingCountry, sort = TRUE)
names(home13)[names(home13) == "SendingCountry"] <- "COUNTRY"
names(home13)[names(home13) == "n"] <- "LEAVING.FROM"
# TOTAL for 2013 (second semester)
TOTAL13B <- inner_join(home13, host13, by = "COUNTRY")
dt13b <- as.Date("2013-09-01")
total13b <- TOTAL13B %>%
  add_column(DATE = dt13b, .before = "COUNTRY")
total13b$DATE <- Date2period(total13b$DATE)
# Part 2: yearly total for 2013 = first semester (TOTAL13A) + second semester (TOTAL13B)
TOTAL13B$COUNTRY[TOTAL13B$COUNTRY == "GB"] <- "UK"
TOTAL13 <- inner_join(TOTAL13A, TOTAL13B, by = "COUNTRY") %>%
  transmute(COUNTRY, DATE = "2013",
            LEAVING.FROM = LEAVING.FROM.x + LEAVING.FROM.y,
            GOING.TO = GOING.TO.x + GOING.TO.y)
# Summary of host and home countries, 2014 (first semester)
host14 <- data14 %>%
  count(ReceivingCountry, sort = TRUE)
names(host14)[names(host14) == "ReceivingCountry"] <- "COUNTRY"
names(host14)[names(host14) == "n"] <- "GOING.TO"
home14 <- data14 %>%
  count(SendingCountry, sort = TRUE)
names(home14)[names(home14) == "SendingCountry"] <- "COUNTRY"
names(home14)[names(home14) == "n"] <- "LEAVING.FROM"
# TOTAL for 2014 (first semester)
TOTAL14 <- inner_join(home14, host14, by = "COUNTRY")
dt14 <- as.Date("2014-02-01")
total14 <- TOTAL14 %>%
  add_column(DATE = dt14, .before = "COUNTRY")
total14$DATE <- Date2period(total14$DATE)
# Part 2 only has the first semester of 2014
TOTAL14 <- add_column(TOTAL14, DATE = "2014", .before = "COUNTRY")
# Clean up (data1314 and TOTAL14 are still needed later)
rm(home13, home14, host13, host14, TOTAL13A, TOTAL13B, stdata1314, data13, data14)

# Merge all semesters (Part 1)
no_students_0814 <- bind_rows(total08, total09a, total09b, total10a, total10b, total11a,
                              total11b, total12a, total12b, total13a, total13b, total14)
# Yearly totals (Part 2)
TOTAL08 <- add_column(TOTAL08, DATE = "2008", .before = "COUNTRY")
TOTAL <- bind_rows(TOTAL08, TOTAL09, TOTAL10, TOTAL11, TOTAL12, TOTAL13, TOTAL14)
# Outgoing students by semester and country
ggplot(no_students_0814,
       aes(x = DATE, y = LEAVING.FROM, color = COUNTRY, size = LEAVING.FROM)) +
  geom_point()
rm(total08, total09a, total09b, total10a, total10b, total11a, total11b, total12a, total12b,
   total13a, total13b, total14)
rm(dt08, dt09a, dt09b, dt10a, dt10b, dt11a, dt11b, dt12a, dt12b, dt13a, dt13b, dt14, Date2period)
# PART 1
# Totals and per-semester averages of outgoing and incoming students, per country
# (columns LEAVING.FROM_sum, LEAVING.FROM_mean, GOING.TO_sum, GOING.TO_mean)
sumall <- no_students_0814 %>%
  group_by(COUNTRY) %>%
  summarise(across(c(LEAVING.FROM, GOING.TO), list(sum = sum, mean = mean)))
# Each student is counted once as leaving and once as going, so the number of
# participants is the sum of one side only
totalno_all <- sum(sumall$GOING.TO_sum)
# Bar graph: host countries with more than 2500 incoming students per semester on average
topsumall <- sumall %>%
  filter(GOING.TO_mean > 2500)
ggplot(topsumall, aes(x = COUNTRY, y = GOING.TO_mean)) + geom_col() + theme_classic() +
  labs(title = "Graph 1", x = "Host Countries",
       y = "Average number of incoming students (greater than 2500)")
# Bar graph: home countries with more than 1500 outgoing students per semester on average
topsumall <- sumall %>%
  filter(LEAVING.FROM_mean > 1500)
ggplot(topsumall, aes(x = COUNTRY, y = LEAVING.FROM_mean)) + geom_col() + theme_classic() +
  labs(title = "Graph 2", x = "Home Countries",
       y = "Average number of leaving students (greater than 1500)")
glimpse(sumall)
# Focus on incoming students: share of the top five host countries
sumgoing <- select(sumall, -c(LEAVING.FROM_sum, LEAVING.FROM_mean))
sumgoing <- arrange(sumgoing, desc(GOING.TO_sum))
topsum_going <- filter(sumgoing, COUNTRY %in% c("ES", "DE", "IT", "FR", "UK"))
totalno_going <- sum(sumgoing$GOING.TO_sum)
totalno_topgoing <- sum(topsum_going$GOING.TO_sum)
fraction_going <- (totalno_topgoing / totalno_going) * 100
# Focus on outgoing students: share of the top five home countries
sumleaving <- sumall %>%
  select(COUNTRY, LEAVING.FROM_sum, LEAVING.FROM_mean) %>%
  arrange(desc(LEAVING.FROM_sum))
topsum_leaving <- filter(sumleaving, COUNTRY %in% c("ES", "DE", "IT", "FR", "PL"))
totalno_leaving <- sum(sumleaving$LEAVING.FROM_sum)
totalno_topleaving <- sum(topsum_leaving$LEAVING.FROM_sum)
fraction_leaving <- (totalno_topleaving / totalno_leaving) * 100
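# GENDER: number of participants by gender code in each file
gender0809 <- data0809 %>%
  count(SEX, name = "total_08-09")
gender0910 <- data0910 %>%
  count(GENDER, name = "total_09-10")
# Share of women, 2009-10 (count() sorts the codes alphabetically, so with
# F/M codes row 1 is "F")
sumsex0910 <- gender0910[[1, 2]] + gender0910[[2, 2]]
fractionfemale0910 <- (gender0910[[1, 2]] / sumsex0910) * 100
gender1213 <- data1213 %>%
  count(STUDENT_GENDER_CDE, name = "total_12-13")
gender1314 <- data1314 %>%
  count(ParticipantGender, name = "total_13-14")
# Share of women, 2013-14
sumsex1314 <- gender1314[[1, 2]] + gender1314[[2, 2]]
fractionfemale1314 <- (gender1314[[1, 2]] / sumsex1314) * 100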
#find the mode function
calculate_mode<-function(x) {
uniqx<-unique(na.omit(x))
uniqx[which.max(tabulate(match(x,uniqx)))]
}
#mode_age
calculate_mode(data0809$AGE)
calculate_mode(data1213$STUDENT_AGE_VALUE)
#mode_studylevel
calculate_mode(data0809$LEVELSTUDY)
calculate_mode(data1213$STUDENT_STUDY_LEVEL_CDE)
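# Subject area: number of participants per subject area in each file
# (the 2009-10 to 2011-12 files are assumed to use the same column name as 2008-09)
area0809 <- data0809 %>%
  count(SUBJECTAREA, name = "total_08-09")
area0910 <- data0910 %>%
  count(SUBJECTAREA, name = "total_09-10")
area1011 <- data1011 %>%
  count(SUBJECTAREA, name = "total_10-11")
area1112 <- data1112 %>%
  count(SUBJECTAREA, name = "total_11-12")
area1213 <- data1213 %>%
  count(STUDENT_SUBJECT_AREA_VALUE, name = "total_12-13")
area1314 <- data1314 %>%
  count(SubjectAreaCode, name = "total_13-14")
# Participants with subject area code 222 in 2013-14
sub222 <- filter(data1314, SubjectAreaCode == "222")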
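# Language of instruction: number of participants per taught language in each file
# (the 2009-10 to 2011-12 files are assumed to use the same column name as 2008-09)
lang0809 <- data0809 %>%
  count(LANGUAGETAUGHT, name = "total_08-09")
lang0910 <- data0910 %>%
  count(LANGUAGETAUGHT, name = "total_09-10")
lang1011 <- data1011 %>%
  count(LANGUAGETAUGHT, name = "total_10-11")
lang1112 <- data1112 %>%
  count(LANGUAGETAUGHT, name = "total_11-12")
lang1213 <- data1213 %>%
  count(LANGUAGE_TAUGHT_CDE, name = "total_12-13")
lang1314 <- data1314 %>%
  count(Language, name = "total_13-14")
# Share taught in English (English is row 6 of lang0910 and row 15 of lang1314)
english0910 <- lang0910[[6, 2]]
totallang0910 <- sum(lang0910$`total_09-10`)
fractionlang0910 <- (english0910 / totallang0910) * 100
english1314 <- lang1314[[15, 2]]
totallang1314 <- sum(lang1314$`total_13-14`)
fractionlang1314 <- (english1314 / totallang1314) * 100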
# PART 2
library(Hmisc)   # rcorr()
# Criminality: suspected persons by reporting country (Eurostat)
criminality <- read_csv("//hume/student-u02/marthoma/pc/Desktop/crimesbyreportingcountry08-14.csv")
glimpse(criminality)
crimes <- criminality %>%
  select(-c(`Flag and Footnotes`, LEG_STAT)) %>%
  filter(CITIZEN == "Total", UNIT == "Number")
# Eurostat marks missing values with ":"
crimes$Value[crimes$Value == ":"] <- NA
crimes <- crimes %>%
  select(-c(CITIZEN, UNIT)) %>%
  rename(COUNTRY = GEO, CRIMINALS = Value, DATE = TIME)
# The United Kingdom is reported in three parts, Germany under its long name
crimes$COUNTRY[crimes$COUNTRY %in% c("England and Wales", "Scotland", "Northern Ireland (UK)")] <- "United Kingdom"
crimes$COUNTRY[crimes$COUNTRY == "Germany (until 1990 former territory of the FRG)"] <- "Germany"
crimes <- crimes %>%
  filter(!is.na(CRIMINALS))
rm(criminality)
# Fix the variable types: DATE as character, CRIMINALS as a number
crimes$DATE <- as.character(crimes$DATE)
class(crimes$CRIMINALS)
# Remove the "." thousands separator (fixed = TRUE: in a regex "." matches any character)
crimes$CRIMINALS <- as.numeric(gsub(".", "", crimes$CRIMINALS, fixed = TRUE))
# Add up the three UK parts so that each country has one row per year
crimes <- crimes %>%
  group_by(COUNTRY, DATE) %>%
  summarise(CRIMINALS = sum(CRIMINALS), .groups = "drop") %>%
  arrange(desc(CRIMINALS))
# Replace the two-letter country codes in TOTAL with the country names used by Eurostat
country_names <- c(AT = "Austria", BE = "Belgium", BG = "Bulgaria", CH = "Switzerland",
                   CY = "Cyprus", CZ = "Czechia", DE = "Germany", DK = "Denmark",
                   EE = "Estonia", ES = "Spain", FI = "Finland", FR = "France",
                   GB = "United Kingdom", GR = "Greece", HR = "Croatia", HU = "Hungary",
                   IE = "Ireland", IS = "Iceland", IT = "Italy", LI = "Liechtenstein",
                   LT = "Lithuania", LU = "Luxembourg", LV = "Latvia", MT = "Malta",
                   NL = "Netherlands", NO = "Norway", PL = "Poland", PT = "Portugal",
                   RO = "Romania", SE = "Sweden", SI = "Slovenia", SK = "Slovakia",
                   TR = "Turkey", UK = "United Kingdom")
matched <- TOTAL$COUNTRY %in% names(country_names)
TOTAL$COUNTRY[matched] <- country_names[TOTAL$COUNTRY[matched]]
TOTAL$DATE<-as.character(TOTAL$DATE)
glimpse(TOTAL)
glimpse(crimes)
st_cri<-right_join(crimes,TOTAL, by=c("COUNTRY","DATE"),copy=FALSE)
glimpse(st_cri)
#PLOT TO SEE ANY POTENCIAL CONNECTION
ggplot(st_cri,aes(x=CRIMINALS,y=GOING.TO,label=COUNTRY)) + geom_point()
ggplot(st_cri,aes(x=CRIMINALS,y=GOING.TO,label=COUNTRY)) + geom_text()

#focus on top 5
st_cri_top5<- st_cri%>%
filter(COUNTRY %in% c("Spain","Germany","Italy","France","United Kingdom"))
ggplot(st_cri_going,aes(x=CRIMINALS,y=GOING.TO,label=COUNTRY)) + geom_text()
#average of values
glimpse(st_cri)
st_cri$CRIMINALS<-as.numeric(st_cri$CRIMINALS)
avgcrimes<- st_cri%>%
filter(!is.na(CRIMINALS))%>%
summarise(avg.criminals=mean(CRIMINALS))
#avgcrimes$avg.criminals<-format(round(avgcrimes$avg.criminals,2),nsmall = 2)
class(avgcrimes$avg.criminals)
avgcrimes_top<- avgcrimes%>%
filter(avg.criminals>180000)
ggplot(avgcrimes_top,aes(x=COUNTRY,y=avg.criminals))+geom_col()+theme_classic() +
labs(title="Graph 3",x="Countries (top 14)",y="Average number of suspected persons")
avgst<-st_cri%>%
group_by(COUNTRY)%>%
summarise(avg.students=mean(GOING.TO))
class(avgst$avg.students)
#round() keeps the column numeric, so it can be used as a continuous axis below
avgst$avg.students<-round(avgst$avg.students,2)
avg_cri.st<- inner_join(avgcrimes,avgst,by="COUNTRY")

#scatter plot
theme_set(theme_bw())
ggplot(avg_cri.st,aes(x=avg.criminals,y=avg.students,label=COUNTRY)) + geom_text() +
labs(title="Figure 1", x="Average number of suspected people", y="Average number of receiving students") + theme_classic()
#add limits: + xlim(c(0, 0.1)) + ylim(c(0, 500000)), line: + geom_smooth(method="loess", se=F)
#ols: geom_smooth(method="lm", se=FALSE) +
#geom_smooth(method="lm", se=FALSE) +
#theme_bw()
#correlations
#spain
st_spain<-TOTAL%>%
filter(COUNTRY=="Spain") %>%
select(-c(LEAVING.FROM))
cri_spain<-crimes%>%
filter(COUNTRY=="Spain")
cri_spain$CRIMINALS<-as.numeric(cri_spain$CRIMINALS)
class(cri_spain$CRIMINALS)
rcorr(st_spain$GOING.TO,cri_spain$CRIMINALS,type="pearson")
#france
st_fr<-TOTAL%>%
filter(COUNTRY=="France")
cri_france<-crimes%>%
filter(COUNTRY=="France")
cri_france$CRIMINALS<-as.numeric(cri_france$CRIMINALS)
class(cri_france$CRIMINALS)
rcorr(st_fr$GOING.TO,cri_france$CRIMINALS,type="pearson")
#germany
st_de<-TOTAL%>%
filter(COUNTRY=="Germany")
cri_de<-crimes%>%
filter(COUNTRY=="Germany")
cri_de$CRIMINALS<-as.numeric(cri_de$CRIMINALS)
class(cri_de$CRIMINALS)
rcorr(st_de$GOING.TO,cri_de$CRIMINALS,type="pearson")
#italy
st_it<-TOTAL%>%
filter(COUNTRY=="Italy")
cri_it<-crimes%>%
filter(COUNTRY=="Italy")
cri_it$CRIMINALS<-as.numeric(cri_it$CRIMINALS)
class(cri_it$CRIMINALS)
rcorr(st_it$GOING.TO,cri_it$CRIMINALS,type="pearson")
#very weak correlation for all the countries
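# Sketch (hypothetical helper, not the method used above): the four blocks
# above correlate two vectors by position, which assumes both are sorted by
# DATE with no missing periods. Merging on COUNTRY and DATE first makes the
# pairing explicit. country_cor() is an illustrative name; it uses base R
# merge() and cor() instead of Hmisc::rcorr(), and assumes the column names
# used in this script (COUNTRY, DATE, GOING.TO, CRIMINALS).
country_cor <- function(students, crimes, country) {
  paired <- merge(students[students$COUNTRY == country, ],
                  crimes[crimes$COUNTRY == country, ],
                  by = c("COUNTRY", "DATE"))  # one row per matched period
  cor(paired$GOING.TO, as.numeric(paired$CRIMINALS),
      method = "pearson", use = "complete.obs")
}
# Usage: sapply(c("Spain", "France", "Germany", "Italy"),
#               function(k) country_cor(TOTAL, crimes, k))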

#continue to the next file: education
#07-11
eduexp07_11 <- read_csv("m:/pc/desktop/eduexp07-11.csv")
eduexp0711<- eduexp07_11%>%
select(-c(`Flag and Footnotes`,UNIT))%>%
filter(INDIC_ED=="Total public expenditure on education as % of GDP, at tertiary level of education (ISCED 5-6)") %>%
filter(TIME !="2007")
eduexp0711$GEO[eduexp0711$GEO=="Germany (until 1990 former territory of the FRG)"]<-
"Germany"
rm(eduexp07_11)
eduexp0711$Value[eduexp0711$Value==":"]<-NA
eduexp0711<-eduexp0711%>%
select(-c(INDIC_ED)) %>%
filter(!is.na(Value))
glimpse(eduexp0711)
#changing to numeric
class(eduexp0711$Value)
eduexp0711$Value<-gsub(",","",eduexp0711$Value)
eduexp0711$Value<-as.numeric(eduexp0711$Value)
eduexp0711$Value<-(eduexp0711$Value)/100
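# Sketch (hypothetical helper): the cleaning steps above (":" to NA, remove
# commas, convert to numeric, divide by 100) combined in one function.
# parse_eurostat_value() is an illustrative name, and the default divisor of
# 100 simply mirrors the line above; it is an assumption about how this
# particular file stores its values, not a general Eurostat rule.
parse_eurostat_value <- function(x, divisor = 100) {
  x <- as.character(x)
  x[x %in% ":"] <- NA                        # Eurostat marks missing values with ":"
  as.numeric(gsub(",", "", x)) / divisor     # drop commas, then rescale
}
# Usage: eduexp0711$Value <- parse_eurostat_value(eduexp0711$Value)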

#12-14
educ_uoe_fine06_1_Data <- read_csv("M:/pc/Desktop/educ_uoe_fine06_1_Data.csv",
col_types = cols(TIME = col_character()))
eduexp1214<- educ_uoe_fine06_1_Data%>%
select(-c(`Flag and Footnotes`,UNIT,ISCED11)) %>%
filter(GEO !="European Union - 28 countries")
eduexp1214$GEO[eduexp1214$GEO=="Germany (until 1990 former territory of the FRG)"]<-
"Germany"
eduexp1214$Value[eduexp1214$Value==":"]<-NA
rm(educ_uoe_fine06_1_Data)
eduexp1214<-eduexp1214%>%
filter(!is.na(Value))
glimpse(eduexp1214)
eduexp1214$TIME<-as.numeric(eduexp1214$TIME)
class(eduexp1214$TIME)
#merge

eduexp<-bind_rows(eduexp0711,eduexp1214)
glimpse(eduexp)
eduexp<-eduexp%>%
rename(DATE=TIME)%>%
rename(COUNTRY=GEO) %>%
rename(public.exp=Value)
#DATE exists only after the rename above
eduexp$DATE<-as.character(eduexp$DATE)
#merge
st_exp<-inner_join(eduexp,TOTAL, by=c("COUNTRY","DATE"),copy=FALSE)
glimpse(st_exp)
st_exp<-st_exp%>%
select(-c(LEAVING.FROM)) %>%
filter(!is.na(public.exp))
#find avg
avgexp<-st_exp%>%
group_by(COUNTRY)%>%
summarise(avgpublic.exp=mean(public.exp))
class(avgexp$avgpublic.exp)
avgexp$avgpublic.exp<-format(round(avgexp$avgpublic.exp,3),nsmall = 3)
glimpse(avgst)
#merge
avg_exp.st<- inner_join(avgexp,avgst,by="COUNTRY")
class(avg_exp.st$avgpublic.exp)

avg_exp.st$avgpublic.exp<-as.numeric(avg_exp.st$avgpublic.exp)
#plot
ggplot(avg_exp.st,aes(x=avgpublic.exp,y=avg.students,label=COUNTRY)) + geom_text() +
labs(title="Figure 2", x="Average percentage of public expenditure", y="Average number of receiving students") + theme_classic()
#scatter plots for the top 5 countries: does higher expenditure go with more receiving students?
spain_exp<-st_exp%>%
filter(COUNTRY=="Spain")
ggplot(spain_exp,aes(x=public.exp,y=GOING.TO)) +geom_point()+ geom_smooth(method="lm",
se=FALSE) +labs(title="Figure 3: Spain 2008-2014", x="Percentage of public expenditure",y="Average number of receiving students") + theme_classic()
#germany
de_exp<-st_exp%>%
filter(COUNTRY=="Germany")
ggplot(de_exp,aes(x=public.exp,y=GOING.TO)) +geom_point()+ geom_smooth(method="lm",
se=FALSE)+labs(title="Figure 5: Germany 2008-2014", x="Percentage of public expenditure",y="Average number of receiving students") + theme_classic()
#france
fr_exp<-st_exp%>%
filter(COUNTRY=="France")
ggplot(fr_exp,aes(x=public.exp,y=GOING.TO)) +geom_point()+ geom_smooth(method="lm",
se=FALSE)+labs(title="Figure 4: France 2008-2014", x="Percentage of public expenditure",y="Average number of receiving students") + theme_classic()
#italy

it_exp<-st_exp%>%
filter(COUNTRY=="Italy")
ggplot(it_exp,aes(x=public.exp,y=GOING.TO)) +geom_point()+ geom_smooth(method="lm",
se=FALSE)+labs(title="Figure 6: Italy 2008-2014", x="Percentage of public expenditure",y="Average number of receiving students") + theme_classic()
#uk
uk_exp<-st_exp%>%
filter(COUNTRY=="United Kingdom")
ggplot(uk_exp,aes(x=public.exp,y=GOING.TO)) +geom_point()+ geom_smooth(method="lm",
se=FALSE)+labs(title="Figure 7: United Kingdom 2008-2014", x="Percentage of public expenditure",y="Average number of receiving students") + theme_classic()