The document walks through preparing GDP data from an Excel file for analysis in R. It loads the Excel file into a data frame called gdp using read_excel(), then cleans the data by removing rows with missing values and dropping trailing note rows. The columns are renamed and the data frame is saved to an RData file for later use. Key steps include reading the Excel file, cleaning the data, renaming variables, and saving the final data frame.
1 Introduction
Data for analysis arrives in many file formats: Excel, CSV, Stata, JSON, XML, and others. This chapter shows how to read such files into R.

2 File formats
• Excel : workbook files created by Microsoft Excel (.xls, .xlsx)
• CSV : Comma Separated Values, tabular data stored as plain text
• Stata : the native dataset format of the statistics package Stata
• JSON : JavaScript Object Notation, a text format that originated in JavaScript
• XML : Extensible Markup Language, a markup language in the same family as HTML (Hyper Text Markup Language) used on the Web
Of these, this chapter covers reading Excel, CSV, and Stata files into R (Section 3 onward).

3 Reading an Excel file
As a worked example, we download GDP data published as an Excel workbook and read it into R.
3.1 Obtaining the GDP data
We use Japan's annual GDP series from the national accounts.
(1) From the statistics site, follow the menu links to the national accounts download page.
(2) Several releases are available; use the 2018 (Heisei 30) annual release compiled on the 2008SNA standard with base year 2011 (Heisei 23).
4. 3.2 Excel 5
(3) IV. 1
30ffm1rn_jp.xlsx
3.2 Excel
R R
readxl Excel
readxl
> install.packages("readxl")
readxl
> library(readxl)
Excel read_excel()
> help(read_excel)
3.3 Reading the Excel file
The workbook downloaded in Section 3 is read with read_excel(). Its first argument is the path to the file; the other arguments we need are introduced below.
In the simplest case, only the file name is given:
> gdp <- read_excel(" .xls") # file name of the workbook to read

3.3.1 The working directory
When a relative path is given, R looks for the file in its current working directory. getwd() shows where that is:
> getwd() # print the working directory
[1] "/Users/shito/Documents/git-repositories/R-programming-lecnote/handout"
setwd() changes it:
> setwd("/Users/shito") # move to /Users/shito
The paths above are Mac-style; on Windows, for example a folder seminer on drive E:
> setwd("E:/seminer")
Instead of changing directory, 30ffm1rn_jp.xlsx can be read with a full path (here on drive E of a Windows machine):
> gdp <- read_excel("E:/ /30ffm1rn_jp.xlsx") # full path on Windows drive E
In what follows, 30ffm1rn_jp.xlsx is assumed to be in a subdirectory named data of the working directory:
> gdp <- read_excel("data/30ffm1rn_jp.xlsx")

3.3.2 Skipping the preamble
In the Excel sheet, the column headers are in row 7 and the data begin in row 8, so the first six rows must be skipped with skip=6; col_names=TRUE (the default) then makes the first remaining row the column names.
> gdp <- read_excel("data/30ffm1rn_jp.xlsx", skip=6, col_names=TRUE)
> dim(gdp) # number of rows and columns
[1] 62 26
To inspect the import, display rows 1–3 and columns 1–4 of gdp:
> gdp[1:3, 1:4]
# A tibble: 3 x 4
...1 `1994` `1995` `1996`
<chr> <dbl> <dbl> <dbl>
1 245684. 251970. 258152.
2 241526. 247606. 253738.
3 238193. 243885. 250301.
The header A tibble: 3 x 4 shows that gdp is a tibble, the variant of data frame that read_excel() returns. A tibble is still a data frame:
> is.data.frame(gdp) # confirm that gdp is a data frame
[1] TRUE
To work with a plain data frame in the rest of the chapter, convert it with as.data.frame():
> gdp <- as.data.frame(gdp) # convert the tibble to a plain data frame
To see how each column of the Excel sheet came into R, inspect the structure with str():
> str(gdp)
'data.frame': 62 obs. of 26 variables:
$ ...1: chr " " " " " "
$ 1994: num 245684 241526 238193 3841 220 ...
$ 1995: num 251970 247606 243885 4436 224 ...
$ 1996: num 258152 253738 250301 4083 326 ...
$ 1997: num 255782 251424 248477 3457 349 ...
$ 1998: num 256658 251557 248824 3160 328 ...
$ 1999: num 260436 254945 251806 3532 264 ...
$ 2000: num 263972 259136 256157 3314 262 ...
$ 2001: num 268881 263698 261270 2602 268 ...
$ 2002: num 271953 266957 264282 2995 357 ...
$ 2003: num 273850 268450 266218 2814 665 ...
$ 2004: num 277097 271716 269038 3435 815 ...
$ 2005: num 281427 275868 273860 2808 873 ...
$ 2006: num 283494 277848 276444 2092 726 ...
$ 2007: num 285850 280312 279124 1977 815 ...
$ 2008: num 280055 274560 273435 1898 789 ...
$ 2009: num 282488 276710 275792 1752 826 ...
$ 2010: num 286647 280524 279478 1983 942 ...
$ 2011: num 288797 282050 280876 1864 690 ...
$ 2012: num 293397 286118 285226 1846 954 ...
$ 2013: num 301514 294138 294092 1360 1290 ...
$ 2014: num 293681 286783 287605 1162 1882 ...
$ 2015: num 295661 288039 289877 1088 2767 ...
$ 2016: num 295534 287605 289346 1280 2954 ...
$ 2017: num 298875 290958 293426 1194 3534 ...
$ 2018: num 299047 291331 294214 1288 4058 ...
The first line, 'data.frame': 62 obs. of 26 variables, says that gdp has 62 rows and 26 columns. Each line beginning with $ describes one column: the first column, ...1, is character (chr), while the columns 1994 through 2018 are numeric (num). Column 1 holds the row labels from the Excel sheet; the remaining columns hold the figures for the years 1994 to 2018.
3.4 Cleaning the data
We would like to use column 1 of gdp as its row names, set with rownames():
> rownames(gdp) <- gdp[, 1] # <-- this fails
This raises an error about 'row.names', because column 1 is not yet usable as row names. Inspecting it shows why: blank rows in the Excel sheet came in as NA (read_excel() also trims surrounding whitespace by default):
> gdp[,1] # blank rows of the Excel sheet appear as NA
3.4.1 Dropping unusable rows
First remove the rows whose first column is NA:
> gdp <- gdp[!is.na(gdp[,1]),] # drop rows with NA in column 1
> dim(gdp) # 62 rows have become 57
[1] 57 26
The last three rows of the sheet do not contain data, so drop them as well:
> gdp <- gdp[1:(dim(gdp)[1]-3), ] # drop the last 3 rows
> dim(gdp) # 57 rows have become 54
[1] 54 26
The subscript 1:(dim(gdp)[1]-3) selects every row except the last three.
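The same two cleaning steps can be tried on a small synthetic data frame; df and its columns label and value are made-up names for illustration:

```r
# Toy data frame standing in for the imported sheet: two blank rows
# (NA in the first column) and three trailing note rows.
df <- data.frame(label = c("A", NA, "B", NA, "C", "note1", "note2", "note3"),
                 value = 1:8, stringsAsFactors = FALSE)

df <- df[!is.na(df[, 1]), ]      # keep rows whose first column is not NA
df <- df[1:(dim(df)[1] - 3), ]   # then drop the last three (note) rows
dim(df)                          # 3 rows, 2 columns remain
```

Only the rows labeled A, B, and C survive, just as only the genuine data rows of gdp survive above.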
4 Saving and restoring objects
Objects created in an R session disappear when the session ends. ls() lists the objects currently in the workspace:
> ls() # list the objects in the workspace
[1] "cn" "gdp" "n"
save() writes objects to a file, conventionally with the extension .RData, and load() reads them back:
> save(gdp, file="ch05-GDP.RData") # save gdp to ch05-GDP.RData
> save(gdp, cn, file="ch05-GDP.RData") # save both gdp and cn
To confirm that load() works, first delete gdp and cn from the workspace with rm() (remove):
> rm(gdp, cn) # delete gdp and cn
> ls() # gdp and cn are gone
[1] "n"
Then restore them with load():
> load("ch05-GDP.RData")
> ls()
[1] "cn" "gdp" "n"
> gdp[1:3, 1:4]
Y1994 Y1995 Y1996 Y1997
245683.7 251970.1 258151.7 255781.6
241525.9 247606.3 253738.1 251423.5
238192.7 243885.2 250301.4 248476.6
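The save/rm/load cycle can be sketched end to end with a throwaway object in a temporary directory (x, x_orig, and demo.RData are illustrative names):

```r
# Round-trip a data frame through an .RData file.
x <- data.frame(a = 1:3, b = c(10, 20, 30))
f <- file.path(tempdir(), "demo.RData")

save(x, file = f)  # write x (together with its name) to the file
x_orig <- x        # keep a copy for comparison
rm(x)              # remove x from the workspace
exists("x")        # FALSE: x is gone
load(f)            # restore x under its original name
all.equal(x, x_orig)  # TRUE: the object came back intact
```

Note that save() records the object's name as well as its contents, which is why load() restores it under the same name without any assignment.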
6 Exploring the GDP data
With the data now in R, let's look at GDP itself. The national accounts report the aggregate from several sides (GDP, GDI, GDE). The series we want sit in rows 37 and 40:
> rownames(gdp)[c(37, 40)]
[1] " " " "
(The row names are the original Japanese labels.) Their values for 2015:
> gdp[c(" ", " "), "Y2015"]
[1] 517223.3 524004.4
The figures match the Excel sheet.
What were the largest and smallest values of GDP over the period?
> max(gdp[" ",]) # largest GDP in the sample
[1] 533667.9
> min(gdp[" ",]) # smallest GDP in the sample
[1] 426889.1
In which years did they occur? Assign the GDP row to y, then look up the column names where y equals its maximum and minimum:
> y <- gdp[" ",]
> colnames(gdp)[y == max(y)] # year in which y takes its maximum
[1] "Y2018"
> colnames(gdp)[y == min(y)] # year in which y takes its minimum
[1] "Y1994"
So GDP was highest in 2018 and lowest in 1994. Next, draw GDP as a time series (Figure 1).
> plot(1994:2018, y, type="l", xlab="Year", ylab="GDP")

[Figure 1: line plot of GDP (vertical axis, roughly 440000–520000) against Year, 1994–2018]

plot(x, y) draws y against x. Here x is the sequence of years 1994:2018 and y is the GDP series extracted above. The argument type="l" ("l" for line) joins the points with a line, and xlab and ylab label the horizontal and vertical axes. The plot shows GDP falling sharply in 2009, after the 2008 financial crisis, before recovering.
7 Reading CSV files
A CSV (Comma Separated Values) file stores a table as plain text, one row per line with the values separated by commas. CSV files are read with read.csv().
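A minimal round trip shows read.csv() in action; the file name and column names below are invented for illustration:

```r
# Write a small comma-separated file to a temporary location,
# then read it back with read.csv().
f <- file.path(tempdir(), "demo.csv")
writeLines(c("year,gdp",
             "1994,245684",
             "1995,251970"), f)

microdemo <- read.csv(f)  # header=TRUE is the default
dim(microdemo)            # 2 rows, 2 columns
```

read.csv() treats the first line as column names by default, so the two data lines become two rows of the data frame.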
After the CSV file has been read into a data frame microdata, head() shows its first rows and tail() its last rows:
> head(microdata) # first 6 rows
> tail(microdata) # last 6 rows
Finally, save the data frame to an RData file:
> save(microdata, file="Microdata.RData") # save microdata
8 Reading Stata files
Stata is a commercial statistics package widely used in empirical research, and datasets are often distributed in its native format, with the extension .dta. R can read Stata files too: read.dta() in the foreign package (included with R) does the job.
> library(foreign)
> library(help=foreign) # list what the foreign package provides
> help(read.dta) # documentation for read.dta
A Stata dataset is then read with read.dta():
> stata.data <- read.dta(" .dta") # read a Stata .dta file
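Since no .dta file is bundled with this handout, one way to try read.dta() is to create a file first with foreign's own write.dta(); the data frame and file name below are illustrative (note that read.dta() supports Stata formats up through version 12):

```r
library(foreign)

# Write a small data frame to Stata format, then read it back.
d <- data.frame(id = 1:3, score = c(1.5, 2.5, 3.5))
f <- file.path(tempdir(), "demo.dta")
write.dta(d, file = f)

d2 <- read.dta(f)             # returns an ordinary data frame
all.equal(d$score, d2$score)  # TRUE: values survive the round trip
```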
9 Summary
This chapter showed how to bring data into R from several file formats, using GDP data as the running example: reading an Excel workbook with readxl, cleaning the result, saving and restoring objects with save() and load(), and reading CSV and Stata files.

10 Exercise: GDP per capita at purchasing power parity
Comparing GDP across countries requires converting to a common basis. For that purpose the purchasing power parity rate (PPP) is commonly used, and to compare living standards GDP is divided by population, giving GDP per capita. In this exercise we analyze GDP per capita at PPP.
(1) Open the World Bank page for the indicator "GDP per capita, PPP":
http://data.worldbank.org/indicator/NY.GDP.PCAP.PP.CD
(2) Click "Download CSV".
(3) The download contains a file named API_NY.GDP.PCAP.PP.CD_DS2_en_csv_v2_######.csv, where ###### is a version number that changes between downloads. Read it into a data frame ppp, skipping the four metadata lines at the top:
> ppp <- read.csv("data/API_NY/API_NY.GDP.PCAP.PP.CD_DS2_en_csv_v2.csv", skip=4)
(4) Check that the file was read correctly; there is one row per country and one column per year:
> dim(ppp)
[1] 264 65
(5) Use column 1 of ppp (the country names) as row names, then drop that column:
> rownames(ppp) <- ppp$Country.Name
> ppp <- ppp[, -1]
> ppp[1:4, 1:4]
Country.Code Indicator.Name
Aruba ABW GDP per capita, PPP (current international $)
Afghanistan AFG GDP per capita, PPP (current international $)
Angola AGO GDP per capita, PPP (current international $)
Albania ALB GDP per capita, PPP (current international $)
Indicator.Code X1960
Aruba NY.GDP.PCAP.PP.CD NA
Afghanistan NY.GDP.PCAP.PP.CD NA
Angola NY.GDP.PCAP.PP.CD NA
Albania NY.GDP.PCAP.PP.CD NA
(6) The columns Indicator.Name and Indicator.Code contain the same value in every row, so drop them too:
> ppp <- ppp[, (colnames(ppp) != "Indicator.Name")
+ & (colnames(ppp) != "Indicator.Code")]
> dim(ppp)
[1] 264 62
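The same column-dropping step can also be written with %in%, which scales better when more names are involved; the data frame and column names here are synthetic:

```r
# Drop columns by name using %in% instead of chained != comparisons.
df <- data.frame(Country = c("X", "Y"), Code = c("x", "y"),
                 X2015 = c(1, 2), stringsAsFactors = FALSE)

drop <- c("Code")                              # names of columns to remove
df <- df[, !(colnames(df) %in% drop), drop = FALSE]
colnames(df)   # "Country" "X2015"
```

colnames(df) %in% drop is TRUE for each column whose name appears in drop, so negating it keeps the rest.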
(7) Which country had the highest GDP per capita (PPP) in 2015? A direct comparison is complicated by missing data: ppp$X2015 contains NA for countries with no figure, and any comparison with NA yields NA rather than TRUE or FALSE:
> ppp$X2015==max(ppp$X2015, na.rm=T)
[1] FALSE FALSE FALSE FALSE NA FALSE FALSE FALSE FALSE NA FALSE FALSE FALSE
[14] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE NA
[27] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE NA FALSE FALSE
[40] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE NA FALSE FALSE FALSE
[53] FALSE FALSE NA FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[66] FALSE FALSE NA FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE NA FALSE
[79] FALSE FALSE FALSE FALSE NA FALSE FALSE FALSE FALSE FALSE FALSE NA FALSE
[92] NA FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[105] FALSE FALSE NA FALSE NA FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[118] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[131] FALSE FALSE FALSE FALSE FALSE NA FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[144] FALSE FALSE NA FALSE NA FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[157] FALSE FALSE FALSE FALSE FALSE FALSE NA FALSE FALSE FALSE FALSE FALSE FALSE
[170] FALSE NA FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[183] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE NA FALSE FALSE FALSE
[196] FALSE FALSE NA TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[209] FALSE FALSE FALSE NA FALSE FALSE NA FALSE FALSE FALSE FALSE FALSE FALSE
[222] FALSE FALSE FALSE FALSE NA FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[235] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[248] FALSE FALSE FALSE FALSE FALSE NA NA NA FALSE FALSE FALSE FALSE FALSE
[261] FALSE FALSE FALSE FALSE
Indexing with a logical vector that contains NA returns NA for those positions, so the NAs propagate into the country names:
> rownames(ppp)[ppp$X2015==max(ppp$X2015, na.rm=T)]
[1] NA NA NA NA NA NA NA NA NA NA
[11] NA NA NA NA NA NA NA NA NA NA
[21] "Qatar" NA NA NA NA NA NA
which() returns only the positions that are TRUE, so wrapping the comparison in which() gives a clean answer:
> rownames(ppp)[which(ppp$X2015==max(ppp$X2015, na.rm=T))] # keep TRUE positions only
[1] "Qatar"
Now compare Japan with the richest and the poorest country, printing the results with cat():
> japan <- ppp["Japan", "X2015"]
> rich <- max(ppp$X2015, na.rm=T)
> poor <- min(ppp$X2015, na.rm=T)
> rich.country <- rownames(ppp)[which(ppp$X2015==rich)]
> poor.country <- rownames(ppp)[which(ppp$X2015==poor)]
> cat(rich.country, rich, "\n",
+ "Japan", japan, "\n",
+ poor.country, poor, "\n")
Qatar 123822.1
Japan 40396.24
Central African Republic 744.7345
(8) How many times larger is the richest country's GDP per capita than Japan's, and how does the poorest country compare?
> rich/japan
[1] 3.065188
> poor/japan