5. One India: District Level Railway Passenger Flow
[Figure: scatter plot of average growth rate of real GDP per capita (%) against real GDP per capita in PPP (log) in 2004; points are labelled with region codes; legend: China, India, World.]
6. One India: District Level Railway Passenger Flow
[Figure: scatter plot of average growth rate of real GDP per capita (%) against real GDP per capita in PPP (log) in 1994; points are labelled with region codes; legend: China, India, World.]
15. Components of R language – R environment (Objects and Symbols)
Objects:
All R code manipulates objects
Examples of objects in R include
Numeric vectors
character vectors
Lists
Functions
Symbols:
Formally, variable names in R are called symbols
When you assign an object to a variable name, you are actually assigning the object to a symbol in the current environment
R environment:
An environment is defined as the set of symbols that are defined in a certain context
For example, the statement:
> x <- 1
assigns the symbol “x” to the object “1” in the current environment
16. Components of R language - Expressions
R code is composed of a series of expressions
Examples of expressions in R include
assignment statements
conditional statements
arithmetic expressions
Expressions are composed of objects and functions
You may separate expressions with new lines or with semicolons
Example :
Using semicolons
"this expression will be printed"; 7 + 13; exp(0+1i*pi)
Using new lines
"this expression will be printed"
7 + 13
exp(0+1i*pi)
18. Basic Operations in R
R has a wide variety of data structures; we will look at a few basic ones
Vectors (numerical, character, logical)
Matrices
Data frames
Lists
Your first Operations in R
When you enter an expression into the R console and press the Enter key, R will evaluate that expression and display the results
The interactive R interpreter will automatically print an object returned by an expression entered into the R console
> 1 + 2 + 3
[1] 6
In R, any number that you enter in the console is interpreted as a vector
19. Variables in R
R lets you assign values to variables and refer to them by name.
In R, the assignment operator is <-. Usually, this is pronounced as “gets.”
The statement: x <- 1 is usually read as “x gets 1.”
There are two additional operators that can be used for assigning values to symbols.
First, you can use a single equals sign (“=”) for assignment
Second, you can assign an object on the left to a symbol on the right:
> 3 -> three
Whichever notation you prefer, be careful: the = operator does not mean “equals.” To test equality, use the == operator
Note that <- should not be used to pass arguments to a function: inside a call it performs an assignment in the calling environment and passes the value by position. To map values to argument names, use the “=” symbol.
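The assignment rules above can be sketched as follows. This is a minimal illustration rather than part of the original slides; the variable names (m1, m2, x_after_named, x_after_arrow) are made up for the example.

```r
# Three ways to bind a value to a symbol
x <- 1          # "x gets 1"
y = 2           # a single equals sign also assigns at the top level
3 -> three      # value on the left, symbol on the right

# == tests equality; it does not assign
is_one <- (x == 1)

# Inside a call, = maps a value to an argument name and leaves x untouched
m1 <- mean(x = c(10, 20, 30))
x_after_named <- x

# Inside a call, <- assigns in the calling environment as a side effect
m2 <- mean(x <- c(10, 20, 30))
x_after_arrow <- x
```

Both calls return the same mean, but only the second one overwrites x, which is why = is the safer choice for naming arguments.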
20. What is a Vector in R?
A vector is an ordered collection of elements of the same data type
The “[1]” means that the index of the first item displayed in the row is 1
You can construct longer vectors using the c(...) function. (c stands for “combine.”)
> c(0, 1, 1, 2, 3, 5, 8)
[1] 0 1 1 2 3 5 8
> 1:50
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
[23] 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44
[45] 45 46 47 48 49 50
The numbers in the brackets on the left hand side of the results indicate the index of the first element shown in each row
When you perform an operation on two vectors, R will match the elements of the two vectors pairwise and return a vector
> c(1, 2, 3, 4) + c(10, 20, 30, 40)
[1] 11 22 33 44
If the two vectors aren’t the same size, R will repeat the smaller sequence multiple times:
> c(1, 2, 3, 4, 5) + c(10, 100)
[1] 11 102 13 104 15
Warning message:
In c(1, 2, 3, 4, 5) + c(10, 100) :
longer object length is not a multiple of shorter object length
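As a small sketch of the recycling rule (the variable names here are illustrative only): when the longer length is an exact multiple of the shorter one, R recycles silently; the warning shown above appears only when it is not.

```r
a <- c(1, 2, 3, 4)
b <- c(10, 20, 30, 40)

# Same length: elements are matched pairwise
pairwise <- a + b

# Length 4 and length 2: the shorter vector is repeated twice, no warning
recycled <- a + c(10, 100)

# A single number is a vector of length 1, so it is recycled across a
doubled <- a * 2
```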
21. Arrays
An array is a multidimensional vector.
Vectors and arrays are stored the same way internally, but an array may be displayed differently and accessed differently.
An array object is just a vector that’s associated with a dimension attribute.
Let’s define an array explicitly
> a <- array(c(1,2,3,4,5,6,7,8,9,10,11,12),dim=c(3,4))
> a
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
Here is how you reference one cell
> a[2,2]
[1] 5
Arrays can have more than two dimensions.
> w <- array(c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18),dim=c(3,3,2))
> w
22. Arrays & Matrix
R uses very clean syntax for referring to part of an array. You specify separate indices for each dimension, separated by commas
> w[1,1,1]
[1] 1
To get all rows (or columns) from a dimension, simply omit the indices
> # first row only
> a[1,]
[1] 1 4 7 10
> # first column only
> a[,1]
[1] 1 2 3
A matrix is just a two-dimensional array
> m <- matrix(data=c(1,2,3,4,5,6,7,8,9,10,11,12),nrow=3,ncol=4)
> m
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
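The claim that an array is just a vector with a dimension attribute can be checked directly. The sketch below is illustrative (the names v, b, first_row and first_col are not from the slides); it builds the same 3 x 4 structure in two ways and then indexes a three-dimensional array.

```r
v <- 1:12

# Build a 3 x 4 array explicitly
a <- array(v, dim = c(3, 4))

# Attach a dim attribute to a plain vector: the result is also a matrix
b <- v
dim(b) <- c(3, 4)

# Omitting an index selects everything along that dimension
first_row <- a[1, ]
first_col <- a[, 1]

# Three dimensions: 3 rows, 3 columns, 2 layers
w <- array(1:18, dim = c(3, 3, 2))
second_layer_corner <- w[1, 1, 2]
```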
23. Data Frames
A data frame is a list that contains multiple named vectors of the same length
A data frame is a lot like a spreadsheet or a database table
Data frames are particularly good for representing tabular data, where each column is a variable and each row is an observation
Let’s construct a data frame with the win/loss results in the National League
> teams <- c("PHI","NYM","FLA","ATL","WSN")
> w <- c(92, 89, 94, 72, 59)
> l <- c(70, 73, 77, 90, 102)
> nleast <- data.frame(teams,w,l)
> nleast
  teams  w   l
1   PHI 92  70
2   NYM 89  73
3   FLA 94  77
4   ATL 72  90
5   WSN 59 102
You can refer to the components of a data frame (or items in a list) by name using the $ operator
> nleast$teams
24. Lists
It’s possible to construct more complicated structures with multiple data types.
R has a built-in data type for mixing objects of different types, called lists.
Lists in R may contain a heterogeneous selection of objects.
You can name each component in a list.
Items in a list may be referred to by either location or name.
Creating your first list
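> e <- list(thing="hat", size=8.25)
> e
You can access an item in the list in multiple ways
Using the name with the help of the $ operator
> e$thing
Using the location as an index: double brackets return the item itself, single brackets return a one-element list
> e[[1]]
> e[1]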
A list can even contain other lists
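A short sketch of the access patterns described above, including a nested list. The list "nested" and its component names are invented for illustration.

```r
# A list mixing a character value and a numeric value
e <- list(thing = "hat", size = 8.25)

by_name     <- e$thing     # the item itself
by_position <- e[[1]]      # also the item itself
as_sublist  <- e[1]        # a list of length 1 that still carries the name

# A list can hold other lists, and $ can be chained
nested <- list(item = e, tags = c("winter", "wool"))
inner_size <- nested$item$size
```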
25. Revision: Data Structures
Some of the data structures are:
• Factor: Categorical variable
• Vector
• Matrix
• Data Frame
• List
To identify the class of an object, we use the function class
> library(datasets)
> air <- airquality
> class(air)
[1] "data.frame"
26. Data Types
To check whether an object/variable is of a certain type, use the is. functions
is.numeric(), is.character(), is.vector(), is.matrix(), is.data.frame()
These are logical functions: they return TRUE or FALSE
To convert an object/variable from one type to another, use the as. functions
as.numeric(), as.character(), as.vector(), as.matrix(), as.data.frame(),
as.factor(), as.list()
> is.numeric(airquality$Ozone)
[1] TRUE
> airquality$Ozone <- as.character(airquality$Ozone)
> is.numeric(airquality$Ozone)
[1] FALSE
> is.character(airquality$Ozone)
[1] TRUE
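The same checks can be run on a working copy so that the built-in dataset is left unchanged. This is a sketch; the object name air and the intermediate variables are illustrative.

```r
# Work on a copy of the built-in airquality data
air <- datasets::airquality

was_numeric <- is.numeric(air$Ozone)

# Convert the column to character and check again
air$Ozone <- as.character(air$Ozone)
now_numeric   <- is.numeric(air$Ozone)
now_character <- is.character(air$Ozone)

# Convert back; missing values stay missing
air$Ozone <- as.numeric(air$Ozone)
na_preserved <- sum(is.na(air$Ozone)) == sum(is.na(datasets::airquality$Ozone))
```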
27. Saving, Loading, and Editing Data
Create a few vectors
> salary <- c(18700000,14626720,14137500,13980000,12916666)
> position <- c("QB","QB","DE","QB","QB")
> team <- c("Colts","Patriots","Panthers","Bengals","Giants")
> name.last <- c("Manning","Brady","Peppers","Palmer","Manning")
> name.first <- c("Peyton","Tom","Julius","Carson","Eli")
Use the data.frame function to combine the vectors
> top.5.salaries <- data.frame(name.last,name.first,team,position,salary)
> top.5.salaries
R allows you to save and load R data objects to external files
The simplest way to save an object is with the save function
> save(top.5.salaries, file="C:/Documents and Settings/me/My Documents/top.5.salaries.Rdata")
Note that the file argument must be explicitly named
In R, file paths are written with forward slashes (“/”), even on Microsoft Windows
To edit an object interactively, the fix function calls edit on the object and then assigns the result to the same symbol in the calling environment
You can easily load this object back into R with the load function
> load("C:/Documents and Settings/me/My Documents/top.5.salaries.Rdata")
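A portable sketch of the save/load round trip, assuming a temporary file instead of the Windows path shown above (tempfile() is used only so the example runs anywhere; the two-row data frame is a shortened stand-in for the slide's data).

```r
# A small data frame to save
top.5.salaries <- data.frame(
  name.last = c("Manning", "Brady"),
  salary    = c(18700000, 14626720)
)

# Write the object to a temporary .RData file; 'file' must be named explicitly
path <- tempfile(fileext = ".RData")
save(top.5.salaries, file = path)

# Remove the object, then restore it under its original name
original <- top.5.salaries
rm(top.5.salaries)
restored_names <- load(path)   # load() returns the names of the restored objects

unlink(path)                   # clean up the temporary file
```

Note that load() recreates the object under the name it was saved with, so no assignment is needed.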
28. Importing Data into R
read.csv
To read comma separated values into R
SYNTAX: read.csv(filepath)
Sample (social sector schemes file)
read.xlsx
To read data from Excel sheets into R
Requires library “xlsx”
SYNTAX: read.xlsx(filepath, sheetName=)
Tricky to use in case of Java version mismatch
read.dta
To read data from Stata files into R
Requires library “foreign”
SYNTAX: read.dta(filepath)
read.table
To read data from delimited text files
The general-purpose reader for delimited text: read.csv is a wrapper around read.table with defaults suited to comma-separated files
SYNTAX: read.table(filepath)
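To show how read.csv and read.table relate, the sketch below writes a small CSV to a temporary file and reads it back both ways. The file and its columns (id, score) are invented for the example.

```r
# Write a small CSV file to a temporary location
path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, score = c(2.5, 3.0, 4.5)),
          path, row.names = FALSE)

# read.csv assumes a header row and comma separators
via_csv <- read.csv(path)

# read.table needs both stated explicitly
via_table <- read.table(path, header = TRUE, sep = ",")

unlink(path)   # clean up the temporary file
```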
29. Working Directory: Truncated Filepaths
To read files more easily, one option is to set the working directory
Usual way:
file <- read.csv("/Users/parthkhare/Documents/dataframe.csv")
Truncated way:
getwd()
setwd("/Users/parthkhare/Documents/")
file <- read.csv("dataframe.csv")
Cheat way:
file <- read.csv(file.choose())
30. R Packages
A package is a related set of functions, help files, and data files that have been bundled together
Typically, all of the functions in a package are related
R offers an enormous number of packages
Some of these packages are included with R. To get the list of packages loaded by default, use the following commands:
> getOption("defaultPackages") # This command omits the base package
> (.packages())
To show all packages available
> (.packages(all.available=TRUE))
> library() # a new window will pop up showing the set of available packages
Installing R package
> install.packages(c("tree","maptree"))
#This will install the packages to the default library specified by the variable .Library
Loading Packages
> library(rpart)
Removing Packages
> remove.packages(c("tree", "maptree"),.Library)
# You need to specify the library where the packages were installed
31. Getting Help
R includes a help system to help you get information about installed packages
To get help on a function, say glm()
> help(glm)
or, equivalently:
> ?glm
The following can be very helpful if you can’t remember the name of a function; R will return a list of relevant topics
> ??regression
33. Names, Renaming
Syntax : names(dataset)
> names(airquality)
[1] "Ozone" "Solar.R" "Wind" "Temp" "Month" "Day"
# Save the original names first so they can be restored later
> org.names <- names(airquality)
> names(airquality) <- NULL
> names(airquality)
NULL
Renaming
In the following example we will change the variable name “Ozone” to “Oz”
> names(airquality) <- org.names
> names(airquality)[names(airquality)=="Ozone"] = "Oz"
> names(airquality)
[1] "Oz" "Solar.R" "Wind" "Temp" "Month" "Day"
# Renaming the second variable in data frame “airquality” to “Sol”
> names(airquality)[2] = "Sol"
> names(airquality)
[1] "Oz" "Sol" "Wind" "Temp" "Month" "Day"
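The renaming steps can be tried on a copy so that the built-in airquality data keeps its original names. The object names air and renamed are illustrative.

```r
# Work on a copy and remember the original names
air <- datasets::airquality
org.names <- names(air)

# Rename by matching the old name
names(air)[names(air) == "Ozone"] <- "Oz"

# Rename by position
names(air)[2] <- "Sol"
renamed <- names(air)

# Restore the original names
names(air) <- org.names
```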
34. Drop/Keep Variables
Selecting (Keeping) Variables
# select variables “Ozone” and “Temp”
> names(airquality) <- org.names
> keep.airquality <- airquality[c("Ozone", "Temp")]
# select 1st and 3rd through 5th variables
> keep.airquality_1 <- airquality[c(1,3:5)]
Excluding (DROPPING) Variables
• Dropping a variable from the dataset can be done by prefixing a “-” sign before the variable index in the data frame. A “-” sign before a quoted variable name does not work inside [ ]; to drop by name, use subset(airquality, select=-c(Wind, Temp)).
> drop.airquality <- airquality[,c(-3, -4)]
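A sketch of the keep and drop patterns side by side, run on a copy of airquality. The result names are illustrative; setdiff() on the column names is one common way to drop by name and is offered here as an option, not as the slides' method.

```r
air <- datasets::airquality

# Keep columns by name or by position
keep_by_name  <- air[c("Ozone", "Temp")]
keep_by_index <- air[c(1, 3:5)]

# Drop columns 3 and 4 (Wind and Temp) by negative index
drop_by_index <- air[, c(-3, -4)]

# Drop the same columns by name
drop_by_name   <- air[, setdiff(names(air), c("Wind", "Temp"))]
drop_by_subset <- subset(air, select = -c(Wind, Temp))
```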
35. Subsetting datasets
Subsetting is done with the subset function
# subsetting the data set “airquality” where Temperature is greater than 80
> subset_1 <- subset(airquality, Temp>80)
# subsetting the data set “airquality” where Temperature is greater than 80, keeping only the “Day” column
> subset_2 = subset(airquality, Temp>80, select=c("Day"))
# subsetting rows where Temperature is less than 80 and Day is equal to 8; notice the “==”
> subset_3 = subset(airquality, Temp<80 & Day==8)
# subsetting rows without using the “subset” function; notice the [ ] square brackets
> subset_4 = airquality[airquality$Temp==80, ]
# We use the %in% operator when we want to subset rows on multiple values of a variable
> subset_5 = airquality[airquality$Temp %in% c(70,90), ]
> subset_5.1 = airquality[airquality$Temp %in% c(70:90), ]
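One difference between subset() and square brackets is worth seeing in code: when the condition involves a column with missing values, subset() drops those rows while [ ] returns a row of NAs for each of them. The sketch below uses the Ozone column, which contains NAs; the result names are illustrative.

```r
air <- datasets::airquality

# Temp has no missing values, so both forms select the same rows
hot          <- subset(air, Temp > 80)
hot_brackets <- air[air$Temp > 80, ]

# Keep a single column while filtering
hot_days <- subset(air, Temp > 80, select = Day)

# Ozone has missing values: subset() drops them, [ ] keeps them as NA rows
high_ozone         <- subset(air, Ozone > 100)
high_ozone_with_na <- air[air$Ozone > 100, ]

# %in% matches against a set of values
seventy_or_ninety <- air[air$Temp %in% c(70, 90), ]
```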
36. Appending
Appending two datasets requires that both have exactly the same number of variables with exactly the same names. If using categorical data, make sure the categories in both datasets refer to exactly the same thing (e.g. 1 “Agree”, 2 “Disagree”).
If the datasets do not have the same number of variables, you can either drop or create variables so that both match.
The rbind function, or smartbind from the gtools package, is used for appending the two data frames.
> headair <- head(airquality)
> tailair <- tail(airquality)
> append <- rbind(headair,tailair)
> library(gtools) # provides smartbind
> smartappend <- smartbind(headair,tailair)
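Using base R only, the sketch below appends two matching data frames and then shows one way to handle a frame with missing columns by creating them as NA first. The objects short, missing_cols and padded are invented for the illustration.

```r
air <- datasets::airquality
headair <- head(air)
tailair <- tail(air)

# Same columns in both: rbind stacks the rows
both <- rbind(headair, tailair)

# A frame with fewer columns: create the missing ones as NA, then append
short <- tailair[c("Ozone", "Temp")]
missing_cols <- setdiff(names(headair), names(short))
short[missing_cols] <- NA
padded <- rbind(headair, short[names(headair)])
```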
37. Sorting
To sort a data frame in R, use the order( ) function. By default, sorting is ASCENDING. Prepend a minus sign to the sorting variable to indicate DESCENDING order. Here are some examples.
# sorting examples using the mtcars dataset
# sort by hp in ascending order
> sort.mtcars<-mtcars[order(mtcars$hp),]
# sort by hp in descending order
> sort.mtcars<-mtcars[order(-mtcars$hp),]
# multi-level sort: vs ascending, then hp descending (note the “-” sign)
> sort.mtcars<-mtcars[order(mtcars$vs, -mtcars$hp),]
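The sorting examples can be verified with is.unsorted(). This sketch refers to columns through mtcars$ rather than attaching the dataset; the result names are illustrative.

```r
# Ascending and descending sorts on horsepower
asc  <- mtcars[order(mtcars$hp), ]
desc <- mtcars[order(-mtcars$hp), ]

# Two keys: vs ascending, then hp descending within each vs group
multi <- mtcars[order(mtcars$vs, -mtcars$hp), ]

# order() returns row positions, not values
positions <- order(c(30, 10, 20))
```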
38. Remove Duplicate Values
Duplicates are identified using the “duplicated” function, which flags every repeat of a value after its first occurrence
# To remove rows with a duplicated value in the 2nd column of airquality
> dupair1 = airquality[!duplicated(airquality[,c(2)]),]
#To get duplicate rows in another dataset just remove the “!” sign
> dupair2 = airquality[duplicated(airquality[,c(2)]),]
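On a small, made-up data frame the behaviour of duplicated() is easy to follow: the first occurrence of each value is kept and later repeats are flagged.

```r
# Hypothetical data with repeated ids
df <- data.frame(
  id    = c(1, 2, 2, 3, 3, 3),
  value = c("a", "b", "c", "d", "e", "f")
)

flags <- duplicated(df$id)          # TRUE for every repeat after the first

first_only <- df[!duplicated(df$id), ]   # one row per id
repeats    <- df[duplicated(df$id), ]    # only the repeated rows
```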
39. Merging 2 datasets
Merging two datasets requires that both have at least one variable in common (either string or numeric). If the key is a string, make sure the values are spelled the same way in both datasets (e.g. country names).
By default, merge keeps only the cases common to both datasets (an inner join). Adding the option “all=TRUE” includes all cases from both datasets.
To merge two data frames (datasets) horizontally, use the merge function. In most cases, you join two data frames by one or more common key variables.
# merge two data frames by ID
total <- merge(dataframeA,dataframeB,by="ID")
Different possible cases while merging data
• a full outer join (all records from both tables) can be created with the "all" argument:
e.g. merge(d1,d2,all=TRUE)
• a left outer join of two datasets can be created with all.x:
e.g. merge(d1,d2,all.x=TRUE)
• a right outer join of two datasets can be created with all.y:
e.g. merge(d1,d2,all.y=TRUE)
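The four join types can be compared on two tiny data frames. The frames d1 and d2 and their columns are made up for the illustration; only merge() and its all, all.x and all.y arguments come from the text above.

```r
d1 <- data.frame(ID = c(1, 2, 3), x = c("a", "b", "c"))
d2 <- data.frame(ID = c(2, 3, 4), y = c(20, 30, 40))

inner <- merge(d1, d2, by = "ID")                 # IDs present in both
full  <- merge(d1, d2, by = "ID", all = TRUE)     # every ID from either side
left  <- merge(d1, d2, by = "ID", all.x = TRUE)   # every ID from d1
right <- merge(d1, d2, by = "ID", all.y = TRUE)   # every ID from d2
```

Rows without a match on the other side are filled with NA in the columns that come from that side.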
40. Date functions
Dates are represented as the number of days since 1970-01-01, with negative values for earlier dates.
Sys.Date() returns today’s date
date() returns the current date and time as a character string
Date conversion: use as.Date() to convert a string to the Date class
Syntax: as.Date(x, format=" ", tz=..)
Arguments:
x: an object to be converted
format: a character string. If not specified, it will try "%Y-%m-%d" then "%Y/%m/%d" on the first non-NA element and give an error if neither works
tz: a time zone name
The following symbols can be used with the format( ) function to print dates
Symbol  Meaning                  Example
%d      day as a number (01-31)  01-31
%a      abbreviated weekday      Mon
%A      unabbreviated weekday    Monday
%m      month (01-12)            01-12
%b      abbreviated month        Jan
%B      unabbreviated month      January
%y      2-digit year             07
%Y      4-digit year             2007
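A brief sketch of date parsing and printing with the numeric format symbols (weekday and month names are left out because they depend on the locale). The example dates are arbitrary.

```r
# Dates are stored as day counts relative to 1970-01-01
days_after  <- as.numeric(as.Date("1970-01-10"))
days_before <- as.numeric(as.Date("1969-12-31"))

# Parse a string that is not in the default year-month-day layout
d <- as.Date("22/06/2007", format = "%d/%m/%Y")

# Print the same date in different layouts
short_form <- format(d, "%d/%m/%y")
year_only  <- format(d, "%Y")

# Today's date has class "Date"
today_class <- class(Sys.Date())
```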
41. Useful Packages
The reshape2 Package:
Melting:
When you melt a dataset, you restructure it into a format where each measured variable is in its own row, along with the ID variables needed to uniquely identify it
Syntax: melt(data, id=)
Arguments:
data: dataset that you want to melt
id: ID variables
Example: consider the following table (mydata) for the melt function
id time x1 x2
1  1    5  6
1  2    3  5
2  1    6  1
2  2    2  4
library(reshape2)
md <- melt(mydata, id=(c("id", "time")))
Package ‘data.table’: Extension of data.frame for fast indexing, fast ordered joins, fast assignment, fast grouping and list columns
Package ‘plyr’: For splitting, applying and combining data
Package ‘stringr’: Makes it easier to work with strings
45. Special Values
NA
In R, the NA values are used to represent missing values. (NA stands for “not available.”)
You will encounter NA values in text loaded into R (to represent missing values) or in data loaded from databases (to replace NULL values)
If you expand the size of a vector (or matrix or array) beyond the size where values were defined, the new spaces will have the value NA (meaning “not available”)
Inf and -Inf
If a computation results in a number that is too big, R will return Inf for a positive number and -Inf for a negative number (meaning positive and negative infinity, respectively)
NaN
Sometimes, a computation will produce a result that makes little sense. In these cases, R will often return NaN (meaning “not a number”)
E.g. Inf - Inf or 0 / 0
NULL
Additionally, there is a null object in R, represented by the symbol NULL
The symbol NULL always points to the same object
NULL is often used as an argument in functions to mean that no value was assigned to the argument. Additionally, some functions may return NULL
NULL is not the same as NA, Inf, -Inf, or NaN
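Each of the special values can be produced and tested in a few lines. This is an illustrative sketch; the variable names are invented.

```r
# Growing a vector past its defined length fills the gap with NA
v <- c(1, 2, 3)
v[5] <- 9

# Numbers too large to represent become Inf or -Inf
too_big   <- 2^1024
too_small <- -2^1024

# Undefined results become NaN
undefined_1 <- Inf - Inf
undefined_2 <- 0 / 0

# NULL is an empty object, whereas NA is a value of length 1
null_length <- length(NULL)
na_length   <- length(NA)
```

Note that is.na() is TRUE for both NA and NaN, while is.nan() is TRUE only for NaN.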
Editor's Notes
What R and Data can do
Once you decide on a question after rounds of iterations the next question is WHAT DATA ?
Based on the experience of working with data in the Survey there are 3 lessons that I wish to share.
Power of data: Nkorea, SKorea
The building density on the ground provides an estimate of total build-up area (in square feet/km), which when interacted with zone specific guidance value of property tax per unit area gives an aggregate sum of potential property tax to be collected.
I just took you all through a journey of what potential data, creative thinking about data and Big data holds to influence and shape policy making
Tables in bland format no utility:
Open R and R Studio: Difference between them: Ram usage
Objects: Symbols
All 4 windows description
X <- 1
Data types vs data structures
Board: vector, matrix, data frame, list[data structures]