This document provides an introduction to control flow in R, including for loops, if/else statements, and vectorization. It discusses how for loops can be slow in R and recommends using vectorized functions instead when possible. It provides examples of if/else, ifelse, while, repeat, and switch statements. It also emphasizes that matrix operations in R are very fast, and shows how to vectorize calculations rather than using for loops in order to efficiently classify US states based on crime rates.
Session 1 of Introduction to R for Data Science, Data Science Serbia in cooperation with Startit, Belgrade, lecturers: ing Branko Kovač and dr Goran S. Milovanović
1. Introduction to R for Data Science
Lecturers
dipl. ing Branko Kovač
Data Analyst at CUBE/Data Science Mentor at Springboard
Data Science zajednica Srbije
branko.kovac@gmail.com
dr Goran S. Milovanović
Data Scientist at DiploFoundation
Data Science zajednica Srbije
goran.s.milovanovic@gmail.com
goranm@diplomacy.edu
2. Control Flow in R
• for, while, repeat
• if, else
• switch
Intro to R for Data Science
Session 4: Control Flow
# Introduction to R for Data Science
# SESSION 4 :: 19 May, 2016
# Starting with a simple 'if'
num <- 2 # some value to test with
if (num > 0) print("num is positive")
# if the condition num > 0 holds, then print() is executed
# Sometimes 'if' has its 'else'
if (num > 0) { # test to see if it's positive
print("num is positive") # print in case of positive number
} else {
print("num is negative") # reached whenever num is not positive (zero included)
}
# Careful: place your else right after the end ('}') of the conditional block
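The if/else above treats zero as negative. A minimal sketch of how an `else if` branch could separate the three cases is given below; the helper name `sign_label` is hypothetical and not part of the original slides.

```r
# Hypothetical helper (not from the slides): label the sign of a single number.
sign_label <- function(num) {
  if (num > 0) {
    "positive"
  } else if (num < 0) {   # each 'else' sits right after the closing '}'
    "negative"
  } else {
    "zero"
  }
}

sign_label(2)   # "positive"
sign_label(0)   # "zero"
sign_label(-3)  # "negative"
```

Note that `if` expects a single TRUE/FALSE value, so this helper is meant for one number at a time.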
3. Vectorized: ifelse
• for, while, repeat
• if, else, ifelse
• switch
# R is vectorized so there's vectorized if-else
simple_vect <- c(1, 3, 12, NA, 2, NA, 4) # just another num vector with NAs
ifelse(is.na(simple_vect), "nothing here", "some number")
# nothing here if it's an NA or it's a number
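As a small extension of the `ifelse()` call above, the sketch below nests two `ifelse()` calls to produce three labels, and shows what happens when the test itself is NA. The threshold of 3 and the label names are arbitrary assumptions made for illustration.

```r
simple_vect <- c(1, 3, 12, NA, 2, NA, 4)

# Nested ifelse(): handle NA first, then split the remaining numbers.
size_label <- ifelse(is.na(simple_vect), "missing",
                     ifelse(simple_vect > 3, "big", "small"))
size_label

# Without the is.na() guard, an NA in the test yields an NA in the result.
na_kept <- ifelse(simple_vect > 3, "big", "small")
na_kept
```

The result always has the same length as the test vector, which is what makes `ifelse()` convenient for creating new columns.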
4. For loops: slow and slower
# A for loop always works the same way
for (i in simple_vect) print(i)
# Be aware that loops can be slow if a vector is grown inside the loop
vec <- numeric()
system.time(
for(i in seq_len(50000-1)) {
some_calc <- sqrt(i/10)
# this is what makes it slow:
vec <- c(vec, some_calc)
})
# This solution is slightly faster
iter <- 50000;
# this makes it faster:
vec <- numeric(length=iter)
system.time(
for(i in seq_len(iter-1)) {
some_calc <- sqrt(i/10);
vec[i] <- some_calc # ...not this!
})
5. For loops: slow and slower
# This solution is even faster
iter <- 50000
vec <- numeric(length=iter) # not because of this...
system.time(
for(i in seq_len(iter-1)) {
vec[i] <- sqrt(i/10) # ...but because of this!
})
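For this particular calculation the loop is not needed at all, because `sqrt()` and `/` already work on whole vectors. The sketch below repeats the preallocated loop and compares it with a vectorized version; the names `vec_loop` and `vec_fast` are introduced here for illustration only.

```r
iter <- 50000

# Loop version, as on the slide (the last element stays 0).
vec_loop <- numeric(length = iter)
for (i in seq_len(iter - 1)) {
  vec_loop[i] <- sqrt(i / 10)
}

# Vectorized version: one call computes all values at once.
vec_fast <- numeric(length = iter)
vec_fast[seq_len(iter - 1)] <- sqrt(seq_len(iter - 1) / 10)

all.equal(vec_loop, vec_fast)  # TRUE
```

Wrapping each version in `system.time()` shows the difference in speed on a given machine.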
6. For loops vs. vectorized functions
# Another example of how loops can be slow
# (loop vs vectorized functions)
iter <- 50000
system.time(for (i in 1:iter) {
vec[i] <- rnorm(n=1, mean=0, sd=1)
# approach from previous example
})
system.time(y <- rnorm(iter, 0, 1)) # but this is much much faster
7. while, repeat…
# R also knows about the while loop
r <- 1 # initializing some variable
while (r < 5) { # while r < 5
print(r) # print r
r <- r + 1 # increase r by 1
}
# Nope, we didn't forget the 'repeat' loop
i <- 1
repeat { # there is no condition!
print(i)
i <- i + 1
if (i == 10) break
# ...so we have to break it if we
# don't want an infinite loop
}
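Besides `break`, R loops also understand `next`, which skips the rest of the current iteration. The sketch below combines both in a `while (TRUE)` loop, which behaves like `repeat`; the variable `odd_seen` is a hypothetical name used only to collect the result.

```r
r <- 0
odd_seen <- numeric()        # growing a tiny vector is acceptable here
while (TRUE) {               # no real condition, same idea as 'repeat'
  r <- r + 1
  if (r >= 10) break         # leave the loop entirely
  if (r %% 2 == 0) next      # skip even numbers, continue with the next r
  odd_seen <- c(odd_seen, r)
}
odd_seen  # 1 3 5 7 9
```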
8. switch
switch(2, "data", "science", "serbia") # choose one option based on value
# More on switch:
switchIndicator <- "A"
# switchIndicator <- "switchIndicator"
# switchIndicator <- "AvAvAv" # play with these three conditions
# one of the rare situations where you do not need to enclose strings in ' ' or " "
switch(switchIndicator,
A = {print(switchIndicator)},
switchIndicator = {unlist(strsplit(switchIndicator,"h"))},
AvAvAv = {print(nchar(switchIndicator))}
)
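With a character selector, `switch()` also supports a default and a fall-through: an alternative left empty passes control to the next non-empty one. The sketch below illustrates both; the function `reader_for` and the extension-to-reader mapping are hypothetical and only serve as an example.

```r
# Hypothetical helper: suggest a reader function name for a file extension.
reader_for <- function(ext) {
  switch(ext,
         csv = ,              # empty alternative: falls through to 'txt'
         txt = "read.table",
         rds = "readRDS",
         "unknown")           # single unnamed alternative: the default
}

reader_for("csv")   # "read.table"
reader_for("rds")   # "readRDS"
reader_for("xlsx")  # "unknown"
```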
9. switch()
type = 2
cc <- c("A", "B", "C")
switch(type,
c1 = {print(cc[1])},
c2 = {print(cc[2])},
c3 = {print(cc[3])},
{print("Beyond C...")} # 4th choice; a true default only when type is a character string
);
# However…
10. switch()
# be careful w. switch: with a numeric type, choices are picked by position
type = 4
cc <- c("A", "B", "C")
switch(type,
print(cc[1]),
print(cc[2]),
print(cc[3]),
{print("Beyond C...")}
# this 4th choice runs only because type is 4; for type = 5
# switch() returns NULL invisibly. An unnamed default choice works
# only with a character type and named previous choices!
)
# switch can be faster than a long if… else… chain (!)
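When the selector is a number, `switch()` picks an alternative by position and returns NULL (invisibly) if the number is out of range. One possible workaround, sketched below, is to convert the selector to a character string so that names are matched and the unnamed last alternative acts as a default. The function `pick` is a hypothetical helper, not part of the original slides.

```r
cc <- c("A", "B", "C")

# Hypothetical helper: numeric selector with a working default.
pick <- function(type) {
  switch(as.character(type),   # "1", "2", "3", ... are matched against the names
         "1" = cc[1],
         "2" = cc[2],
         "3" = cc[3],
         "Beyond C...")        # unnamed alternative: default for any other value
}

pick(2)  # "B"
pick(7)  # "Beyond C..."

# Positional selection, for comparison: an out-of-range number gives NULL.
is.null(switch(7, cc[1], cc[2], cc[3], "Beyond C..."))  # TRUE
```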
11. Vectorization
### vectorization in R
dataSet <- USArrests;
# dataSet$Murder, dataSet$Assault, dataSet$Rape: columns of dataSet
# in behavioral sciences (psychology or biomedical sciences, for example) we would call them:
# variables (or factors, even more often)
# in data science and machine learning, we usually call them: FEATURES
# in psychology and behavioral sciences, the usage of the term "feature" is usually constrained
# to theories of categorization and concept learning
# Task: classify the US states according to some global indicator of violent crime
# Two categories (simplification): more dangerous (T) and less dangerous (F)
# We have three features: Murder, Rape, Assault, all per 100,000 inhabitants
# The idea is to combine the three available features.
# Let's assume that we arbitrarily assign the following preference order over the features:
# Murder > Rape > Assault
# in terms of the severity of the consequences of the associated criminal acts
12. Vectorization
# Let's first isolate the features from the data.frame
featureMatrix <- as.matrix(dataSet[, c(1,4,2)]);
# Let's WEIGHT the features in accordance with the imposed preference order:
weightsVector <- c(3,2,1); # mind the order of the columns in featureMatrix
# Essentially, we want our global indicator to be a linear combination of all three selected
# features, where each feature is weighted by the corresponding element of the weightsVector:
featureMatrix <- cbind(featureMatrix,numeric(length(featureMatrix[,1])));
for (i in 1:length(featureMatrix[,1])) {
featureMatrix[i,4] <- sum(weightsVector*featureMatrix[i,1:3]);
# don't forget: this "*" multiplication in R is vectorized and operates element-wise
# we have a 1x3 weightsVector and a 1x3 featureMatrix[i,1:3], Ok
# sum() then produces the desired linear combination
}
13. Vectorization
# Classification; in the simplest case, let's simply take a look at
# the distribution of our global indicator:
hist(featureMatrix[,4],20); # it's multimodal and not too symmetric; go for median
criterion <- median(featureMatrix[,4]);
# And classify:
dataSet$Dangerous <- ifelse(featureMatrix[,4]>=criterion,T,F);
# Ok. You will never do this before you have a model that has actually *learned* the
# most adequate feature weights. This is an exercise only.
# ***Important***: have you seen the for loop above? Well...
# N e v e r d o t h a t.
dataSet$Dangerous <- NULL;
14. Vectorization
# In Data Science, you will be working with huge amounts of quantitative data.
# For loops are slow. But in vector programming languages like R...
# matrix computations are seriously fast.
# What you ***want to do*** is the following:
# Let's first isolate the features from the data.frame
featureMatrix <- as.matrix(dataSet[, c(1,4,2)]);
# Let's WEIGHT the features in accordance with the imposed preference order:
weightsVector <- c(3,2,1); # mind the order of the columns in featureMatrix
# Feature weighting:
wF <- weightsVector %*% t(featureMatrix);
# In R, t() is for: transpose
# In R, %*% is matrix multiplication
15. Vectorization
# oh yes: R knows about row and column vectors - and you want to put this one
# as a COLUMN in your dataSet data.frame, while wF is currently a ROW vector, look:
wF
length(wF)
wF <- t(wF)
# and classify:
dataSet$Dangerous <- ifelse(wF>=median(wF),T,F);
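To check that the loop and the matrix product really compute the same indicator, the two can be run side by side. The sketch below is one way to do that under a few assumptions: columns are selected by name instead of by position, the weights are stored in a hypothetical vector `w`, and the matrix is multiplied from the left so that no transpose is needed.

```r
dataSet <- USArrests
featureMatrix <- as.matrix(dataSet[, c("Murder", "Rape", "Assault")])
w <- c(3, 2, 1)  # same weights as above, in the same column order

# Loop version: one weighted sum per state.
loopScore <- numeric(nrow(featureMatrix))
for (i in seq_len(nrow(featureMatrix))) {
  loopScore[i] <- sum(w * featureMatrix[i, ])
}

# Matrix version: (50 x 3) %*% (3 x 1) gives a 50 x 1 column;
# drop() turns it into a plain vector.
matScore <- drop(featureMatrix %*% w)

all.equal(loopScore, unname(matScore))  # TRUE

# Classification by the median, without ifelse(): the comparison is already logical.
dangerous <- matScore >= median(matScore)
table(dangerous)
```

Since `>=` already returns TRUE/FALSE for every element, wrapping it in `ifelse(..., T, F)` is optional.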