This document provides examples of different R data structures including vectors, matrices, lists, and data frames. Vectors are one-dimensional arrays that can contain only one data type. Matrices are two-dimensional arrays that can contain only one data type. Lists are collections of elements that can contain different data types. Data frames are two-dimensional structures similar to tables or spreadsheets that can contain different data types across rows and columns. The document demonstrates how to create, subset, and manipulate each of these structures through examples.
A balanced search tree data structure
NIST Definition: A binary search tree in which operations that access nodes restructure the tree.
Goodrich: A splay tree is a binary search tree T. The only tool used to maintain balance in T is the splaying step done after every search, insertion, and deletion in T.
Kingston: A binary tee with splaying
This slide is about all attributes and properties of "Tree" in data structure and algorithm at computer science. Students can able to get a quick overview of "Tree" from this presentation.
As part of the GSP’s capacity development and improvement programme, FAO/GSP have organised a one week training in Izmir, Turkey. The main goal of the training was to increase the capacity of Turkey on digital soil mapping, new approaches on data collection, data processing and modelling of soil organic carbon. This 5 day training is titled ‘’Training on Digital Soil Organic Carbon Mapping’’ was held in IARTC - International Agricultural Research and Education Center in Menemen, Izmir on 20-25 August, 2017.
A balanced search tree data structure
NIST Definition: A binary search tree in which operations that access nodes restructure the tree.
Goodrich: A splay tree is a binary search tree T. The only tool used to maintain balance in T is the splaying step done after every search, insertion, and deletion in T.
Kingston: A binary tee with splaying
This slide is about all attributes and properties of "Tree" in data structure and algorithm at computer science. Students can able to get a quick overview of "Tree" from this presentation.
As part of the GSP’s capacity development and improvement programme, FAO/GSP have organised a one week training in Izmir, Turkey. The main goal of the training was to increase the capacity of Turkey on digital soil mapping, new approaches on data collection, data processing and modelling of soil organic carbon. This 5 day training is titled ‘’Training on Digital Soil Organic Carbon Mapping’’ was held in IARTC - International Agricultural Research and Education Center in Menemen, Izmir on 20-25 August, 2017.
Our fall 12-Week Data Science bootcamp starts on Sept 21st,2015. Apply now to get a spot!
If you are hiring Data Scientists, call us at (1)888-752-7585 or reach info@nycdatascience.com to share your openings and set up interviews with our excellent students.
---------------------------------------------------------------
Come join our meet-up and learn how easily you can use R for advanced Machine learning. In this meet-up, we will demonstrate how to understand and use Xgboost for Kaggle competition. Tong is in Canada and will do remote session with us through google hangout.
---------------------------------------------------------------
Speaker Bio:
Tong is a data scientist in Supstat Inc and also a master students of Data Mining. He has been an active R programmer and developer for 5 years. He is the author of the R package of XGBoost, one of the most popular and contest-winning tools on kaggle.com nowadays.
Pre-requisite(if any): R /Calculus
Preparation: A laptop with R installed. Windows users might need to have RTools installed as well.
Agenda:
Introduction of Xgboost
Real World Application
Model Specification
Parameter Introduction
Advanced Features
Kaggle Winning Solution
Event arrangement:
6:45pm Doors open. Come early to network, grab a beer and settle in.
7:00-9:00pm XgBoost Demo
Reference:
https://github.com/dmlc/xgboost
ID3, C4.5 :used to generate a decision tree developed by Ross Quinlan typically used in the machine learning and natural language processing domains, overview about these algorithms with illustrated examples
Abstract: This PDSG workshop introduces basic concepts of simple linear regression in machine learning. Concepts covered are Slope of a Line, Loss Function, and Solving Simple Linear Regression Equation, with examples.
Level: Fundamental
Requirements: No prior programming or statistics knowledge required.
Abstract: This PDSG workshop introduces basic concepts of multiple linear regression in machine learning. Concepts covered are Feature Elimination and Backward Elimination, with examples in Python.
Level: Fundamental
Requirements: Should have some experience with Python programming.
Presentation on R programming. Topics covered are: Manage your Workspace
Data types
Fiddle with Data Types
Lists Vs Vectors
R as calculator!!!
Decision making statements, looping, functions
Interact with R!!!
Visualization!!!
Time for U!!!
Clustering
Regression (with curve fitting)
Our fall 12-Week Data Science bootcamp starts on Sept 21st,2015. Apply now to get a spot!
If you are hiring Data Scientists, call us at (1)888-752-7585 or reach info@nycdatascience.com to share your openings and set up interviews with our excellent students.
---------------------------------------------------------------
Come join our meet-up and learn how easily you can use R for advanced Machine learning. In this meet-up, we will demonstrate how to understand and use Xgboost for Kaggle competition. Tong is in Canada and will do remote session with us through google hangout.
---------------------------------------------------------------
Speaker Bio:
Tong is a data scientist in Supstat Inc and also a master students of Data Mining. He has been an active R programmer and developer for 5 years. He is the author of the R package of XGBoost, one of the most popular and contest-winning tools on kaggle.com nowadays.
Pre-requisite(if any): R /Calculus
Preparation: A laptop with R installed. Windows users might need to have RTools installed as well.
Agenda:
Introduction of Xgboost
Real World Application
Model Specification
Parameter Introduction
Advanced Features
Kaggle Winning Solution
Event arrangement:
6:45pm Doors open. Come early to network, grab a beer and settle in.
7:00-9:00pm XgBoost Demo
Reference:
https://github.com/dmlc/xgboost
ID3, C4.5 :used to generate a decision tree developed by Ross Quinlan typically used in the machine learning and natural language processing domains, overview about these algorithms with illustrated examples
Abstract: This PDSG workshop introduces basic concepts of simple linear regression in machine learning. Concepts covered are Slope of a Line, Loss Function, and Solving Simple Linear Regression Equation, with examples.
Level: Fundamental
Requirements: No prior programming or statistics knowledge required.
Abstract: This PDSG workshop introduces basic concepts of multiple linear regression in machine learning. Concepts covered are Feature Elimination and Backward Elimination, with examples in Python.
Level: Fundamental
Requirements: Should have some experience with Python programming.
Presentation on R programming. Topics covered are: Manage your Workspace
Data types
Fiddle with Data Types
Lists Vs Vectors
R as calculator!!!
Decision making statements, looping, functions
Interact with R!!!
Visualization!!!
Time for U!!!
Clustering
Regression (with curve fitting)
An Interactive Introduction To R (Programming Language For Statistics)Dataspora
This is an interactive introduction to R.
R is an open source language for statistical computing, data analysis, and graphical visualization.
While most commonly used within academia, in fields such as computational biology and applied statistics, it is gaining currency in industry as well – both Facebook and Google use R within their firms.
This hands-on R course will guide users through a variety of programming functions in the open-source statistical software program, R. Topics covered include indexing, loops, conditional branching, S3 classes, and debugging. Full workshop materials available from http://projects.iq.harvard.edu/rtc/r-prog
The goal of this workshop is to introduce fundamental capabilities of R as a tool for performing data analysis. Here, we learn about the most comprehensive statistical analysis language R, to get a basic idea how to analyze real-word data, extract patterns from data and find causality.
This hands-on R course will demonstrate a variety of statistical procedures using the open-source statistical software program, R. Emphasis is on regression modeling using the Zelig package.
Workshop materials including example data sets and R scripts are available from http://projects.iq.harvard.edu/rtc/r-stats
This introduction to the popular ggplot2 R graphics package will show you how to create a wide variety of graphical displays in R. Data sets and additional workshop materials available at http://projects.iq.harvard.edu/rtc/event/r-graphics
A high level introduction to R statistical programming language that was presented at the Chicago Data Visualization Group's Graphing in R and ggplot2 workshop on October 8, 2012.
I don't think it's hyperbole when I say that Facebook, Instagram, Twitter & Netflix now define the dimensions of our social & entertainment universe. But what kind of technology engines purr under the hoods of these social media machines?
Here is a tech student's perspective on making the paradigm shift to "Big Data" using innovative models: alphabet blocks, nesting dolls, & LEGOs!
Get info on:
- What is Cassandra (C*)?
- Installing C* Community Version on Amazon Web Services EC2
- Data Modelling & Database Design in C* using CQL3
- Industry Use Cases
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
Generating a custom Ruby SDK for your web service or Rails API using Smithyg2nightmarescribd
Have you ever wanted a Ruby client API to communicate with your web service? Smithy is a protocol-agnostic language for defining services and SDKs. Smithy Ruby is an implementation of Smithy that generates a Ruby SDK using a Smithy model. In this talk, we will explore Smithy and Smithy Ruby to learn how to generate custom feature-rich SDKs that can communicate with any web service, such as a Rails JSON API.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other?
Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
2. R data structures
VECTOR
[1]
•
•
•
•
MATRIX
[2]
[3]
1 row, N columns.
One data type only (numeric, character, date, OR logical).
Uses: track changes in a single variable over time.
Examples: stock prices, hurricane path, temp readings, disease spread,
financial performance, sports scores.
[,1]
•
•
•
[,3]
[1,]
[2,]
[3,]
•
•
•
N row, N columns.
One data type only (any combination of numeric, character, date, logical).
Basically, a collection of vectors.
DATA FRAME
LIST
[1]
[,2]
[2]
[,1]
[3]
1 row, N columns. Multiple data types.
Uses: ist detailed information for a person/place/thing/concept.
Examples: Listing for real estate, book, movie, contact, country, stock,
company, etc. Or, a "snapshot" or observation of an event or phenomenon
such as stock market, or scientific experiment.
[,2]
[,3]
[1,]
[2,]
[3,]
•
•
•
N rows, N columns.
Multiple data types.
Basically, a collection of lists or snapshots which when assembled together
provide a "bigger picture."
3. Other important R concepts
FACTORS
USER-DEFINED FUNCTIONS
Stores each distinct value only once, and the data itself is stored as a vector of
integers. When a factor is first created, all of its levels are stored along with the
factor.
> f <- function(a) { a^2 }
> f(2)
[1] 4
> weekdays=c("Monday","Tuesday","Wednesday","Thursday","Friday")
> wf <- factor(weekdays)
[1] Monday
Tuesday
Wednesday Thursday Friday
Levels: Friday Monday Thursday Tuesday Wednesday
Used to group and summarize data:
WeekDaySales <- (DailySalesVector, wf, sum)
# Sum daily sales figures by M,T,W,Th,F
PACKAGES, FUNCTIONS, DATASETS
> search() # Search for installed packages & datasets
[1] ".GlobalEnv"
"mtcars"
"tools:rstudio"
[4] "package:stats"
"package:graphics" "package:grDevices"
•
•
•
•
SPECIAL VALUES
•
•
•
# List available datasets
pi=3.141593. Use lowercase "pi"; "Pi" or "PI" won't work
inf=1/0 (Infinity)
NA=Not Available. A logical constant of length 1 that means neither
TRUE nor FALSE. Causes functions to barf.
•
Tell function to ignore NAs: function(args, na.rm=TRUE)
•
Check for NA values: is.na(x)
> library(ggplot2) # load package ggplot2
Attaching package: ‘ggplot2’
> data()
Functions can be passed as arguments to other functions.
Function behavior is defined inside the curly brackets { }.
Functions can be nested, so that you can define a function inside another.
The return value of a function is the last expression evaluated.
•
NULL=Empty Value. Not allowed in vectors or matrixes.
•
> attach(iris) # Attach dataset "iris"
•
Check for NULL values: is.null(x)
NaN=Not a Number. Numeric data type value for undefined (e.g., 0/0).
See this for NA vs. NULL explanation.
10. DATA FRAME:
Examples
[,1]
[,2]
[,3]
[1,]
[2,]
[3,]
Data Frames: Most frequently used structure for storing and manipulating
data sets. Similar to:
• A database table
• A spreadsheet
Like the above, DFs have rows x columns, but terminology is different:
• Observations = rows
• Variables = Columns
R Table vs. Data Frame: KISS and stick to data frames for now.
#Convert table to data frame:
> HEC <- data.frame(HairEyeColor)
> str(HEC)
'data.frame':32 obs. of 4 variables:
$ Hair: Factor w/ 4 levels "Black","Brown",..: 1 2 3 4 1 2 3 4 1 2 ...
$ Eye : Factor w/ 4 levels "Brown","Blue",..: 1 1 1 1 2 2 2 2 3 3 ...
$ Sex : Factor w/ 2 levels "Male","Female": 1 1 1 1 1 1 1 1 1 1 ...
$ Freq: num 32 53 10 3 11 50 10 30 10 25 ...
11. DATA FRAMES
[,1]
[,2]
[,3]
[1,]
[2,]
[3,]
# HEC[1,] returns a row
> HEC[1,]
Hair
Eye Sex Freq
1 Black Brown Male
32
# Subsetting made easier
> HEC6 <-subset(HEC,select=Hair); str(HEC6)
'data.frame':
32 obs. of 1 variable:
$ Hair: Factor w/ 4 levels "Black","Brown",..: 1 2 3 4
> HEC7 <-subset(HEC,select= c(Hair,Eye)); str(HEC7)
'data.frame':
32 obs. of 2 variables:
$ Hair: Factor w/ 4 levels "Black","Brown"...
$ Eye : Factor w/ 4 levels "Brown","Blue"...
> HEC8 <-subset(HEC, subset=(Hair == "Black" & Eye ==
"Brown")); HEC8
Hair
Eye
Sex Freq
1 Black Brown
Male
32
17 Black Brown Female
36
INDEXING, SELECTING
& SUBSETTING
# HEC[[1]], HEC[,"Hair"], HEC$Hair return column
> HEC1 <-HEC[[1]]; HEC1
> str(HEC1)
Factor w/ 4 levels "Black","Brown",..: 1 2 3 4 1 2 3 4 1 2 ...
# HEC[1] and HEC["Hair"] return column dframe
> HEC2 <-HEC[1]; HEC2
> str(HEC2)
'data.frame':32 obs. of 1 variable:
$ Hair: Factor w/ 4 levels "Black","Brown",..: 1 2 3 4 1 2 3 4 1 2
> HEC4 <-HEC["Hair"]
> HEC2 == HEC4
Hair
[1,] TRUE
[2,] TRUE
#
#
>
>
etc.
Returning multiple columns in a data frame
This is the same as HEC[,c(1, 4)]
HEC5 <-HEC[,c("Hair", "Freq")]
str(HEC5)
'data.frame':
32 obs. of 2 variables:
$ Hair: Factor w/ 4 levels "Black","Brown",..: 1 2 3 4 1 2 3 4 1
$ Freq: num 32 53 10 3 11 50 10 30 10 25 ...
12. DATA FRAMES
# Combine 2 DFs columnwise
> echo <- cbind(HEC2, HEC4)
> echo
Hair Hair
1 Black Black
2 Brown Brown
3 Red
Red etc
# Stack 2 DFs (UNION)
> rbind(HEC8, HEC8)
1
17
11
171
Hair
Black
Black
Black
Black
Eye
Sex Freq
Brown
Male
32
Brown Female
36
Brown
Male
32
Brown Female
36
# Skip having to specify the DF for col names
> f <- sum(HEC$Freq)
# Instead of this
> attach(HEC)
> f <- sum(Freq) # Use this
13. FUNCTIONS
SEQUENCING
seq(from,to,by) generate a sequence
indices <- seq(1,10,2)
#indices is c(1, 3, 5, 7, 9)
rep(x,ntimes)
repeat x n times
y <- rep(1:3, 2)
# y is c(1, 2, 3, 1, 2, 3)
cut(x,n)
divide continuous variable in factor
with n levels
y <- cut(x, 5)
DATE PROCESSING
Sys.Date()
as.date()
generate today's date
> Sys.Date()
[1] "2013-11-03
Convert string to date format
> to=as.Date('2006-1-10')
> mode(to)
[1] "numeric"
> class(to)
[1] "Date"
CHARACTER PROCESSING Description
substr(x, start=n1, stop=n2)
Extract or replace substrings in a character vector.
x <- "abcdef"
substr(x, 2, 4) is "bcd"
substr(x, 2, 4) <- "22222" is "a222ef"
grep(pattern, x ,
Search for pattern in x. If fixed =FALSE then pattern is
ignore.case=FALSE, fixed=FALSE) a regular expression. If fixed=TRUE then pattern is a
text string. Returns matching indices.
grep("A", c("b","A","c"), fixed=TRUE) returns 2
sub(pattern, replacement,x,
Find pattern in x and replace with replacement text. If
ignore.case =FALSE, fixed=FALSE) fixed=FALSE then pattern is a regular expression.
If fixed = T then pattern is a text string.
sub("s",".","Hello There") returns "Hello.There"
strsplit(x, split)
Split the elements of character vector x at split.
strsplit("abc", "") returns 3 element vector "a","b","c"
paste(..., sep="")
Concatenate strings after using sep string to seperate
them.
paste("x",1:3,sep="") returns c("x1","x2" "x3")
paste("x",1:3,sep="M") returns c("xM1","xM2" "xM3")
paste("Today is", date())
toupper(x)
Uppercase
tolower(x)
Lowercase
TYPE CONVERSION
STRUCTURE CONVERSION
as.character(x)
as.complex(x)
as.numeric(x)
as.logical(x)
as.data.frame(x)
as.list(x)
as.matrix(x)
as.vector(x)
14. DATA TRANSFORMATIONS
VECTOR
[1]
[2]
[3]
MATRIX
# s=simplify into a vector
# sapply returns a vector
l <- sapply(lst,function)
# lapply returns a list
v <- lapply(lst,function)
[,1]
[,3]
[1,]
[2,]
[3,]
DATA FRAME
LIST
[1]
[,2]
[2]
[,1]
[3]
[1,]
[2,]
[3,]
[,2]
[,3]
15. GET HELPFUL INFO
PRINTING
# Get help
>help.search("cat") # find info about "cat"
>?mean # get help about function
>example(mean) # get examples
> print(matrix(c(1234),2,2))
[,1] [,2]
[1,] 1234 1234
[2,] 1234 1234
# List objects in workspace
> ls()
[1] "tbl"
"w_day"
> print(matrix(c(1,2,3,4),2,2))
[,1] [,2]
[1,]
1
3
[2,]
2
4
# List all available datasets
> data()
# Get structure
> str(HairEyeColor)
table [1:4, 1:4, 1:2] 32 53 10 3 11 50 10 30 10 25
...
- attr(*, "dimnames")=List of 3
..$ Hair: chr [1:4] "Black" "Brown" "Red" "Blond"
..$ Eye : chr [1:4] "Brown" "Blue" "Hazel" "Green"
..$ Sex : chr [1:2] "Male" "Female"
# Get Class (vector,list, dataframe, table, matrix,
numeric, function, factor,, et)
> class(HairEyeColor)
[1] "table"
# Use Google R style sheet
> print ("print works on only");print("one
string or variable at a time"); print(pi)
[1] "print works on only"
[1] "one string or variable at a time"
[1] 3.141593
> num <-1:10
> print(num)
[1] 1 2 3
4
5
6
7
8
9 10
# cat works only on strings and vectors
> cat("the first 10 numbers are:", num, "n")
the first 10 numbers are: 1 2 3 4 5 6 7 8 9 10
16. INPUT / OUTPUT
Ctrl-R executes the selected line(s)
# Getting and setting the working directory
> getwd()
[1] "C:/Users/mdarling/Documents"
> setwd("DA/data")
[1] "C:/Users/mdarling/Documents/DA/data"
# Enter data using spreadsheet editor
w_day <- data.frame()
w_day <- edit(w_day)
# Read data from URL
> tbl <read.csv("http://www.andrewpatton.com/countrylist.csv")
# Write data to csv file
> write.csv(tbl, "countries.csv")
#
>
>
>
Read data from HTML tables
library(XML)
url <-"http://www.andrewpatton.com/countrylist.html"
tbls <- readHTMLTable(url)
MORE DATE PROCESSING
library(timeDate)
ymdhs <- "2012-03-04 05:06:07"
pd.sec <- as.POSIXlt(ymdhs)$sec
pd.hour <- as.POSIXlt(ymdhs)$hour
pd.min <- as.POSIXlt(ymdhs)$min
pd.mday <- as.POSIXlt(ymdhs)$mday
pd.mon <- ((as.POSIXlt(ymdhs)$mon)+1)
pd.year <- ((as.POSIXlt(ymdhs)$year) + 1900)
17. PLOTTING
Plotting in R
- base
- ggplot2, ggmap, map
Types of Graphs
- chloropleth
- heat map
# Base plots
plot(faithful, type = 'l') #line graph
plot(faithful, type = 'p') #point graph
hist(faithful$waiting)
#histogram of column waiting
# Quickly plot a matrix of scatterplots
# This plots each column vs. all the other ones
names(iris)
[1] "Sepal.Length" "Sepal.Width" "Petal.Length"
"Petal.Width" "Species"
pairs(iris[,-5])
pairs(iris[,1:2])
# Plot x vs. y using 2 df columnns and geom_point()
ggplot(movies, aes(x=year, y=budget)) + geom_point()
# Plot histogram using 1 column, Note: geom_bar()
ggplot(movies, aes(x=year)) + geom_bar()
# plot all rows vs. mpaa column
plot(movies[, "mpaa"]) # plot has lots of nulls
mpaa.movies <- subset(movies, mpaa != "")! # exclude nulls
plot(mpaa.movies[, "mpaa"])
# Or use na.rm