Introduction into R for the European Historical Population Sample summerschool, Cluj-Napoca, Romania, 2015. Aimed at an audience of historians with few quantitative skills.
3x half-day lectures and practicals on introductory facets of R, among others: installing R and RStudio, reading in and writing out data, data cleaning, descriptive statistics, and data visualization (including visual analysis). Courtesy of the European Historical Sample Population Network and the Babeş-Bolyai University (Cluj-Napoca, Romania).
Introduction into R for historians (part 1: introduction)
1. Introduction into R
Richard L. Zijdeman
28 May 2015
2. Outline
1 Quantitative research methods
2 Statistical Software
3 Introducing R vocabulary
4 Getting help
5 Installing R and RStudio
3. Quantitative research methods
4. Why
To answer descriptive and explanatory questions about populations
5. Workflow: P-T-E
problem (research question)
theory (hypothesis)
empirical test, with feedback loops between T-E and P-T-E
6. Research Questions
descriptive (to what extent ...)
comparative (comparing two entities)
trend (comparison over time)
explanatory (focus on the mechanism at hand)
7. Theory
deductive reasoning
explanans
general mechanism
condition
explanandum (hypothesis)
8. Empirical test
sample vs. population
random vs. stratified samples
testing technique, e.g.:
t-test, correlation, regression
Software required for faster analysis
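In R, each of these testing techniques is a single function call. A minimal sketch with invented data (the variable names and values here are purely illustrative, not from the course):

```r
# Illustrative data only: heights (cm) of two hypothetical samples
set.seed(42)
group.a <- rnorm(30, mean = 170, sd = 8)
group.b <- rnorm(30, mean = 175, sd = 8)

t.test(group.a, group.b)       # t-test: do the two group means differ?
cor(group.a, group.b)          # correlation between the two variables
fit <- lm(group.b ~ group.a)   # simple linear regression
summary(fit)                   # coefficients, R-squared, p-values
```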
9. Statistical Software
10. The dangers of analysing with spreadsheets (e.g. MS Excel)
tempting to input and clean data in the same sheet
difficult to track cleaning rules
defaults mess up your data (e.g. 01200 -> 1200)
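The leading-zero problem can bite in R too: by default `read.csv()` guesses column types and turns "01200" into the number 1200. A hedged sketch (the column names are made up) showing how declaring the column type avoids this:

```r
csv <- "postcode,n\n01200,5\n04321,7"

# Default import: the postcode column is guessed to be numeric,
# so the leading zero is silently lost
df1 <- read.csv(text = csv)
df1$postcode            # 1200 4321

# Declaring the column as character keeps "01200" intact
df2 <- read.csv(text = csv, colClasses = c("character", "integer"))
df2$postcode            # "01200" "04321"
```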
11. Why use syntax (scripting)?
Efficiency (really)
Quality (error checking)
Replicability
Communication
12. R
R is open source, which is both good and bad:
anybody can contribute (check, improve, create code)
free of charge
but: R depends on collective action
you cannot 'demand' support
sprawl of packages
13. RStudio
a 'browser' for R
provides easy access to:
scripts
data
plots
manuals
14. Introducing R vocabulary
15. R script
* series of commands to manipulate data
* always save your script, NEVER change your data
original data + script = reproducible research
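A minimal script following this rule might look as follows (the data and file names are illustrative): the original data are only ever read, each cleaning step is recorded as code, and the cleaned result goes to a new file.

```r
# A tiny stand-in for an original data file
# (in practice: raw <- read.csv("mydata.csv"))
raw <- read.csv(text = "name,age\nAna,34\nIon,NA\nMaria,51")

# Cleaning step, recorded in the script: drop cases with missing age
clean <- raw[!is.na(raw$age), ]

# Results go to a NEW file; the original data stay untouched
out <- file.path(tempdir(), "mydata_clean.csv")
write.csv(clean, out, row.names = FALSE)
```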
16. R Session
* contains scripts, data, functions
* can be saved as a 'workspace image'
* prefer not to:
+ sessions are usually cluttered
+ only useful if running the script takes a long time
17. Assignment
* 'attach' values to an object (e.g. a variable)
x <- 5
y <- 4
z <- x*y
print(z)
## [1] 20
18. Assignment II
Try to imagine the potential of assignment
x <- c(4, 3, 2, 1, 0, 27, 34, 35)
# c() combines (concatenates) values into a vector
y <- -1
z <- x*y
print(z)
## [1] -4 -3 -2 -1 0 -27 -34 -35
19. Data.frame
basically a table
contains columns (variables)
contains rows (cases)
a "flat table" in Kees' terminology
my.df <- data.frame(x, z)
str(my.df) # show the STRucture
## 'data.frame': 8 obs. of 2 variables:
## $ x: num 4 3 2 1 0 27 34 35
## $ z: num -4 -3 -2 -1 0 -27 -34 -35
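Once data sit in a data.frame, columns and rows are easy to address. A short sketch, re-using the vectors from the slides above:

```r
x <- c(4, 3, 2, 1, 0, 27, 34, 35)
z <- x * -1
my.df <- data.frame(x, z)

my.df$x                 # a single column (variable) by name
my.df[1, ]              # a single row (case) by number
my.df[my.df$x > 20, ]   # all cases where x exceeds 20
```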
20. Packages and libraries
base R (core product)
additional packages
CRAN repository
spread through 'mirrors'
choose a local, but active mirror
GitHub
packages not on CRAN
development versions of CRAN packages
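Installing and loading a package are two separate steps. A hedged sketch (ggplot2 is just a well-known example; the install lines are commented out because they need a network connection):

```r
# Installing happens once per machine:
# install.packages("ggplot2")                    # from a CRAN mirror
# devtools::install_github("tidyverse/ggplot2")  # development version from GitHub
                                                 # (assumes the devtools package)

# Loading happens once per session; 'stats' ships with R itself,
# so this line always works:
library(stats)
```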
21. Getting help
22. Built-in help: "?"
?[function] / ?[package]
e.g. "?plot" or "?graphics"
check the index for user guides and vignettes
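A few concrete ways to reach the built-in documentation from the console, as a sketch:

```r
?plot                      # help page for the plot() function
help("plot")               # the same, written out
help.search("regression")  # full-text search across installed packages
vignette()                 # list longer user guides (vignettes)
```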
23. CRAN website
Manuals
R FAQ
R Journal
24. Online communities
Stack Overflow
an instance of Stack Exchange
reputation-based Q&A
specific mailing lists for packages, e.g.:
ggplot2
R-sig-mixed-models
25. Asking a question, getting an answer
Search the web: others must have had this problem too
If you raise a question:
be polite
be concise
give a short background
provide a replicable example
describe your efforts so far
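A 'replicable example' means code plus data that anyone can paste into a fresh R session. A minimal sketch of what such a question could contain (the data are invented for illustration):

```r
# Small, self-contained data that reproduces the problem
# (values use commas as decimal separators):
df <- data.frame(year = c(1850, 1860, 1870),
                 pop  = c("1,2", "3,4", "5,6"),
                 stringsAsFactors = FALSE)

# What I tried -- mean() gives NA because the values are text, not numbers:
mean(as.numeric(df$pop))

# What I expected: the mean of 1.2, 3.4 and 5.6
sessionInfo()  # also report your R version and platform
```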
26. Installing R and RStudio
27. Download R
Instructions via http://www.r-project.org
Choose a CRAN mirror
http://cran.r-project.org/mirrors.html
close by, but active too!
Romania's mirror hasn't gone (yet!)
Click on 'Download R for Windows'
Follow the usual installation procedure
Double click on R
You should now have a working session!
Close the session; do not save the workspace image
28. RStudio
RStudio is found at http://www.rstudio.com
Download the version for your OS (e.g. Windows)
http://www.rstudio.com/products/rstudio/download/
Install by double clicking on the downloaded file
Start RStudio by double clicking on the icon
You do not need to start R before starting RStudio