Part of an advanced analytics course.

- 1. Advanced Data Analytics: Getting Started with R
  Jeffrey Stanton, School of Information Studies, Syracuse University
- 2. Analytics: Key Steps
  • Learn the application domain
  • Locate or develop a data source or data set
  • Clean and preprocess data: this may take 60% of the effort!
  • Data reduction and transformation: find useful pieces, squeeze out redundancies
  • Choose analytical approaches: summarize, visualize, organize, describe, explore, find patterns, predict, test, infer
  • Communicate the results and implications to data users
  • Deploy discovered knowledge in a system
  • Monitor and evaluate the effectiveness of the system
- 3. First Example: Ice Cream Consumption
  • We all know the domain: we have all eaten ice cream.
  • Public data set obtained from the supplement to Verbeek’s text: http://eu.wiley.com/legacy/wileychi/verbeek2ed/datasets.html
  • Let’s read the data into R and summarize it:
      ICECREAM=read.csv("[pathname]/icecream.csv",header=TRUE)
      summary(ICECREAM)
  • What do these two R commands do? Did you get a mean of 84.6 for income? What do “Min.”, “1st Qu.”, and the other labels in the output mean?
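One way to see what the labels in the summary() output mean is to recompute them by hand. The sketch below assumes the built-in mtcars data set as a stand-in for icecream.csv, which is not bundled with R; it writes mtcars to a temporary CSV file so that the same read.csv() step can run anywhere.

```r
# Stand-in data (assumption): the built-in mtcars data set, saved to a
# temporary CSV file so the read.csv() step runs without icecream.csv
path <- tempfile(fileext = ".csv")
write.csv(mtcars, path, row.names = FALSE)
CARS <- read.csv(path, header = TRUE)

# summary() of one numeric column: Min., 1st Qu., Median, Mean, 3rd Qu., Max.
s <- summary(CARS$mpg)
print(s)

# The same six numbers computed directly
by_hand <- c(min(CARS$mpg),
             quantile(CARS$mpg, 0.25),  # 1st Qu.: 25th percentile
             median(CARS$mpg),          # 50th percentile
             mean(CARS$mpg),
             quantile(CARS$mpg, 0.75),  # 3rd Qu.: 75th percentile
             max(CARS$mpg))
print(unname(by_hand))
```

For a whole data frame, summary(ICECREAM) applies this same calculation to every column at once.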
- 4. Metadata
  • There is a text file that goes with the CSV dataset: “icecream.txt”
  • It describes the meaning of the variables in the dataset, which is essential if we are to make sense of these data. Variable labels:
      cons: consumption of ice cream per head (in pints)
      income: average family income per week (in US dollars)
      price: price of ice cream (per pint)
      temp: average temperature (in degrees Fahrenheit)
      time: index from 1 to 30
  • We also learn from the metadata that these are time series data: 30 four-weekly observations from 18 March 1951 to 11 July 1953.
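The variable labels can also be kept in R next to the data, which makes it easy to check that the file you loaded actually contains the documented variables. This is a minimal sketch: the object name ice_labels and the helper missing_vars() are hypothetical, and it assumes the CSV columns use the same lower-case names as the commands on the later slides.

```r
# Variable labels transcribed from the metadata file (icecream.txt).
# Assumption: the CSV columns use exactly these lower-case names.
ice_labels <- c(
  cons   = "Ice cream consumption per head (pints)",
  income = "Average family income per week (US dollars)",
  price  = "Price of ice cream (per pint)",
  temp   = "Average temperature (degrees Fahrenheit)",
  time   = "Observation index (1 to 30)"
)

# Hypothetical helper: documented variables that are absent from a data frame
missing_vars <- function(df, labels) setdiff(names(labels), names(df))

# Demonstration on an empty data frame that lacks the "time" column
partial <- data.frame(cons = numeric(0), income = numeric(0),
                      price = numeric(0), temp = numeric(0))
print(missing_vars(partial, ice_labels))
```

With the real data loaded, a label can also serve as an axis title, for example hist(ICECREAM$income, xlab = ice_labels[["income"]]).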
- 5. “Sanity Check” Using Histograms and Boxplots
  • Cleaning, screening, and preprocessing are essential to ensure that you understand what your data set contains and that it does not contain garbage. It is impractical to look at every data point, so we use histograms and boxplots to get an overview of our data:
      hist(ICECREAM$income)
      boxplot(ICECREAM$income)
  • What is the purpose of the “$” notation in the commands above? Is there any other way of referring to these variables?
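On the question about “$”: it extracts one named column from a data frame, and R offers several equivalent forms. The sketch also shows that hist() and boxplot.stats() return the numbers behind each plot, which helps when hunting for impossible or extreme values. It assumes the built-in mtcars data set as a stand-in for the ice cream data and sends the plots to a temporary PDF so it runs non-interactively.

```r
# Stand-in data (assumption): the built-in mtcars data set, not the ice cream data
DF <- mtcars

# Equivalent ways to refer to the same column
v1 <- DF$mpg          # "$": data frame name, then column name
v2 <- DF[["mpg"]]     # double brackets with the column name as a string
v3 <- DF[, "mpg"]     # [row, column] indexing; empty row index means all rows
v4 <- with(DF, mpg)   # with() evaluates the expression inside the data frame

# Send the screening plots to a temporary PDF so the sketch runs
# non-interactively; drop the pdf() and dev.off() lines in RStudio
pdf(tempfile(fileext = ".pdf"))
hist(DF$mpg)
boxplot(DF$mpg)
invisible(dev.off())

# The numbers behind the histogram: bin edges and counts
h <- hist(DF$mpg, plot = FALSE)
print(data.frame(from = head(h$breaks, -1), to = h$breaks[-1],
                 count = h$counts))

# The numbers behind the boxplot
bs <- boxplot.stats(DF$mpg)
print(bs$stats)  # lower whisker end, lower hinge, median, upper hinge, upper whisker end
print(bs$out)    # points beyond the whiskers: candidates for a closer look
```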
- 6. Interpret These Graphics
- 7. Explore
  • Perhaps a family with greater income can afford to purchase more ice cream:
      plot(ICECREAM$income,ICECREAM$cons)
  • How do you interpret a scatterplot?
  • Is there a pattern here?
  • Does our intuitive hypothesis fit the scatterplot?
  • What else could scatterplots show?
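A scatterplot answers “is there a pattern?” by eye. As a complement, cor() summarizes the strength and direction of a straight-line pattern in one number, and pairs() draws every pairwise scatterplot among several variables. A sketch, assuming the built-in mtcars data (car weight versus fuel economy) as a stand-in:

```r
# Stand-in data (assumption): built-in mtcars, weight (wt) vs. fuel economy (mpg)
pdf(tempfile(fileext = ".pdf"))        # non-interactive device for this sketch
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
pairs(mtcars[, c("mpg", "wt", "hp")])  # every pairwise scatterplot at once
invisible(dev.off())

# Pearson correlation: +1 or -1 for a perfect straight-line pattern,
# near 0 when there is no linear trend
r <- cor(mtcars$wt, mtcars$mpg)
print(r)
```

A correlation near zero does not rule out a curved pattern, which is one reason to look at the scatterplot first.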
- 8. More Tools to Support Exploration
      results=lm(ICECREAM$cons~ICECREAM$temp)
      # This is a comment line
      # The previous command calculates the straight line
      # that best fits the scatterplot (by least squares),
      # with temp on the X axis and cons on the Y axis
      plot(ICECREAM$temp,ICECREAM$cons)
      abline(results) # Plots the best-fit line
      # The new data structure "results" is a list with
      # lots of information about the analysis.
      # What does this element of the list contain?
      results$residuals
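The fitted model is an ordinary R list, so names() answers the question of what it contains. The sketch below assumes mtcars as a stand-in and uses the formula-plus-data form lm(mpg ~ wt, data = mtcars); for the slide's model, lm(cons ~ temp, data = ICECREAM) would give the same fitted line with shorter coefficient names. It also checks the least-squares identity that the slope of the best-fit line equals cov(x, y) / var(x).

```r
# Stand-in data (assumption): built-in mtcars, wt on the X axis, mpg on the Y axis
results <- lm(mpg ~ wt, data = mtcars)  # formula y ~ x, columns taken from 'data'

print(names(results))           # components stored in the fitted-model list
print(coef(results))            # intercept and slope of the best-fit line
print(head(results$residuals))  # observed y minus fitted y, one per observation

# Least squares: the slope of the best-fit line is cov(x, y) / var(x)
slope_by_hand <- cov(mtcars$wt, mtcars$mpg) / var(mtcars$wt)
print(slope_by_hand)
```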
- 9. What is the effect of time on these data?
      plot(ICECREAM$time,ICECREAM$temp)
      plot(ICECREAM$time,ICECREAM$cons)
  • What do these plots show? Can you explain why they are shaped the way they are?
  • Based on your answer to the previous question, how should this affect your strategy for understanding ice cream consumption?
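When a variable rises and falls with the seasons, neighboring observations are not independent, and two seasonal variables (such as temperature and ice cream consumption) may track each other partly because both follow the calendar. One way to check for a repeating pattern is the autocorrelation function, acf(). A minimal sketch, assuming the built-in nottem series (monthly average temperatures at Nottingham, 1920 to 1939) as a stand-in, since it is much longer than the 30 ice cream observations:

```r
# Stand-in data (assumption): built-in nottem, monthly average air
# temperatures (degrees Fahrenheit) at Nottingham, 1920 to 1939
temp <- as.numeric(nottem)
idx  <- seq_along(temp)            # observation index, like the time column

pdf(tempfile(fileext = ".pdf"))    # non-interactive device for this sketch
plot(idx, temp, type = "b", xlab = "Observation index", ylab = "Temperature")
invisible(dev.off())

# Autocorrelation: how strongly the series resembles itself k steps earlier
# (a$acf[1] is lag 0, so lag k sits at position k + 1)
a <- acf(temp, lag.max = 12, plot = FALSE)
print(round(a$acf[c(2, 7, 13)], 2))  # lags 1, 6 and 12 months
```

A strong positive value at lag 12 together with a negative value at lag 6 is the signature of an annual cycle.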
- 10. Demonstrating Mastery
  • Find a small numeric dataset; try starting at the Journal of Statistical Education data website: http://www.amstat.org/publications/jse/jse_data_archive.htm
  • Read the dataset into R
  • Summarize the variables in that dataset
  • Use histograms and boxplots to check and understand your data; use the metadata description that came with the dataset to make sure that you know the variables
  • Explore the data using plot(); look for something interesting
  • Put your findings on a slide and communicate them to me or someone else
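The screening steps in this exercise can be bundled into a single pass over every numeric column. A sketch only: screen_data() is a hypothetical helper, not something from the slides, and the built-in airquality data set (which contains missing values) stands in for your own dataset; with real data you would first call read.csv() as on slide 3.

```r
# Hypothetical helper (not from the slides): one screening pass over every
# numeric column of a data frame, i.e. histogram, boxplot, and summary
screen_data <- function(df, pdf_file = tempfile(fileext = ".pdf")) {
  num_cols <- names(df)[vapply(df, is.numeric, logical(1))]
  pdf(pdf_file)                 # all plots go into one PDF file
  on.exit(dev.off())            # close the device even if a plot fails
  for (col in num_cols) {
    hist(df[[col]], main = paste("Histogram of", col), xlab = col)
    boxplot(df[[col]], main = paste("Boxplot of", col))
  }
  lapply(df[num_cols], summary) # named list of per-column summaries
}

# Stand-in data (assumption): built-in airquality, which has missing values
res <- screen_data(airquality)
print(res$Ozone)  # the "NA's" entry counts missing values
```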
