Advanced Data Analytics:
 Getting Started with R

         Jeffrey Stanton
  School of Information Studies
      Syracuse University
Analytics: Key Steps
• Learn the application domain
• Locate or develop a data source or data set
• Clean and preprocess data: May take 60% of effort!
• Data reduction and transformation
   – Find useful pieces, squeeze out redundancies
• Choose analytical approaches
   – summarize, visualize, organize, describe, explore, find
     patterns, predict, test, infer
• Communicate the results and implications to data users
• Deploy discovered knowledge in a system
• Monitor and evaluate the effectiveness of the system
First Example: Ice Cream Consumption
• We all know the domain: we have all eaten ice cream
• Public data set obtained from supplement to Verbeek’s text:
  http://eu.wiley.com/legacy/wileychi/verbeek2ed/datasets.html
• Let’s read the data into R and summarize it:

ICECREAM=read.csv("[pathname]/icecream.csv",header=TRUE)
summary(ICECREAM)


• What do these two R commands do? Did you get a mean of
  84.6 for income? What are “Min.,” “1st Qu.,” and the other
  values in the output?
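
To see what summary() reports, here is a minimal sketch on a small vector. The values are invented for illustration and are not taken from the ice cream data. "Min." and "Max." are the smallest and largest values, "1st Qu." and "3rd Qu." are the 25th and 75th percentiles, and "Median" is the 50th percentile.

```r
# Hypothetical values, invented for illustration only
x <- c(2, 4, 4, 5, 7, 9, 11)

s <- summary(x)   # Min., 1st Qu., Median, Mean, 3rd Qu., Max.
print(s)

# The same numbers computed one at a time
min(x)
quantile(x, 0.25)   # 1st Qu.: a quarter of the values fall at or below this
median(x)           # half of the values fall at or below this
mean(x)
quantile(x, 0.75)   # 3rd Qu.
max(x)
```

Calling summary() on a whole data frame, as in the slide, produces this same set of statistics once per numeric column.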

Metadata
• There is a text file that goes with the CSV dataset:
  “icecream.txt”
• This describes the meaning of the variables provided in the
  dataset; essential if we are to make sense of these data:
Variable labels:

cons:    consumption of ice cream per head (in pints);
income:  average family income per week (in US dollars);
price:   price of ice cream (per pint);
temp:    average temperature (in Fahrenheit);
time:    index from 1 to 30

• We also learn from the metadata that these are time series
  data with four-weekly observations from 18 March 1951 to
  11 July 1953
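
Once the data are loaded, it can help to check them against the metadata. The sketch below uses a tiny stand-in data frame with made-up values so that it runs without the CSV file. The column names are the lowercase names that the R commands in these slides use, and the vector `expected` is a hypothetical helper, not part of the original material.

```r
# Stand-in for the real file: same column names as the slides use,
# but the values are made up for illustration
ICECREAM <- data.frame(
  cons   = c(0.30, 0.35, 0.40),
  income = c(80, 82, 85),
  price  = c(0.26, 0.27, 0.28),
  temp   = c(40, 55, 70),
  time   = 1:3
)

names(ICECREAM)   # column names should match the variable labels
str(ICECREAM)     # type of each column plus the first few values
nrow(ICECREAM)    # 3 here; the real data set should have 30 rows

# Compare the columns against what the metadata promises
expected <- c("cons", "income", "price", "temp", "time")
missing  <- setdiff(expected, names(ICECREAM))
missing           # an empty result means nothing is missing
```

Note that R is case-sensitive, so a column named `Time` would not be found by `ICECREAM$time`.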
“Sanity Check” Using Histograms and Boxplots

• Cleaning, screening, and preprocessing are essential to ensure
  that you understand what your data set contains and that it
  does not contain garbage; it is impractical to look at every
  data point, so we use histograms and boxplots to get an
  overview of our data:

hist(ICECREAM$income)
boxplot(ICECREAM$income)

• What is the purpose of the “$” notation in the commands
  above? Is there any other way of referring to these
  variables?
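
As a sketch of the alternatives, the lines below pull the same column out of a data frame in four ways. The data frame is a small stand-in with made-up values, and the names `a`, `b`, `c3`, and `d` are only for illustration.

```r
# Small made-up data frame standing in for ICECREAM
ICECREAM <- data.frame(income = c(80, 82, 85), cons = c(0.30, 0.35, 0.40))

a  <- ICECREAM$income          # "$" pulls one column out by name
b  <- ICECREAM[["income"]]     # double brackets with the name as a string
c3 <- ICECREAM[, "income"]     # row/column indexing: all rows, one column
d  <- with(ICECREAM, income)   # with() evaluates an expression inside the data frame

# All four give the same numeric vector
identical(a, b) && identical(b, c3) && identical(c3, d)
```

The `[[ ]]` form is handy when the column name is stored in a variable, while with() keeps longer expressions readable.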
Interpret These Graphics

[Slide graphics not reproduced in this transcript.]
Explore
• Perhaps a family with greater income can afford to purchase
  more ice cream:

plot(ICECREAM$income,ICECREAM$cons)


• How do you interpret a scatterplot?
• Is there a pattern here?
• Does our intuitive hypothesis fit the scatterplot?
• What else could scatterplots show?
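
A number can back up what the eye sees in a scatterplot. The sketch below computes the Pearson correlation with cor() for two made-up vectors; the values are invented and say nothing about the real ice cream data.

```r
# Made-up values for illustration only
income <- c(76, 78, 80, 83, 85, 88, 90, 94)
cons   <- c(0.38, 0.33, 0.40, 0.31, 0.37, 0.34, 0.41, 0.36)

plot(income, cons, xlab = "income", ylab = "cons")   # one point per observation

r <- cor(income, cons)   # Pearson correlation, always between -1 and 1
r
```

Values of r near 0 suggest little linear pattern, while values near -1 or 1 suggest the points fall close to a straight line.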
More Tools to Support Exploration
results=lm(cons~temp,data=ICECREAM)
# This is a comment line.
# The previous command calculates the line
# that best fits the scatterplot with temp
# on the X axis and cons on the Y axis.

plot(ICECREAM$temp,ICECREAM$cons)
abline(results) # Plots the best-fit line

# The new data structure "results" has
# lots of information about the analysis.
# What does this list contain?
results$residuals
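
To explore what a fitted model object holds, here is a minimal sketch using a made-up data frame in place of the real file. The numbers are invented; only the function calls carry over to the ice cream data.

```r
# Made-up values standing in for temp and cons
ICECREAM <- data.frame(
  temp = c(30, 40, 50, 60, 70),
  cons = c(0.28, 0.32, 0.33, 0.39, 0.43)
)

results <- lm(cons ~ temp, data = ICECREAM)

names(results)       # components stored in the fitted model object
coef(results)        # intercept and slope of the best-fit line
residuals(results)   # observed cons minus fitted cons, one per row
summary(results)     # standard errors, R-squared, and more
```

The accessor functions coef() and residuals() return the same values as `results$coefficients` and `results$residuals`.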

What is the effect of time on these data?
plot(ICECREAM$time,ICECREAM$temp)
plot(ICECREAM$time,ICECREAM$cons)

• What do these plots show? Can you explain why these are
  shaped the way they are?
• Based on your answer to the previous question, how does
  the situation affect your strategies for understanding ice
  cream consumption?
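
One way to make a time pattern easier to see is to connect consecutive observations and to plot model residuals in time order. The sketch below does this on simulated data, not the real ice cream values. The cycle length of 12 observations, the coefficients, and the name `SIM` are arbitrary assumptions chosen only to produce a seasonal shape.

```r
# Simulated stand-in data (not the real ice cream values):
# temperature follows a repeating cycle, consumption follows temperature
set.seed(1)
time <- 1:30
temp <- 50 + 20 * sin(2 * pi * time / 12)   # arbitrary cycle of 12 observations
cons <- 0.2 + 0.003 * temp + rnorm(30, sd = 0.02)
SIM  <- data.frame(time, temp, cons)

# type = "b" draws both points and connecting lines
plot(SIM$time, SIM$temp, type = "b")
plot(SIM$time, SIM$cons, type = "b")

# Residuals from the temperature model, plotted in time order
fit <- lm(cons ~ temp, data = SIM)
plot(SIM$time, residuals(fit), type = "b")
abline(h = 0, lty = 2)   # dashed reference line at zero
```

If the residuals still drift or cycle over time, that hints that temperature alone does not account for the time pattern.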




Demonstrating Mastery
• Find a small numeric dataset; try starting at the Journal of
  Statistics Education data website:
  http://www.amstat.org/publications/jse/jse_data_archive.htm
• Read the dataset into R
• Summarize the variables in that dataset
• Use histograms and boxplots to check and understand your
  data; use the metadata description that came with the dataset
  to make sure that you know the variables
• Explore the data using plot; look for something interesting
• Put your findings in a slide and communicate them to me or
  someone else
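
The steps above can be sketched end to end with a dataset that ships with R. This example uses the built-in `cars` data (speed and stopping distance) in place of a downloaded file, so the reading step is replaced by data(); with your own dataset you would use read.csv() instead.

```r
# "cars" ships with R; see ?cars for its metadata
data(cars)

summary(cars)                 # summary statistics for each variable
hist(cars$speed)              # distribution of one variable
boxplot(cars$dist)            # look for outliers
plot(cars$speed, cars$dist)   # explore the relationship between the two

fit <- lm(dist ~ speed, data = cars)
abline(fit)                   # add the best-fit line to the scatterplot
```

Running ?cars shows the description that plays the role of the metadata file.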




Editor's Notes

  1. Another way is to call attach() on the ICECREAM data frame, as in attach(ICECREAM). Then you can refer to the variable names directly.