This document provides instructions for a two-part data visualization assignment using RStudio. For part 1, students will analyze a unique dataset by exploring its structure, calculating summary statistics, and visualizing correlations. Screenshots of the analysis must be provided. For part 2, students will create various visualizations of the data, including pie charts, bar plots, histograms, box plots, and scatter plots. Screenshots of the visualizations must be submitted along with an APA style cover page in one Word document.
Analyze and Visualize Data with RStudio
Background: This course is all about data visualization. However, we must first have some understanding of the data that we are using to create the visualizations. For this assignment, each group will be given its own unique dataset to work with. That same dataset will be used for both Part 1 and Part 2 of this assignment.
Your Assignment:
Part 1 - Data Analysis with RStudio
Provide screen shots that show analysis of your dataset. For each screen shot, please show comment lines that describe what the next line(s) of code are meant to achieve, the code in proper R syntax, and the computed results that R produces.
Analyzing your data: Watch the video included in this week's Residency material to learn the simple commands for conducting basic data analysis with RStudio. Use RStudio to generate results, create screen shots, and then paste them into an MS Word document with the basic data analysis of your dataset. Remember to use a comment line (#) that explains each R instruction. Example: (#sets the working directory). Commands (setwd, dim, head, tail, str, summary, cor, transform, subset).
First, set your working directory (command: setwd, OR use the drop-down from the RStudio Session menu).
Load your dataset into RStudio and examine its structure - read.csv OR select your object file from the RStudio Files pane. Other commands to use: dim, head, tail, str (structure), and summary (provide comment lines, the R code, and results as screen shot #1)
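The loading and inspection steps above can be sketched as follows. This is only an illustration: the data frame `demo`, its columns, and the file name `demo.csv` are made-up stand-ins for your group's dataset, and the temporary folder is used only so the example has a file to read.

```r
# builds a small stand-in dataset (hypothetical names and values)
demo <- data.frame(
  age    = c(23, 35, 41, 29, 52, 38),
  income = c(31000, 52000, 61000, 40000, 75000, 58000),
  smoker = c("no", "yes", "no", "no", "yes", "no")
)

# writes the stand-in data to a CSV file so read.csv has something to load
write.csv(demo, file.path(tempdir(), "demo.csv"), row.names = FALSE)

# sets the working directory (here: the temporary folder holding the file)
old_wd <- setwd(tempdir())

# loads the dataset from the CSV file into a data frame
mydata <- read.csv("demo.csv")

# shows the number of rows and columns
dim(mydata)

# shows the first three rows
head(mydata, 3)

# shows the last three rows
tail(mydata, 3)

# shows the structure: each field's name, type, and first values
str(mydata)

# shows summary statistics for each field
summary(mydata)

# restores the previous working directory
setwd(old_wd)
```

The `str` output lists the type of every field, which is a useful starting point when deciding whether a field is categorical or continuous.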
View your original dataset and examine each field/grouping in the data. Decide whether each field is "categorical" or "continuous" data (add this also to screen shot #1)
Create a correlation of stats for the dataset. The cor command only accepts numeric input, so categorical fields must be coded as 0/1 instead of no/yes, and fields must be numeric instead of string. Hint: it might be necessary to transform some fields. If so, create a new version of your dataset with these transformations, then run the correlation on the transformed data - commands: transform and cor (provide as screen shot #2)
What are the Min, Max, Median, and Mean of a continuous value field in your data? (provide also as screen shot #2)
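A minimal sketch of the transformation, correlation, and summary-statistics steps, assuming a hypothetical dataset `demo` with one yes/no field. The column names and values are invented for illustration; your own fields will differ.

```r
# builds a small stand-in dataset (hypothetical names and values)
demo <- data.frame(
  age    = c(23, 35, 41, 29, 52, 38),
  income = c(31000, 52000, 61000, 40000, 75000, 58000),
  smoker = c("no", "yes", "no", "no", "yes", "no"),
  stringsAsFactors = FALSE
)

# creates a new version of the dataset with yes/no recoded as 1/0
demo_num <- transform(demo, smoker = ifelse(smoker == "yes", 1, 0))

# confirms that every field is now numeric
sapply(demo_num, is.numeric)

# computes the correlation matrix on the transformed data
cor_matrix <- cor(demo_num)
cor_matrix

# computes Min, Max, Median, and Mean of one continuous field
income_min    <- min(demo_num$income)
income_max    <- max(demo_num$income)
income_median <- median(demo_num$income)
income_mean   <- mean(demo_num$income)
c(Min = income_min, Max = income_max, Median = income_median, Mean = income_mean)

# shows the same statistics (plus quartiles) in one call
summary(demo_num$income)
```

Keeping the original `demo` untouched and storing the recoded version under a new name makes it easy to compare the two in a screen shot.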
What are the correlation values between all fields in your dataset? (provide as screen shot #3)
Create a subset of the dataset that keeps only some of its fields (at least two) - commands: subset, cor (provide also as screen shot #3)
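The subset step might look like the sketch below. It assumes a hypothetical, already-numeric dataset `demo_num`; the field names are placeholders, not part of the assignment.

```r
# builds a small, already-numeric stand-in dataset (hypothetical values)
demo_num <- data.frame(
  age    = c(23, 35, 41, 29, 52, 38),
  income = c(31000, 52000, 61000, 40000, 75000, 58000),
  smoker = c(0, 1, 0, 0, 1, 0)
)

# shows the correlation values between all fields
cor(demo_num)

# creates a subset that keeps only two of the fields
demo_sub <- subset(demo_num, select = c(age, income))

# shows the correlation values for the subset only
cor(demo_sub)
```

The `select` argument of `subset` picks columns; a condition such as `age > 30` as the second argument would pick rows instead.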
These three (3) screen shots containing the required data details should be placed in an MS Word document and labeled as Part 1 - Dataset Analysis.
Part 2 - Data Visualizing with RStudio
Background: As we have learned, a lot of thought goes into the design of a visualization. In this examination of your data and its visualization, we review how data types influence the choice of graph - see the "Selecting a Graph" hand-out (in this folder).
Provide screen shots that show graphs and charts of your dataset. (Do NOT use ggplot2 or other R package features - we will learn and use these advanced R features in another lesson.) For each screen shot, please show comment lines that describe what the next line(s) of code are meant to achieve, the code in proper R syntax, and the computed results that R produces.
Visualizing your data: Review Kirk chapter 4 and the Res Wknd slide hand-outs to learn the data type requirements for each graph type. Also use this R Tutorial page: https://www.tutorialspoint.com/r/index.htm for reference on RStudio commands for creating graphs and charts. Use RStudio to create graphs and charts, create screen shots, and then paste them into your MS Word document showing visuals of your dataset. Use commands (pie, barplot, hist, boxplot, plot).
Graphs to Produce:
Pie Chart:
Create a pie chart that shows relationships of certain fields/grouping of your dataset - see professor for details. Use command: pie(x) - (provide as screen shot #4)
Label the fields/columns as appropriate - see professor for details. Use command pie(x, labels). (provide also as screen shot #4)
Title the pie chart (a name you choose). Use command pie(x, labels, main =). (provide as screen shot #5)
Color the pie chart using the rainbow option. Use command pie(x, labels, main =, col =). (provide also as screen shot #5)
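The four pie chart steps can be sketched as follows. The counts, labels, and title are hypothetical; `rainbow` is the base R palette function that generates one color per slice.

```r
# holds the value for each slice (hypothetical counts per group)
x <- c(12, 7, 5, 9)

# holds the label for each slice (hypothetical group names)
labels <- c("North", "South", "East", "West")

# draws a basic pie chart
pie(x)

# adds a label to each slice
pie(x, labels)

# adds a title to the chart
pie(x, labels, main = "Records by Region")

# colors the slices with the rainbow palette, one color per slice
slice_colors <- rainbow(length(x))
pie(x, labels, main = "Records by Region", col = slice_colors)
```

Each call redraws the whole chart, so the last call alone produces the finished version; the earlier calls are kept only because the assignment asks for a screen shot at each stage.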
Bar Plot:
Create a bar plot that shows relationships of certain fields/grouping of your dataset - use the same fields/columns as before. First create a matrix (H); assign values for each field/column to (H). Use command barplot(H). (provide as screen shot #4)
Label the x and y axes with names (see professor). Use command barplot(H, xlab =, ylab =). (provide also as screen shot #4)
Title the bar plot (a name you choose). Use command barplot(H, xlab =, ylab =, main =). (provide as screen shot #5)
Color the bars in the bar plot any color you wish. Use command barplot(H, xlab =, ylab =, main =, col =). (provide also as screen shot #5)
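A minimal sketch of the bar plot steps, assuming a one-row matrix `H` of hypothetical counts. The axis names, title, and color are placeholders for whatever your professor and group decide.

```r
# creates a matrix (H) with one value per field/column (hypothetical counts)
H <- matrix(c(12, 7, 5, 9), nrow = 1,
            dimnames = list(NULL, c("North", "South", "East", "West")))

# draws a basic bar plot, one bar per column of H
barplot(H)

# labels the x and y axes
barplot(H, xlab = "Region", ylab = "Number of records")

# adds a title
barplot(H, xlab = "Region", ylab = "Number of records",
        main = "Records by Region")

# colors the bars and keeps the bar midpoints that barplot returns
bar_mids <- barplot(H, xlab = "Region", ylab = "Number of records",
                    main = "Records by Region", col = "steelblue")
```

With a matrix input, `barplot` draws one bar per column and stacks the rows, so a one-row matrix gives simple side-by-side bars named after the column names.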
Histogram:
Create a histogram that shows the frequency of values of chosen fields/columns of your dataset - use the same fields/columns as before. First, create a vector (v) that holds the values of each field/column, then use function hist(v). (provide as screen shot #6)
Label the x and y axes (same as previous bar plot). Use function hist(v, xlab =, xlim =, ylab =, ylim =). (provide also as screen shot #6)
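Title the histogram (same as previous bar plot). Assign your title to a variable: title <- "histogram name". Then use function hist(v, main = title, xlab =, xlim =, ylab =, ylim =). Note that main takes the variable name without quotes; writing main = "title" would print the literal word title. (provide as screen shot #7)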
Give the histogram any color you wish. Note: all bars should be the same color. Use function hist(v, main = title, xlab =, col =, xlim =, ylab =, ylim =). (provide also as screen shot #7)
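The histogram steps could be sketched like this. The vector `v`, the axis limits, the title text, and the color are all hypothetical choices made for the example.

```r
# creates a vector (v) holding the values of one field (hypothetical ages)
v <- c(23, 35, 41, 29, 52, 38, 45, 31, 27, 48)

# draws a basic histogram
hist(v)

# labels the axes and sets the range shown on each axis
hist(v, xlab = "Age", xlim = c(20, 60), ylab = "Frequency", ylim = c(0, 5))

# assigns the title to a variable
title <- "Distribution of Age"

# adds the title; the variable is passed without quotes
hist(v, main = title, xlab = "Age", xlim = c(20, 60),
     ylab = "Frequency", ylim = c(0, 5))

# gives every bar the same color and keeps the object hist returns
h <- hist(v, main = title, xlab = "Age", col = "lightgreen",
          xlim = c(20, 60), ylab = "Frequency", ylim = c(0, 5))

# shows how many values fell into each bin
h$counts
```

A single color name in `col` is applied to every bar, which satisfies the requirement that all bars share one color.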
Box Plot:
Create a box plot that shows a measure of the distribution of values across chosen fields/columns of your dataset - use the same fields/columns as before. First, create a vector (v) that holds the values of each field/column, then use function boxplot(v). (provide as screen shot #8)
Label the x and y axes (same as previous histogram). Use function boxplot(v, xlab =, ylab =). (provide also as screen shot #8)
Title the box plot (a name you choose). Use function boxplot(v, main =, xlab =, ylab =). (provide also as screen shot #8)
Color the box plot any color you wish. Use function boxplot(v, main =, xlab =, ylab =, col =). (provide also as screen shot #8)
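A short sketch of the box plot steps, again with a hypothetical vector `v` and placeholder labels, title, and color.

```r
# creates a vector (v) holding the values of one field (hypothetical ages)
v <- c(23, 35, 41, 29, 52, 38, 45, 31, 27, 48)

# draws a basic box plot
boxplot(v)

# labels the x and y axes
boxplot(v, xlab = "All records", ylab = "Age")

# adds a title
boxplot(v, main = "Spread of Age", xlab = "All records", ylab = "Age")

# colors the box and keeps the statistics that boxplot returns
b <- boxplot(v, main = "Spread of Age", xlab = "All records",
             ylab = "Age", col = "orange")

# shows the five values drawn: lower whisker, lower hinge, median,
# upper hinge, upper whisker
b$stats
```

The middle row of `b$stats` is the median, which can be checked against `median(v)` when writing up the screen shot.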
Scatter Plot:
Create a scatter plot that shows many points of fields/columns of your dataset plotted in a Cartesian plane - use the same fields/columns as before. First, create two variables for the fields: (hw) for the horizontal coordinate and (vw) for the vertical coordinate. Then use function plot(hw, vw); plot takes the horizontal coordinate first. (provide as screen shot #9)
Create a scatter plot of just two of the fields/columns of your dataset
Choose one field/column of your dataset and plot it with a label on the x axis
Add a label to the y axis for the other field/column
Add a title to the scatter plot (as you choose)
Color the scatter plot any color you wish (your choice).
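The scatter plot steps can be sketched as below. The two vectors, the axis labels, the title, and the color are hypothetical. In base R, `plot` takes the horizontal coordinate as its first argument and the vertical coordinate as its second.

```r
# holds the field used for the horizontal coordinate (hypothetical ages)
hw <- c(23, 35, 41, 29, 52, 38)

# holds the field used for the vertical coordinate (hypothetical incomes)
vw <- c(31000, 52000, 61000, 40000, 75000, 58000)

# draws a basic scatter plot of the two fields
plot(hw, vw)

# labels the x axis
plot(hw, vw, xlab = "Age")

# labels the y axis as well
plot(hw, vw, xlab = "Age", ylab = "Income")

# adds a title
plot(hw, vw, xlab = "Age", ylab = "Income", main = "Income versus Age")

# colors the points
plot(hw, vw, xlab = "Age", ylab = "Income", main = "Income versus Age",
     col = "blue")
```

Both vectors must have the same length, since each point pairs one value from `hw` with the value at the same position in `vw`.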
These screen shots containing graphs and charts of your data should also be placed in the same MS Word document and labeled as Part 2 - Dataset Visualizing with RStudio.
You should have one MS Word document that shows both Part 1 and Part 2 of this assignment. Your deliverable includes both parts of this assignment; it also includes your cover page in APA style showing: Title of this project, Group color and list of members, University's name, Course name, Course number, Professor's name, and Date. Although this work is done in your Group, each learner must post an individual copy to iLearn for grading.