The document provides instructions and code snippets for completing homework assignments 2a and 2b for CS151, which involve reading in weather data from a CSV file, cleaning the data by removing rows with missing values, calculating yearly average maximum and minimum temperatures, and writing the results to new CSV files. It includes algorithms for opening and reading the CSV, iterating through the data to clean it and calculate averages, and writing the cleaned and averaged data to output files using dictionaries.
CS 151 homework2a
1. Computer Science 151
An introduction to the art of computing
Homework 2a solution, 2b info
Rudy Martinez
2. CS151 Spring 2019
Solution to Homework 2 part A
1. Create a file (homework2.py).
2. Your program should open the data file.
3. Read the file into a list.
4. When you see a missing data point in either TMax or TMin, you should delete
the row. For example, the row [1994-04-04,70, ] is missing TMin, so remove it from
the list or do not add it to the list.
5. You should print a count of the rows (minus the removed rows with missing
data).
6. Write a comma separated value (CSV) file to disk (homework2.csv)
3. CS151 Spring 2019
Create a file (homework2.py).
1. Open Spyder and get the imports
# import csv to read the data file
import csv
# import datetime to handle dates
from datetime import datetime
#GLOBAL Variable
DATA = []
4. CS151 Spring 2019
Your program should open the data file.
● The number one problem was opening a file using Windows
○ Key is to know where the file is located and to have the path to the file.
○ This is where the CS Account on the lab machines makes life easier for everyone.
# Homework 2 part 1 Read in CSV
readFile = open('Albuquerque1994to2018.csv', 'r', newline='')
reader = csv.reader(readFile)
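Since the path is where most people got stuck, here is a minimal sketch of one way to make a missing file easier to diagnose. The helper name read_rows is hypothetical (it is not part of the homework solution), and it assumes you pass it the path to wherever your copy of the data file lives.

```python
import csv
from pathlib import Path


def read_rows(csv_path):
    """Return every row of a CSV file as a list of lists of strings.

    read_rows is a hypothetical helper used only for illustration.
    """
    path = Path(csv_path).expanduser()
    if not path.is_file():
        # Fail early and show the full path so it is clear where Python looked
        raise FileNotFoundError(f"Cannot find data file: {path.resolve()}")
    # The csv module documentation recommends opening files with newline=''
    with path.open('r', newline='') as read_file:
        return list(csv.reader(read_file))


# Example call, assuming the data file sits in the current working directory:
# rows = read_rows('Albuquerque1994to2018.csv')
```

The with block closes the file automatically, even if reading fails part way through.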
5. CS151 Spring 2019
Read the file into a list.
● Best way to do this is a for loop
# Counter for rows
line_count = 0
# Loop over the whole data file
for row in reader:
    if line_count == 0:
        # Eat the header
        print(row[0], '\t', row[1], '\t', row[2])
    else:
        # Append the row to the DATA list
        DATA.append([datetime.strptime(row[0], '%Y-%m-%d').date(), row[1], row[2]])
    line_count += 1
# Done reading, so close the data file
readFile.close()
# Homework part1 output number of rows in file (including the header)
print('Number of Rows:', line_count)
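Another way to eat the header is to call next() on the reader once before the loop. The sketch below shows that variation; the function name load_rows is hypothetical, and the in-memory sample stands in for the real data file (the first sample row is made up, the second is the missing-TMin row from the assignment).

```python
import csv
import io
from datetime import datetime


def load_rows(csv_file):
    """Read date, TMAX, TMIN rows from an open CSV file, skipping the header."""
    reader = csv.reader(csv_file)
    header = next(reader)  # consume the header row so it is not stored as data
    data = []
    for row in reader:
        row_date = datetime.strptime(row[0], '%Y-%m-%d').date()
        data.append([row_date, row[1], row[2]])
    return header, data


# Tiny in-memory stand-in for the real data file
sample = io.StringIO("Date,TMAX,TMIN\n1994-04-03,68,41\n1994-04-04,70,\n")
header, data = load_rows(sample)
print(header)
print('Number of data rows:', len(data))
```

With this approach the loop body no longer needs the line_count == 0 check, and len(data) gives the number of data rows without the header.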
6. CS151 Spring 2019
Clean Data and Count the rows to write to CSV
● For loop is how we get this done.
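# Let's create the file to write to
writeFile = open('/home/rudym/Documents/Homework2.csv', 'w', newline='')
writer = csv.writer(writeFile)
# Counter for the rows written
line_count_out = 0
# Loop over all the rows in DATA
for row in DATA:
    # Now let's find empty TMAX, TMIN values and skip writing them
    if row[1] != '' and row[2] != '':
        # There is a value in both TMAX AND TMIN so write
        writer.writerow([row[0], row[1], row[2]])
        line_count_out += 1
# Close the file so every row is flushed to disk
writeFile.close()
# Output the number of rows written
print('Number of Rows Written:', line_count_out)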
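The same cleaning step can be wrapped in a function so it is easy to try on a couple of rows before running it on the whole file. This is only a sketch: write_clean_rows is a hypothetical name, and stripping whitespace before the check is an added assumption so that a value such as ' ' also counts as missing.

```python
import csv
import io


def write_clean_rows(rows, out_file):
    """Write only rows that have both TMAX and TMIN; return how many were written."""
    writer = csv.writer(out_file)
    written = 0
    for row_date, tmax, tmin in rows:
        # Assumption: whitespace-only values are treated as missing too
        if tmax.strip() and tmin.strip():
            writer.writerow([row_date, tmax, tmin])
            written += 1
    return written


# Two sample rows: the first is made up, the second is missing TMIN
rows = [['1994-04-03', '68', '41'], ['1994-04-04', '70', ' ']]
out = io.StringIO()
count = write_clean_rows(rows, out)
print('Rows written:', count)
```

Passing an open file instead of the io.StringIO object writes the same rows to disk.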
7. CS151 Spring 2019
Stats
The raw data file has 7935 rows (including the header row).
Your program should have written 7867 rows to your CSV file.
8. CS151 Spring 2019
Homework Part B
1. Create a python file and call it homework2b.py
a. This is the file you will turn in.
b. If you do not want to use your part a file you can use the one I provide.
2. Open homework2.csv (written from part A)
3. Create Dictionary
4. Average TMAX for each year
5. Average TMIN for each year
6. Write Date, TMAX average, TMIN average to Clean_Data.csv
9. CS151 Spring 2019
Part B Algorithm
1. Open Homework2.csv using a csv reader
2. Create List as global variable
a. DATA=[ ]
3. Read each line in from csv into the DATA list
a. Format the date into a datetime object
b. Cast the TMIN and TMAX into a float
4. Create Dictionary to hold summed data
a. AVGDICT = { }
5. Create variables to hold current year, sumTmin, sumTmax,count
6. For every line in DATA list
a. For each year, add to the tmin sum, and tmax sum
b. When year changes, write the year, average Tmax, and average Tmin to AVGDICT
7. Open csv writer
a. Write AVGDICT to Avg_Data.csv
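Steps 1 to 3 above could look roughly like the sketch below. It assumes the cleaned file from part A has no header row and no missing values, and load_clean_rows is a hypothetical name. The two sample rows are taken from the 2018 sample data in these slides.

```python
import csv
import io
from datetime import datetime


def load_clean_rows(csv_file):
    """Return [date, tmax, tmin] rows with the date parsed and temperatures as floats."""
    data = []
    for row in csv.reader(csv_file):
        row_date = datetime.strptime(row[0], '%Y-%m-%d').date()
        # Cast TMAX and TMIN to float so they can be summed later
        data.append([row_date, float(row[1]), float(row[2])])
    return data


# In-memory stand-in for the cleaned CSV written in part A
sample = io.StringIO("2018-01-01,53,20\n2018-01-02,55,21\n")
DATA = load_clean_rows(sample)
print(DATA[0])
```

If float() raises a ValueError here, a row with a missing value probably slipped through the cleaning step.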
10. CS151 Spring 2019
Average
To calculate the average temperature, sum up all of the values, then divide
by the number of days.
Average = Sum/count
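As a quick check of the formula, here is a small sketch using three TMAX values from the 2018 sample rows in these slides. The variable names are illustrative only.

```python
# Three TMAX values from the 2018 sample data
tmax_values = [53, 55, 52]

# Average = Sum/count
average = sum(tmax_values) / len(tmax_values)
print(round(average, 2))  # prints 53.33
```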
How do we do this?
11. CS151 Spring 2019
Algorithm for calculating sum by year
currYear = year of first row in DATA
sumTmax = 0
sumTmin = 0
count = 0
For every row in DATA
    If currYear != row[0].year
        avgTmax = sumTmax/count
        avgTmin = sumTmin/count
        Write currYear, avgTmax, avgTmin to avgDict
        Set currYear to row[0].year
        Set sumTmax, sumTmin, and count back to 0
    sumTmax += row[1]
    sumTmin += row[2]
    count += 1
If count > 0:
    avgTmax = sumTmax/count
    avgTmin = sumTmin/count
    Write currYear, avgTmax, avgTmin to avgDict
Write avgDict to csv file Avg_Data.csv
Date, TMAX, TMIN
2018-01-01, 53, 20
2018-01-02, 55, 21
2018-01-03, 52, 25
2018-01-04, 54, 21
2018-01-05, 56, 25
2018-01-06, 59, 41
2018-01-07, 55, 27
2018-01-08, 55, 31
2018-01-09, 60, 35
2018-01-10, 51,
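One possible Python version of the sum-by-year idea is sketched below. It is not the official solution: yearly_averages is a hypothetical name, and it assumes the rows are sorted by date and that TMAX and TMIN are already floats. The 2017 rows are made up so the example has a year change; the 2018 rows come from the sample data above.

```python
from datetime import date


def yearly_averages(data):
    """Return {year: [avg_tmax, avg_tmin]} for rows of [date, tmax, tmin] sorted by date."""
    avg_dict = {}
    if not data:
        return avg_dict
    curr_year = data[0][0].year
    sum_tmax = sum_tmin = 0.0
    count = 0
    for row_date, tmax, tmin in data:
        if row_date.year != curr_year:
            # Year changed: store the finished year's averages, then reset
            avg_dict[curr_year] = [sum_tmax / count, sum_tmin / count]
            curr_year = row_date.year
            sum_tmax = sum_tmin = 0.0
            count = 0
        sum_tmax += tmax
        sum_tmin += tmin
        count += 1
    # Store the last year, which never sees a year change
    avg_dict[curr_year] = [sum_tmax / count, sum_tmin / count]
    return avg_dict


rows = [
    [date(2017, 12, 30), 50.0, 30.0],  # made-up row
    [date(2017, 12, 31), 52.0, 28.0],  # made-up row
    [date(2018, 1, 1), 53.0, 20.0],
    [date(2018, 1, 2), 55.0, 21.0],
]
print(yearly_averages(rows))
```

For these four rows the result is {2017: [51.0, 29.0], 2018: [54.0, 20.5]}. The step after the loop matters: without it the final year in the file would never be written to the dictionary.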
12. CS151 Spring 2019
Writing Dictionary
Remember, dictionaries take key: value pairs.
Make your key the year
Make your value a list of Tmax, Tmin
Ex. avgDict = {1994:[55,28]}
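With the dictionary laid out that way, writing it to a CSV could look something like this sketch. The function name write_averages is hypothetical, and the in-memory io.StringIO object stands in for the real output file.

```python
import csv
import io


def write_averages(avg_dict, out_file):
    """Write one 'year, avg TMAX, avg TMIN' row per dictionary entry."""
    writer = csv.writer(out_file)
    # Sort the keys so the years come out in order
    for year in sorted(avg_dict):
        avg_tmax, avg_tmin = avg_dict[year]
        writer.writerow([year, avg_tmax, avg_tmin])


avgDict = {1994: [55, 28]}
out = io.StringIO()
write_averages(avgDict, out)
print(out.getvalue())
```

To write to disk instead, pass a file opened with open(filename, 'w', newline='') in place of the io.StringIO object.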