SlideShare a Scribd company logo
1 of 61
Building flexible tools to store
sums and report on CSV data
Presented by
Margery Harrison
Audience level: Novice
09:45 AM - 10:45 AM
August 17, 2014
Room 704
Python Flexibility
● Basic, Fortran, C, Pascal, Javascript,...
● At some point, there's a tendency to think
the same way, and just translate it
● You can write Python as if it were C
● Or you can take advantage of Python's
special data structures.
● The second option is a lot more fun.
Using Python data structures to
report on CSV data
● Lists
● Sets
● Tuples
● Dictionaries
● CSV Reader
● DictReader
● Counter
Also,
● Using tuples as dictionary keys
● Using enumerate() to count how many
times you've looped
– See “Loop like a Native”
http://nedbatchelder.com/text/iter.html
Code Development Method
● Start with simplest possible version
● Test and validate
● Iterative improvements
– Make it prettier
– Make it do more
– Make it more general
This is a CSV file
color,size,shape,number
red,big,square,3
blue,big,triangle,5
green,small,square,2
blue,small,triangle,1
red,big,square,7
blue,small,triangle,3
https://c1.staticflickr.com/3/2201/2469586703_cfdaf88195.jpg
http://i239.photobucket.com/albums/ff263/peacelovebones/two-
pandas-rolling-1.jpg
CSV DictReader
>>> import csv
>>> import os
>>> with open("simpleCSV.txt") as f:
... r=csv.DictReader(f)
... for row in r:
... print row
...
Running DictReader
DictReader is sequential
Tabulate All Possible Values
How many of each?
● It's nice to have a listing that shows the
variety of objects that can appear in each
column.
● Next, we'd like to count how many of each
● And guess what? Python has a special data
structure for that.
collections.Counter
Playing with Counters
Index into Counters
Counter + DictReader
Let's use counters to tell us how many of each
value was in each column.
Print number of each value
Output
color
blue : 3
green : 1
red : 2
shape
square : 3
triangle: 3
number
1 : 1
3 : 2
2 : 1
5 : 1
7 : 1
size
small : 3
big : 3
You might ask, why not this?
for row in r:
for head in r.fieldnames:
field_value = row[head]
possible_values[head].add(field_value)
#count_of_values.update(row[head])
count_of_values.update(field_value)
print count_of_values
Because
Counter({'e': 13, 'l': 12, 'a': 9, 'r': 9, 'g': 7, 'b': 6, 'i': 6, 's':
6, 'u': 6, 'n': 4, 'm': 3, 'q': 3, 't': 3, 'd': 2, '3': 2, '1': 1, '2':
1, '7': 1, '5': 1})
color
blue : 0
green : 0
red : 0
shape
square : 0
triangle: 0
number
1 : 1
3 : 2
2 : 1
5 : 1
7 : 1
size
small : 0
big : 0
Output
color
blue : 3
green : 1
red : 2
shape
square : 3
triangle: 3
number
1 : 1
3 : 2
2 : 1
5 : 1
7 : 1
size
small : 3
big : 3
How many red squares?
● We can use tuples as an index into the
counter
– (red,square)
– (big,red,square)
– (small,blue,triangle)
– (small,square)
Let's use a simpler CSV
color,size,shape
red,big,square
blue,big,triangle
green,small,square
blue,small,triangle
red,big,square
blue,small,triangle
Counting Tuples
trying to use magic update()
>>> c=collections.Counter([('a,b'),('c,d,e')])
>>> c
Counter({'a,b': 1, 'c,d,e': 1})
>>> c.update(('a','b'))
>>> c
Counter({'a': 1, 'b': 1, 'a,b': 1, 'c,d,e': 1})
>>> c.update((('a','b'),))
>>> c
Counter({'a': 1, ('a', 'b'): 1, 'b': 1, 'a,b': 1, 'c,d,e': 1})
Oh well
>>> c.update([(('a','b'),)])
>>> c
Counter({'a': 2, 'b': 2, (('a', 'b'),): 1, 'c,d,e': 1, 'a,b': 1,
('a', 'b'): 1})
>>> c[('a','b')]
1
>>> c[('a','b')]+=5
>>> c
Counter({('a', 'b'): 6, 'a': 2, 'b': 2, (('a', 'b'),): 1, 'c,d,e':
1, 'a,b': 1})
Combo Count Part 1: Initialize
Combo Count 2: Counting
Combo Count 3: Printing
Combo Count Output
color
blue : 3
3 blue in 1 combinations:
('blue', 'big'): 1
('blue', 'small'): 2
3 blue in 2 combinations:
('blue', 'big', 'triangle'): 1
('blue', 'small', 'triangle'): 2
green : 1
1 green in 1 combinations:
('green', 'small'): 1
1 green in 2 combinations:
('green', 'small', 'square'): 1
red : 2
2 red in 1 combinations:
('red', 'big'): 2
2 red in 2 combinations:
('red', 'big', 'square'): 2
shape
square : 3
3 square in 1 combinations:
3 square in 2 combinations:
('red', 'big', 'square'): 2
('green', 'small', 'square'): 1
triangle: 3
3 triangle in 1 combinations:
3 triangle in 2 combinations:
('blue', 'big', 'triangle'): 1
('blue', 'small', 'triangle'): 2
size
small : 3
3 small in 1 combinations:
('blue', 'small'): 2
('green', 'small'): 1
3 small in 2 combinations:
('green', 'small', 'square'): 1
('blue', 'small', 'triangle'): 2
big : 3
3 big in 1 combinations:
('blue', 'big'): 1
('red', 'big'): 2
3 big in 2 combinations:
('red', 'big', 'square'): 2
('blue', 'big', 'triangle'):
1
Well, that's ugly
● We need to make it prettier
● We need to write out to a file
● We need to break things up into Classes
Printing Combination Levels
Number of Squares
Number of Red Squares
Number of Blue Squares
Number of Triangles
Number of Red Triangles
Number of Blue Triangles
Total Red
Total Blue
Indentation per level
● If we're indexing by tuple, then the
indentation level could correspond to the
number of items in the tuple.
● Let's have general methods to format the
indentation level, given the number of
items in the tuple, or input 'level' integer
A class write_indent() method
If part of class with counter and msgs dict,
just pass in the tuple:
def write_indent(self, tup_index):
'''
:param tup_index: tuple index into counter
'''
indent = ' ' * len(tup_index)
msg = self.msgs[tup_index]
sum = self.counts[tup_index]
indented_msg = ('{0:s}{1:s}'.format(
indent, msg, sum)
class-less indent_message()
def indent_message(level, msg, sum,
space_per_indent=2, space=' '):
num_spaces = self.space_per_indent * level
indent = space * num_spaces
# We'll want to tune the formatting..
indented_msg = ('{0:s}{1:s}:{2:d}'.format(
indent, msg, sum)
return indented_msg
Adjustable field widths
Depending on data, we'll want different
field widths
red squares 5
Blue squares 21
Large Red Squares in the Bronx 987654321
Using format to format a format
string
>>> f='{{0:{0:d}s}}'.format(3)
>>> f
'{0:3s}'
>>> f='{{0:{0:d}s}}{{1:{1:d}d}}'.format(3,5)
>>> f
'{0:3s}{1:5d}'
>>> f='{{0:s}}{{1:{0:d}s}}{{2:{1:d}d}}'.format(3,5)
>>> f
'{0:s}{1:3s}{2:5d}'
Format 3 values
● Our formatting string will print 3 values:
– String of space chars: {0:s}
– Message: {1:[msg_width]s}
– Sum: Right justified {2:-[sum_width]d}
Class For Flexible Indentation
Flexible Indent Class Variables
Flexible Indent Method
Testing IndentMessages class
SimpleCSVReporter
● Open a CSV File
● Create
– Set of possible values
– Set of possible tuples
– Counter indexed by each value & tuple
● Use IndentMessages to format output lines
SimpleCSVReporter class vars
readCSV() begins
initialize sets..
readCSV() continued:
Loop to collect & sum
Write to Report File
Using recursion for limitless
indentation
Recursive print sub-levels
Word transform stubs
General method to test
Test with simpler CSV
Output for simpler CSV
A bigger CSV file
"CCN","REPORTDATETIME","SHIFT","OFFENSE","METHOD","BLOCKSIT
EADDRESS","WARD","ANC","DISTRICT","PSA","NEIGHBORHOODCL
USTER","BUSINESSIMPROVEMENTDISTRICT","VOTING_PRECINCT",
"START_DATE","END_DATE"
4104147,"4/16/2013 12:00:00
AM","MIDNIGHT","HOMICIDE","KNIFE","1500 - 1599 BLOCK OF 1ST
STREET SW",6,"6D","FIRST",105,9,,"Precinct 127","7/27/2004 8:30:00
PM","7/27/2004 8:30:00 PM"
5047867,"6/5/2013 12:00:00 AM","MIDNIGHT","SEX ABUSE","KNIFE","6500
- 6599 BLOCK OF PINEY BRANCH ROAD
NW",4,"4B","FOURTH",402,17,,"Precinct 59","4/15/2005 12:30:00 PM",
● From http://data.octo.dc.gov/
Deleted all but 4 columns
"SHIFT","OFFENSE","METHOD","DISTRICT"
"MIDNIGHT","HOMICIDE","KNIFE","FIRST"
"MIDNIGHT","SEX ABUSE","KNIFE","FOURTH"
...
"DAY","THEFT/OTHER","OTHERS","SECOND"
"MIDNIGHT","SEX ABUSE","OTHERS","THIRD"
"MIDNIGHT","SEX ABUSE","OTHERS","THIRD"
"EVENING","BURGLARY","OTHERS","FIFTH"
...
Method to run crime report
Output - top
Output - bottom
Improvements
● Allow user-specified order for values, e.g.
FIRST, SECOND, THIRD
● Other means of tabulating
● Keeping track of blank values
● Summing counts in columns
● ...
https://c1.staticflickr.com/3/2201/2469586703_cfdaf88195.jpg
Links
This talk: http://www.slideshare.net/pargery/mnh-csv-python
● https://github.com/pargery/csv_utils2
● Also some notes in http://margerytech.blogspot.com/
Info on Data Structures
● http://rhodesmill.org/brandon/slides/2014-04-pycon/data-structures/
● http://nedbatchelder.com/text/iter.html
DC crime stats
● http://data.octo.dc.gov/
“The data made available here has been modified for use from its original source, which is the Government of the
District of Columbia. Neither the District of Columbia Government nor the Office of the Chief Technology Officer
(OCTO) makes any claims as to the completeness, accuracy or content of any data contained in this application;
makes any representation of any kind, including, but not limited to, warranty of the accuracy or fitness for a
particular use; nor are any such warranties to be implied or inferred with respect to the information or data
furnished herein. The data is subject to change as modifications and updates are complete. It is understood that
the information contained in the web feed is being used at one's own risk."

More Related Content

Similar to Building flexible tools to report on CSV data

It's Not You. It's Your Data Model.
It's Not You. It's Your Data Model.It's Not You. It's Your Data Model.
It's Not You. It's Your Data Model.Alex Powers
 
Data Manipulation with Numpy and Pandas in PythonStarting with N
Data Manipulation with Numpy and Pandas in PythonStarting with NData Manipulation with Numpy and Pandas in PythonStarting with N
Data Manipulation with Numpy and Pandas in PythonStarting with NOllieShoresna
 
“Probabilistic Logic Programs and Their Applications”
“Probabilistic Logic Programs and Their Applications”“Probabilistic Logic Programs and Their Applications”
“Probabilistic Logic Programs and Their Applications”diannepatricia
 
De-Cluttering-ML | TechWeekends
De-Cluttering-ML | TechWeekendsDe-Cluttering-ML | TechWeekends
De-Cluttering-ML | TechWeekendsDSCUSICT
 
Mrongraphs acm-sig-2 (1)
Mrongraphs acm-sig-2 (1)Mrongraphs acm-sig-2 (1)
Mrongraphs acm-sig-2 (1)Nima Sarshar
 
Next-generation sequencing - variation discovery
Next-generation sequencing - variation discoveryNext-generation sequencing - variation discovery
Next-generation sequencing - variation discoveryJan Aerts
 
data analysis techniques and statistical softwares
data analysis techniques and statistical softwaresdata analysis techniques and statistical softwares
data analysis techniques and statistical softwaresDr.ammara khakwani
 
OverviewThis hands-on lab allows you to follow and experiment w.docx
OverviewThis hands-on lab allows you to follow and experiment w.docxOverviewThis hands-on lab allows you to follow and experiment w.docx
OverviewThis hands-on lab allows you to follow and experiment w.docxgerardkortney
 
IU Applied Machine Learning Class Final Project: ML Methods for Predicting Wi...
IU Applied Machine Learning Class Final Project: ML Methods for Predicting Wi...IU Applied Machine Learning Class Final Project: ML Methods for Predicting Wi...
IU Applied Machine Learning Class Final Project: ML Methods for Predicting Wi...James Nelson
 
r studio presentation.pptx
r studio presentation.pptxr studio presentation.pptx
r studio presentation.pptxDevikaRaj14
 
r studio presentation.pptx
r studio presentation.pptxr studio presentation.pptx
r studio presentation.pptxDevikaRaj14
 
CSS FINAL OBSERVATION.pptx
CSS FINAL OBSERVATION.pptxCSS FINAL OBSERVATION.pptx
CSS FINAL OBSERVATION.pptxJocelynBadua2
 
CSS FINAL OBSERVATION.pptx
CSS FINAL OBSERVATION.pptxCSS FINAL OBSERVATION.pptx
CSS FINAL OBSERVATION.pptxJocelynBadua2
 
Interpolation Missing values.pptx
Interpolation Missing values.pptxInterpolation Missing values.pptx
Interpolation Missing values.pptxRushikeshGore18
 
Parallel Computing with R
Parallel Computing with RParallel Computing with R
Parallel Computing with RAbhirup Mallik
 
Float Data Type in C.pdf
Float Data Type in C.pdfFloat Data Type in C.pdf
Float Data Type in C.pdfSudhanshiBakre1
 
[Gary entsminger] turbo_pascal_for_windows_bible(book_fi.org)
[Gary entsminger] turbo_pascal_for_windows_bible(book_fi.org)[Gary entsminger] turbo_pascal_for_windows_bible(book_fi.org)
[Gary entsminger] turbo_pascal_for_windows_bible(book_fi.org)Yogi Sharo
 
Assignment #9First, we recall some definitions that will be help.docx
Assignment #9First, we recall some definitions that will be help.docxAssignment #9First, we recall some definitions that will be help.docx
Assignment #9First, we recall some definitions that will be help.docxfredharris32
 

Similar to Building flexible tools to report on CSV data (20)

It's Not You. It's Your Data Model.
It's Not You. It's Your Data Model.It's Not You. It's Your Data Model.
It's Not You. It's Your Data Model.
 
Data Manipulation with Numpy and Pandas in PythonStarting with N
Data Manipulation with Numpy and Pandas in PythonStarting with NData Manipulation with Numpy and Pandas in PythonStarting with N
Data Manipulation with Numpy and Pandas in PythonStarting with N
 
“Probabilistic Logic Programs and Their Applications”
“Probabilistic Logic Programs and Their Applications”“Probabilistic Logic Programs and Their Applications”
“Probabilistic Logic Programs and Their Applications”
 
De-Cluttering-ML | TechWeekends
De-Cluttering-ML | TechWeekendsDe-Cluttering-ML | TechWeekends
De-Cluttering-ML | TechWeekends
 
Mrongraphs acm-sig-2 (1)
Mrongraphs acm-sig-2 (1)Mrongraphs acm-sig-2 (1)
Mrongraphs acm-sig-2 (1)
 
Next-generation sequencing - variation discovery
Next-generation sequencing - variation discoveryNext-generation sequencing - variation discovery
Next-generation sequencing - variation discovery
 
R Programming Intro
R Programming IntroR Programming Intro
R Programming Intro
 
data analysis techniques and statistical softwares
data analysis techniques and statistical softwaresdata analysis techniques and statistical softwares
data analysis techniques and statistical softwares
 
OverviewThis hands-on lab allows you to follow and experiment w.docx
OverviewThis hands-on lab allows you to follow and experiment w.docxOverviewThis hands-on lab allows you to follow and experiment w.docx
OverviewThis hands-on lab allows you to follow and experiment w.docx
 
IU Applied Machine Learning Class Final Project: ML Methods for Predicting Wi...
IU Applied Machine Learning Class Final Project: ML Methods for Predicting Wi...IU Applied Machine Learning Class Final Project: ML Methods for Predicting Wi...
IU Applied Machine Learning Class Final Project: ML Methods for Predicting Wi...
 
r studio presentation.pptx
r studio presentation.pptxr studio presentation.pptx
r studio presentation.pptx
 
r studio presentation.pptx
r studio presentation.pptxr studio presentation.pptx
r studio presentation.pptx
 
CSS FINAL OBSERVATION.pptx
CSS FINAL OBSERVATION.pptxCSS FINAL OBSERVATION.pptx
CSS FINAL OBSERVATION.pptx
 
CSS FINAL OBSERVATION.pptx
CSS FINAL OBSERVATION.pptxCSS FINAL OBSERVATION.pptx
CSS FINAL OBSERVATION.pptx
 
Interpolation Missing values.pptx
Interpolation Missing values.pptxInterpolation Missing values.pptx
Interpolation Missing values.pptx
 
Parallel Computing with R
Parallel Computing with RParallel Computing with R
Parallel Computing with R
 
Float Data Type in C.pdf
Float Data Type in C.pdfFloat Data Type in C.pdf
Float Data Type in C.pdf
 
[Gary entsminger] turbo_pascal_for_windows_bible(book_fi.org)
[Gary entsminger] turbo_pascal_for_windows_bible(book_fi.org)[Gary entsminger] turbo_pascal_for_windows_bible(book_fi.org)
[Gary entsminger] turbo_pascal_for_windows_bible(book_fi.org)
 
Assignment #9First, we recall some definitions that will be help.docx
Assignment #9First, we recall some definitions that will be help.docxAssignment #9First, we recall some definitions that will be help.docx
Assignment #9First, we recall some definitions that will be help.docx
 
MongoDB 3.0
MongoDB 3.0 MongoDB 3.0
MongoDB 3.0
 

Recently uploaded

A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxComplianceQuest1
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...OnePlan Solutions
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️anilsa9823
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsAndolasoft Inc
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsArshad QA
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providermohitmore19
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AIABDERRAOUF MEHENNI
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 

Recently uploaded (20)

A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 

Building flexible tools to report on CSV data

  • 1. Building flexible tools to store sums and report on CSV data Presented by Margery Harrison Audience level: Novice 09:45 AM - 10:45 AM August 17, 2014 Room 704
  • 2. Python Flexibility ● Basic, Fortran, C, Pascal, Javascript,... ● At some point, there's a tendency to think the same way, and just translate it ● You can write Python as if it were C ● Or you can take advantage of Python's special data structures. ● The second option is a lot more fun.
  • 3. Using Python data structures to report on CSV data ● Lists ● Sets ● Tuples ● Dictionaries ● CSV Reader ● DictReader ● Counter
  • 4. Also, ● Using tuples as dictionary keys ● Using enumerate() to count how many times you've looped – See “Loop like a Native” http://nedbatchelder.com/text/iter.html
  • 5. Code Development Method ● Start with simplest possible version ● Test and validate ● Iterative improvements – Make it prettier – Make it do more – Make it more general
  • 6. This is a CSV file color,size,shape,number red,big,square,3 blue,big,triangle,5 green,small,square,2 blue,small,triangle,1 red,big,square,7 blue,small,triangle,3
  • 9. CSV DictReader >>> import csv >>> import os >>> with open("simpleCSV.txt") as f: ... r=csv.DictReader(f) ... for row in r: ... print row ...
  • 13. How many of each? ● It's nice to have a listing that shows the variety of objects that can appear in each column. ● Next, we'd like to count how many of each ● And guess what? Python has a special data structure for that.
  • 17. Counter + DictReader Let's use counters to tell us how many of each value was in each column.
  • 18. Print number of each value
  • 19. Output color blue : 3 green : 1 red : 2 shape square : 3 triangle: 3 number 1 : 1 3 : 2 2 : 1 5 : 1 7 : 1 size small : 3 big : 3
  • 20. You might ask, why not this? for row in r: for head in r.fieldnames: field_value = row[head] possible_values[head].add(field_value) #count_of_values.update(row[head]) count_of_values.update(field_value) print count_of_values
  • 21. Because Counter({'e': 13, 'l': 12, 'a': 9, 'r': 9, 'g': 7, 'b': 6, 'i': 6, 's': 6, 'u': 6, 'n': 4, 'm': 3, 'q': 3, 't': 3, 'd': 2, '3': 2, '1': 1, '2': 1, '7': 1, '5': 1}) color blue : 0 green : 0 red : 0 shape square : 0 triangle: 0 number 1 : 1 3 : 2 2 : 1 5 : 1 7 : 1 size small : 0 big : 0
  • 22. Output color blue : 3 green : 1 red : 2 shape square : 3 triangle: 3 number 1 : 1 3 : 2 2 : 1 5 : 1 7 : 1 size small : 3 big : 3
  • 23. How many red squares? ● We can use tuples as an index into the counter – (red,square) – (big,red,square) – (small,blue,triangle) – (small,square)
  • 24. Let's use a simpler CSV color,size,shape red,big,square blue,big,triangle green,small,square blue,small,triangle red,big,square blue,small,triangle
  • 25. Counting Tuples trying to use magic update() >>> c=collections.Counter([('a,b'),('c,d,e')]) >>> c Counter({'a,b': 1, 'c,d,e': 1}) >>> c.update(('a','b')) >>> c Counter({'a': 1, 'b': 1, 'a,b': 1, 'c,d,e': 1}) >>> c.update((('a','b'),)) >>> c Counter({'a': 1, ('a', 'b'): 1, 'b': 1, 'a,b': 1, 'c,d,e': 1})
  • 26. Oh well >>> c.update([(('a','b'),)]) >>> c Counter({'a': 2, 'b': 2, (('a', 'b'),): 1, 'c,d,e': 1, 'a,b': 1, ('a', 'b'): 1}) >>> c[('a','b')] 1 >>> c[('a','b')]+=5 >>> c Counter({('a', 'b'): 6, 'a': 2, 'b': 2, (('a', 'b'),): 1, 'c,d,e': 1, 'a,b': 1})
  • 27. Combo Count Part 1: Initialize
  • 28. Combo Count 2: Counting
  • 29. Combo Count 3: Printing
  • 30. Combo Count Output color blue : 3 3 blue in 1 combinations: ('blue', 'big'): 1 ('blue', 'small'): 2 3 blue in 2 combinations: ('blue', 'big', 'triangle'): 1 ('blue', 'small', 'triangle'): 2 green : 1 1 green in 1 combinations: ('green', 'small'): 1 1 green in 2 combinations: ('green', 'small', 'square'): 1 red : 2 2 red in 1 combinations: ('red', 'big'): 2 2 red in 2 combinations: ('red', 'big', 'square'): 2 shape square : 3 3 square in 1 combinations: 3 square in 2 combinations: ('red', 'big', 'square'): 2 ('green', 'small', 'square'): 1 triangle: 3 3 triangle in 1 combinations: 3 triangle in 2 combinations: ('blue', 'big', 'triangle'): 1 ('blue', 'small', 'triangle'): 2 size small : 3 3 small in 1 combinations: ('blue', 'small'): 2 ('green', 'small'): 1 3 small in 2 combinations: ('green', 'small', 'square'): 1 ('blue', 'small', 'triangle'): 2 big : 3 3 big in 1 combinations: ('blue', 'big'): 1 ('red', 'big'): 2 3 big in 2 combinations: ('red', 'big', 'square'): 2 ('blue', 'big', 'triangle'): 1
  • 31. Well, that's ugly ● We need to make it prettier ● We need to write out to a file ● We need to break things up into Classes
  • 32. Printing Combination Levels Number of Squares Number of Red Squares Number of Blue Squares Number of Triangles Number of Red Triangles Number of Blue Triangles Total Red Total Blue
  • 33. Indentation per level ● If we're indexing by tuple, then the indentation level could correspond to the number of items in the tuple. ● Let's have general methods to format the indentation level, given the number of items in the tuple, or input 'level' integer
  • 34. A class write_indent() method If part of class with counter and msgs dict, just pass in the tuple: def write_indent(self, tup_index): ''' :param tup_index: tuple index into counter ''' indent = ' ' * len(tup_index) msg = self.msgs[tup_index] sum = self.counts[tup_index] indented_msg = ('{0:s}{1:s}'.format( indent, msg, sum)
  • 35. class-less indent_message() def indent_message(level, msg, sum, space_per_indent=2, space=' '): num_spaces = self.space_per_indent * level indent = space * num_spaces # We'll want to tune the formatting.. indented_msg = ('{0:s}{1:s}:{2:d}'.format( indent, msg, sum) return indented_msg
  • 36. Adjustable field widths Depending on data, we'll want different field widths red squares 5 Blue squares 21 Large Red Squares in the Bronx 987654321
  • 37. Using format to format a format string >>> f='{{0:{0:d}s}}'.format(3) >>> f '{0:3s}' >>> f='{{0:{0:d}s}}{{1:{1:d}d}}'.format(3,5) >>> f '{0:3s}{1:5d}' >>> f='{{0:s}}{{1:{0:d}s}}{{2:{1:d}d}}'.format(3,5) >>> f '{0:s}{1:3s}{2:5d}'
  • 38. Format 3 values ● Our formatting string will print 3 values: – String of space chars: {0:s} – Message: {1:[msg_width]s} – Sum: Right justified {2:-[sum_width]d}
  • 39. Class For Flexible Indentation
  • 43. SimpleCSVReporter ● Open a CSV File ● Create – Set of possible values – Set of possible tuples – Counter indexed by each value & tuple ● Use IndentMessages to format output lines
  • 48. Using recursion for limitless indentation
  • 54. A bigger CSV file "CCN","REPORTDATETIME","SHIFT","OFFENSE","METHOD","BLOCKSIT EADDRESS","WARD","ANC","DISTRICT","PSA","NEIGHBORHOODCL USTER","BUSINESSIMPROVEMENTDISTRICT","VOTING_PRECINCT", "START_DATE","END_DATE" 4104147,"4/16/2013 12:00:00 AM","MIDNIGHT","HOMICIDE","KNIFE","1500 - 1599 BLOCK OF 1ST STREET SW",6,"6D","FIRST",105,9,,"Precinct 127","7/27/2004 8:30:00 PM","7/27/2004 8:30:00 PM" 5047867,"6/5/2013 12:00:00 AM","MIDNIGHT","SEX ABUSE","KNIFE","6500 - 6599 BLOCK OF PINEY BRANCH ROAD NW",4,"4B","FOURTH",402,17,,"Precinct 59","4/15/2005 12:30:00 PM", ● From http://data.octo.dc.gov/
  • 55. Deleted all but 4 columns "SHIFT","OFFENSE","METHOD","DISTRICT" "MIDNIGHT","HOMICIDE","KNIFE","FIRST" "MIDNIGHT","SEX ABUSE","KNIFE","FOURTH" ... "DAY","THEFT/OTHER","OTHERS","SECOND" "MIDNIGHT","SEX ABUSE","OTHERS","THIRD" "MIDNIGHT","SEX ABUSE","OTHERS","THIRD" "EVENING","BURGLARY","OTHERS","FIFTH" ...
  • 56. Method to run crime report
  • 59. Improvements ● Allow user-specified order for values, e.g. FIRST, SECOND, THIRD ● Other means of tabulating ● Keeping track of blank values ● Summing counts in columns ● ...
  • 61. Links This talk: http://www.slideshare.net/pargery/mnh-csv-python ● https://github.com/pargery/csv_utils2 ● Also some notes in http://margerytech.blogspot.com/ Info on Data Structures ● http://rhodesmill.org/brandon/slides/2014-04-pycon/data-structures/ ● http://nedbatchelder.com/text/iter.html DC crime stats ● http://data.octo.dc.gov/ “The data made available here has been modified for use from its original source, which is the Government of the District of Columbia. Neither the District of Columbia Government nor the Office of the Chief Technology Officer (OCTO) makes any claims as to the completeness, accuracy or content of any data contained in this application; makes any representation of any kind, including, but not limited to, warranty of the accuracy or fitness for a particular use; nor are any such warranties to be implied or inferred with respect to the information or data furnished herein. The data is subject to change as modifications and updates are complete. It is understood that the information contained in the web feed is being used at one's own risk."