Building flexible tools to store
sums and report on CSV data
Presented by
Margery Harrison
Audience level: Novice
09:45 AM - 10:45 AM
August 17, 2014
Room 704
Python Flexibility
● BASIC, Fortran, C, Pascal, JavaScript, ...
● At some point, there's a tendency to think
the same way in every language and just translate
● You can write Python as if it were C
● Or you can take advantage of Python's
special data structures.
● The second option is a lot more fun.
Using Python data structures to
report on CSV data
● Lists
● Sets
● Tuples
● Dictionaries
● CSV Reader
● DictReader
● Counter
Also,
● Using tuples as dictionary keys
● Using enumerate() to count how many
times you've looped
– See “Loop like a Native”
http://nedbatchelder.com/text/iter.html
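A small sketch of both ideas, using the (color, shape) pairs from the sample CSV shown later in the talk. The variable names are illustrative, not the talk's code. A tuple is hashable, so it can be a dictionary key, and `enumerate()` hands back a running index so the loop doesn't need a manual `i += 1`:

```python
# (color, shape) pairs from the sample CSV; tuples are hashable, so they work as dict keys
observations = [
    ("red", "square"), ("blue", "triangle"), ("green", "square"),
    ("blue", "triangle"), ("red", "square"), ("blue", "triangle"),
]

counts = {}
rows_seen = 0  # stays 0 if the list is empty
# enumerate(..., start=1) yields (1, item), (2, item), ... so rows_seen ends as the loop count
for rows_seen, key in enumerate(observations, start=1):
    counts[key] = counts.get(key, 0) + 1

print(rows_seen, counts)
# 6 {('red', 'square'): 2, ('blue', 'triangle'): 3, ('green', 'square'): 1}
```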
Code Development Method
● Start with simplest possible version
● Test and validate
● Iterative improvements
– Make it prettier
– Make it do more
– Make it more general
This is a CSV file
color,size,shape,number
red,big,square,3
blue,big,triangle,5
green,small,square,2
blue,small,triangle,1
red,big,square,7
blue,small,triangle,3
CSV DictReader
>>> import csv
>>> with open("simpleCSV.txt", newline="") as f:
...     r = csv.DictReader(f)
...     for row in r:
...         print(row)
...
Running DictReader
DictReader is sequential
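"Sequential" means a DictReader is a one-pass iterator: once you have looped over it, it is exhausted. A minimal sketch, assuming the sample data above and substituting an in-memory `io.StringIO` for the file on disk:

```python
import csv
import io

# In-memory stand-in for simpleCSV.txt (assumed to hold the sample data above)
SAMPLE = """color,size,shape,number
red,big,square,3
blue,big,triangle,5
green,small,square,2
blue,small,triangle,1
red,big,square,7
blue,small,triangle,3
"""

reader = csv.DictReader(io.StringIO(SAMPLE))
print(reader.fieldnames)               # column names come from the header line
first_pass = [row for row in reader]   # each row is a dict keyed by column name
second_pass = [row for row in reader]  # nothing left: the reader is already exhausted
print(len(first_pass), len(second_pass))  # 6 0
```

To read the data twice, keep the rows in a list, or rewind the file with `f.seek(0)` and build a new reader.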
Tabulate All Possible Values
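The slide's code isn't reproduced here. One way to tabulate every value that appears in each column is a set per column, since a set keeps each distinct value only once. A sketch assuming the sample data above; using `collections.defaultdict(set)` is my choice, not necessarily the talk's:

```python
import csv
import io
from collections import defaultdict

SAMPLE = """color,size,shape,number
red,big,square,3
blue,big,triangle,5
green,small,square,2
blue,small,triangle,1
red,big,square,7
blue,small,triangle,3
"""

# column name -> set of distinct values; defaultdict creates the empty set on first use
possible_values = defaultdict(set)
reader = csv.DictReader(io.StringIO(SAMPLE))
for row in reader:
    for head in reader.fieldnames:
        possible_values[head].add(row[head])

for head in reader.fieldnames:
    print(head, sorted(possible_values[head]))
```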
How many of each?
● It's nice to have a listing that shows the
variety of objects that can appear in each
column.
● Next, we'd like to count how many of each
● And guess what? Python has a special data
structure for that.
collections.Counter
Playing with Counters
Index into Counters
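A few `collections.Counter` basics worth trying in the interpreter: it counts the items of any iterable, indexing a missing key returns 0 instead of raising `KeyError`, and so `+= 1` works without initializing the key first:

```python
from collections import Counter

c = Counter(["red", "blue", "red"])  # counts each item of the iterable
print(c)                  # Counter({'red': 2, 'blue': 1})
print(c["red"])           # 2
print(c["purple"])        # 0: missing keys count as zero (and are not added)
c["green"] += 1           # no need to initialize the key first
print(c.most_common(1))   # [('red', 2)]
```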
Counter + DictReader
Let's use counters to tell us how many of each
value was in each column.
Print number of each value
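A sketch of counting and printing per column, assuming the sample data above. It reuses the talk's names `possible_values` and `count_of_values`, but keeps one Counter per column (a `defaultdict(Counter)`), which is my assumption rather than the slide's code:

```python
import csv
import io
from collections import Counter, defaultdict

SAMPLE = """color,size,shape,number
red,big,square,3
blue,big,triangle,5
green,small,square,2
blue,small,triangle,1
red,big,square,7
blue,small,triangle,3
"""

possible_values = defaultdict(set)      # column -> distinct values
count_of_values = defaultdict(Counter)  # column -> Counter of that column's values

reader = csv.DictReader(io.StringIO(SAMPLE))
for row in reader:
    for head in reader.fieldnames:
        field_value = row[head]
        possible_values[head].add(field_value)
        count_of_values[head][field_value] += 1

lines = []
for head in reader.fieldnames:
    lines.append(head)
    for value in sorted(possible_values[head]):
        # pad each value to 8 characters so the colons line up
        lines.append("{0:8s}: {1:d}".format(value, count_of_values[head][value]))
print("\n".join(lines))
```

The values are strings, so `sorted()` orders numbers lexically ("10" would sort before "2"); convert with `int()` if numeric order matters.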
Output
color
blue : 3
green : 1
red : 2
shape
square : 3
triangle: 3
number
1 : 1
3 : 2
2 : 1
5 : 1
7 : 1
size
small : 3
big : 3
You might ask, why not this?
for row in r:
    for head in r.fieldnames:
        field_value = row[head]
        possible_values[head].add(field_value)
        #count_of_values[field_value]+=1
        count_of_values.update(field_value)
print(count_of_values)
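Because update() iterates over its argument, Counter counts every character of each string; only the one-character values in the number column come out right:
Counter({'e': 13, 'l': 12, 'a': 9, 'r': 9, 'g': 7, 'b': 6, 'i': 6, 's': 6, 'u': 6, 'n': 4, 'm': 3, 'q': 3, 't': 3, 'd': 2, '3': 2, '1': 1, '2': 1, '7': 1, '5': 1})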
color
blue : 0
green : 0
red : 0
shape
square : 0
triangle: 0
number
1 : 1
3 : 2
2 : 1
5 : 1
7 : 1
size
small : 0
big : 0
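Two fixes restore the correct counts: hand update() a one-item list so the whole string is a single item, or use the direct increment that is commented out in the loop above. A minimal sketch of the difference:

```python
from collections import Counter

wrong = Counter()
wrong.update("blue")    # a str is an iterable of characters: counts 'b', 'l', 'u', 'e'
print(wrong)

right = Counter()
right.update(["blue"])  # a one-item list: the whole string is one key
right["blue"] += 1      # or increment the key directly
print(right)            # Counter({'blue': 2})
```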
Output
color
blue : 3
green : 1
red : 2
shape
square : 3
triangle: 3
number
1 : 1
3 : 2
2 : 1
5 : 1
7 : 1
size
small : 3
big : 3
How many red squares?
● We can use tuples as an index into the
counter
– (red,square)
– (big,red,square)
– (small,blue,triangle)
– (small,square)
Let's use a simpler CSV
color,size,shape
red,big,square
blue,big,triangle
green,small,square
blue,small,triangle
red,big,square
blue,small,triangle
Counting Tuples
trying to use magic update()
>>> c=collections.Counter([('a,b'),('c,d,e')])
>>> c
Counter({'a,b': 1, 'c,d,e': 1})
>>> c.update(('a','b'))
>>> c
Counter({'a': 1, 'b': 1, 'a,b': 1, 'c,d,e': 1})
>>> c.update((('a','b'),))
>>> c
Counter({'a': 1, ('a', 'b'): 1, 'b': 1, 'a,b': 1, 'c,d,e': 1})
Oh well
>>> c.update([(('a','b'),)])
>>> c
Counter({'a': 2, 'b': 2, (('a', 'b'),): 1, 'c,d,e': 1, 'a,b': 1, ('a', 'b'): 1})
>>> c[('a','b')]
1
>>> c[('a','b')]+=5
>>> c
Counter({('a', 'b'): 6, 'a': 2, 'b': 2, (('a', 'b'),): 1, 'c,d,e': 1, 'a,b': 1})
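For the record, the `c.update((('a','b'),))` step above already did the right thing: update() iterates over its argument, so any iterable whose items are tuples counts each tuple as one key. A short sketch with rows from the simpler CSV (the variable names are illustrative):

```python
from collections import Counter

# (color, size, shape) rows from the simpler CSV
rows = [
    ("red", "big", "square"),
    ("blue", "big", "triangle"),
    ("green", "small", "square"),
    ("blue", "small", "triangle"),
    ("red", "big", "square"),
    ("blue", "small", "triangle"),
]

full = Counter()
full.update(rows)                      # a list of tuples: each tuple is a single key
print(full[("red", "big", "square")])  # 2

# Any tuple built from a row can be a key, e.g. (color, shape)
pairs = Counter()
for color, size, shape in rows:
    pairs[(color, shape)] += 1
print(pairs[("red", "square")])        # 2
```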
Combo Count Part 1: Initialize
Combo Count 2: Counting
Combo Count 3: Printing
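The three Combo Count slides show code that isn't reproduced here. Judging by the output, the combinations are column-order prefixes of each row: (color, size) at level 1 and (color, size, shape) at level 2. That is why "square in 1 combinations" comes out empty. A sketch under that assumption, not the talk's actual code:

```python
import csv
import io
from collections import Counter

SIMPLE = """color,size,shape
red,big,square
blue,big,triangle
green,small,square
blue,small,triangle
red,big,square
blue,small,triangle
"""

# Part 1: initialize
reader = csv.DictReader(io.StringIO(SIMPLE))
fields = reader.fieldnames
value_counts = {head: Counter() for head in fields}  # per-column totals
combo_counts = Counter()                              # keyed by prefix tuples

# Part 2: count
for row in reader:
    values = tuple(row[head] for head in fields)
    for head in fields:
        value_counts[head][row[head]] += 1
    # e.g. ('red', 'big') and ('red', 'big', 'square')
    for length in range(2, len(values) + 1):
        combo_counts[values[:length]] += 1

# Part 3: print
for col, head in enumerate(fields):
    print(head)
    for value, total in sorted(value_counts[head].items()):
        print("{0} : {1}".format(value, total))
        for level in range(1, len(fields)):
            print("{0} {1} in {2} combinations:".format(total, value, level))
            for combo, n in sorted(combo_counts.items()):
                # level k is a (k+1)-tuple; keep those holding this value in this column
                if len(combo) == level + 1 and col < len(combo) and combo[col] == value:
                    print("{0}: {1}".format(combo, n))
```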
Combo Count Output
color
blue : 3
3 blue in 1 combinations:
('blue', 'big'): 1
('blue', 'small'): 2
3 blue in 2 combinations:
('blue', 'big', 'triangle'): 1
('blue', 'small', 'triangle'): 2
green : 1
1 green in 1 combinations:
('green', 'small'): 1
1 green in 2 combinations:
('green', 'small', 'square'): 1
red : 2
2 red in 1 combinations:
('red', 'big'): 2
2 red in 2 combinations:
('red', 'big', 'square'): 2
shape
square : 3
3 square in 1 combinations:
3 square in 2 combinations:
('red', 'big', 'square'): 2
('green', 'small', 'square'): 1
triangle: 3
3 triangle in 1 combinations:
3 triangle in 2 combinations:
('blue', 'big', 'triangle'): 1
('blue', 'small', 'triangle'): 2
size
small : 3
3 small in 1 combinations:
('blue', 'small'): 2
('green', 'small'): 1
3 small in 2 combinations:
('green', 'small', 'square'): 1
('blue', 'small', 'triangle'): 2
big : 3
3 big in 1 combinations:
('blue', 'big'): 1
('red', 'big'): 2
3 big in 2 combinations:
('red', 'big', 'square'): 2
('blue', 'big', 'triangle'): 1
Well, that's ugly
● We need to make it prettier
● We need to write out to a file
● We need to break things up into Classes
Printing Combination Levels
Number of Squares
  Number of Red Squares
  Number of Blue Squares
Number of Triangles
  Number of Red Triangles
  Number of Blue Triangles
Total Red
Total Blue
Indentation per level
● If we're indexing by tuple, then the
indentation level could correspond to the
number of items in the tuple.
● Let's have general methods to format the
indentation level, given either the number of
items in the tuple or an explicit 'level' integer
A class write_indent() method
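If it's part of a class that holds a counter and a msgs dict,
we just pass in the tuple:
def write_indent(self, tup_index):
    '''
    :param tup_index: tuple index into counter
    :return: indented message followed by its count
    '''
    indent = ' ' * len(tup_index)   # one space per item in the tuple
    msg = self.msgs[tup_index]
    total = self.counts[tup_index]  # 'total' avoids shadowing built-in sum()
    indented_msg = '{0:s}{1:s}:{2:d}'.format(indent, msg, total)
    return indented_msg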
class-less indent_message()
def indent_message(level, msg, total,
                   space_per_indent=2, space=' '):
    # a plain function: spacing comes from the keyword arguments
    num_spaces = space_per_indent * level
    indent = space * num_spaces
    # We'll want to tune the formatting...
    indented_msg = '{0:s}{1:s}:{2:d}'.format(indent, msg, total)
    return indented_msg
Adjustable field widths
Depending on data, we'll want different
field widths
red squares 5
Blue squares 21
Large Red Squares in the Bronx 987654321
Using format to format a format string
>>> f='{{0:{0:d}s}}'.format(3)
>>> f
'{0:3s}'
>>> f='{{0:{0:d}s}}{{1:{1:d}d}}'.format(3,5)
>>> f
'{0:3s}{1:5d}'
>>> f='{{0:s}}{{1:{0:d}s}}{{2:{1:d}d}}'.format(3,5)
>>> f
'{0:s}{1:3s}{2:5d}'
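The generated string is then used as an ordinary format string. A sketch with made-up widths and values; the last line uses nested replacement fields, which `str.format` also accepts, to get the same result in one call:

```python
msg_width, sum_width = 3, 5
f = '{{0:s}}{{1:{0:d}s}}{{2:{1:d}d}}'.format(msg_width, sum_width)
print(repr(f))     # '{0:s}{1:3s}{2:5d}'

line = f.format('  ', 'red', 42)
print(repr(line))  # '  red   42'

# Nested replacement fields: widths {3} and {4} are filled in before formatting
same = '{0:s}{1:{3}s}{2:{4}d}'.format('  ', 'red', 42, msg_width, sum_width)
print(same == line)  # True
```

Widths are minimums: a message longer than `msg_width` is not truncated, it just pushes the sum to the right.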
Format 3 values
● Our formatting string will print 3 values:
– String of space chars: {0:s}
– Message: {1:[msg_width]s}
– Sum, right-justified: {2:>[sum_width]d} (integers align right by default; a '-' there is the sign option, not alignment)
Class For Flexible Indentation
Flexible Indent Class Variables
Flexible Indent Method
Testing IndentMessages class
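The IndentMessages slides show code that isn't reproduced here. Below is a hypothetical sketch of such a class that combines the pieces above: an indent per level, a format string built with format, a left-aligned message and a right-aligned sum. The attribute names and defaults are assumptions, not the talk's class variables:

```python
class IndentMessages:
    """Hypothetical sketch of a flexible-indentation formatter."""

    def __init__(self, msg_width=30, sum_width=8, space_per_indent=2, space=' '):
        self.msg_width = msg_width
        self.sum_width = sum_width
        self.space_per_indent = space_per_indent
        self.space = space

    def indent_message(self, level, msg, total):
        """Return msg indented by level, with total right-justified after it."""
        indent = self.space * (self.space_per_indent * level)
        # Shrink the message field by the indent so the sums stay in one column
        msg_width = max(self.msg_width - len(indent), 1)
        # Format a format string, e.g. '{0:s}{1:<18s}{2:>5d}'
        fmt = '{{0:s}}{{1:<{0:d}s}}{{2:>{1:d}d}}'.format(msg_width, self.sum_width)
        return fmt.format(indent, msg, total)


if __name__ == '__main__':
    im = IndentMessages(msg_width=20, sum_width=5)
    print(im.indent_message(0, 'Squares', 3))
    print(im.indent_message(1, 'Red', 2))
```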
SimpleCSVReporter
● Open a CSV File
● Create
– Set of possible values
– Set of possible tuples
– Counter indexed by each value & tuple
● Use IndentMessages to format output lines
SimpleCSVReporter class vars
readCSV() begins
initialize sets...
readCSV() continued:
Loop to collect & sum
Write to Report File
Using recursion for limitless
indentation
Recursive print sub-levels
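The SimpleCSVReporter slides show code that isn't reproduced here. Below is a hypothetical sketch of the same shape: a set of possible values, a set of possible tuples, a Counter indexed by tuple, and a recursive writer whose depth follows the tuple length, so there is no fixed limit on indentation. `readCSV` is named after the slide; the rest is assumed. The sketch counts only column-order prefix tuples, and it uses a plain two-space indent instead of IndentMessages to stay self-contained:

```python
import csv
import io
from collections import Counter


class SimpleCSVReporter:
    """Hypothetical sketch: count prefix tuples, then write them recursively."""

    def __init__(self):
        self.fieldnames = []
        self.possible_values = {}     # column name -> set of distinct values
        self.possible_tuples = set()  # every prefix tuple seen, e.g. ('red', 'big')
        self.counts = Counter()       # prefix tuple -> number of rows starting with it

    def readCSV(self, csv_file):
        reader = csv.DictReader(csv_file)
        self.fieldnames = reader.fieldnames
        self.possible_values = {head: set() for head in self.fieldnames}
        for row in reader:
            values = tuple(row[head] for head in self.fieldnames)
            for head, value in zip(self.fieldnames, values):
                self.possible_values[head].add(value)
            # count ('red',), ('red', 'big'), ('red', 'big', 'square'), ...
            for length in range(1, len(values) + 1):
                prefix = values[:length]
                self.possible_tuples.add(prefix)
                self.counts[prefix] += 1

    def write_report(self, out, prefix=()):
        """Write each child of prefix, then recurse into that child."""
        depth = len(prefix)
        children = sorted(t for t in self.possible_tuples
                          if len(t) == depth + 1 and t[:depth] == prefix)
        for child in children:
            out.write("{0}{1}: {2}\n".format("  " * depth, child[-1], self.counts[child]))
            self.write_report(out, child)


SIMPLE = """color,size,shape
red,big,square
blue,big,triangle
green,small,square
blue,small,triangle
red,big,square
blue,small,triangle
"""

reporter = SimpleCSVReporter()
reporter.readCSV(io.StringIO(SIMPLE))
report = io.StringIO()
reporter.write_report(report)
print(report.getvalue())
```

Rescanning every tuple at each level is fine for small files; for large ones, a dict mapping each prefix to its children would avoid the repeated scans.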
Word transform stubs
General method to test
Test with simpler CSV
Output for simpler CSV
A bigger CSV file
"CCN","REPORTDATETIME","SHIFT","OFFENSE","METHOD","BLOCKSIT
EADDRESS","WARD","ANC","DISTRICT","PSA","NEIGHBORHOODCL
USTER","BUSINESSIMPROVEMENTDISTRICT","VOTING_PRECINCT",
"START_DATE","END_DATE"
4104147,"4/16/2013 12:00:00
AM","MIDNIGHT","HOMICIDE","KNIFE","1500 - 1599 BLOCK OF 1ST
STREET SW",6,"6D","FIRST",105,9,,"Precinct 127","7/27/2004 8:30:00
PM","7/27/2004 8:30:00 PM"
5047867,"6/5/2013 12:00:00 AM","MIDNIGHT","SEX ABUSE","KNIFE","6500
- 6599 BLOCK OF PINEY BRANCH ROAD
NW",4,"4B","FOURTH",402,17,,"Precinct 59","4/15/2005 12:30:00 PM",
● From http://data.octo.dc.gov/
Deleted all but 4 columns
"SHIFT","OFFENSE","METHOD","DISTRICT"
"MIDNIGHT","HOMICIDE","KNIFE","FIRST"
"MIDNIGHT","SEX ABUSE","KNIFE","FOURTH"
...
"DAY","THEFT/OTHER","OTHERS","SECOND"
"MIDNIGHT","SEX ABUSE","OTHERS","THIRD"
"MIDNIGHT","SEX ABUSE","OTHERS","THIRD"
"EVENING","BURGLARY","OTHERS","FIFTH"
...
Method to run crime report
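The crime-report method isn't reproduced here. As a hedged illustration that the same techniques carry over, here is a sketch that counts (SHIFT, OFFENSE) tuples using only the rows visible on the slide above (the real file has many more). The csv module strips the double quotes around each field:

```python
import csv
import io
from collections import Counter

# Only the rows shown on the slide; the full file is much larger
EXCERPT = '''"SHIFT","OFFENSE","METHOD","DISTRICT"
"MIDNIGHT","HOMICIDE","KNIFE","FIRST"
"MIDNIGHT","SEX ABUSE","KNIFE","FOURTH"
"DAY","THEFT/OTHER","OTHERS","SECOND"
"MIDNIGHT","SEX ABUSE","OTHERS","THIRD"
"MIDNIGHT","SEX ABUSE","OTHERS","THIRD"
"EVENING","BURGLARY","OTHERS","FIFTH"
'''

reader = csv.DictReader(io.StringIO(EXCERPT))
# A generator of tuples: each (shift, offense) tuple is one Counter key
by_shift_offense = Counter((row["SHIFT"], row["OFFENSE"]) for row in reader)
for (shift, offense), n in sorted(by_shift_offense.items()):
    print("{0:10s}{1:15s}{2:3d}".format(shift, offense, n))
```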
Output - top
Output - bottom
Improvements
● Allow user-specified order for values, e.g.
FIRST, SECOND, THIRD
● Other means of tabulating
● Keeping track of blank values
● Summing counts in columns
● ...
Links
This talk: http://www.slideshare.net/pargery/mnh-csv-python
● https://github.com/pargery/csv_utils2
● Also some notes in http://margerytech.blogspot.com/
Info on Data Structures
● http://rhodesmill.org/brandon/slides/2014-04-pycon/data-structures/
● http://nedbatchelder.com/text/iter.html
DC crime stats
● http://data.octo.dc.gov/
“The data made available here has been modified for use from its original source, which is the Government of the
District of Columbia. Neither the District of Columbia Government nor the Office of the Chief Technology Officer
(OCTO) makes any claims as to the completeness, accuracy or content of any data contained in this application;
makes any representation of any kind, including, but not limited to, warranty of the accuracy or fitness for a
particular use; nor are any such warranties to be implied or inferred with respect to the information or data
furnished herein. The data is subject to change as modifications and updates are complete. It is understood that
the information contained in the web feed is being used at one's own risk."
