Building flexible tools to store
sums and report on CSV data
Presented by
Margery Harrison
Audience level: Novice
09:45 AM - 10:45 AM
August 17, 2014
Room 704
Python Flexibility
● Basic, Fortran, C, Pascal, JavaScript,...
● After a few languages, there's a tendency to keep thinking
the same way, and just translate old habits into the new syntax
● You can write Python as if it were C
● Or you can take advantage of Python's
special data structures.
● The second option is a lot more fun.
Using Python data structures to
report on CSV data
● Lists
● Sets
● Tuples
● Dictionaries
● CSV Reader
● DictReader
● Counter
Also,
● Using tuples as dictionary keys
● Using enumerate() to count how many
times you've looped
– See “Loop like a Native”
http://nedbatchelder.com/text/iter.html
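A small sketch of both ideas, using made-up data: a dictionary keyed by tuples, and enumerate() supplying the loop count. The variable names are illustrative and are not taken from the talk's code.

```python
# Python 3. Illustrative data; the names are assumptions for this sketch.
shapes = [("red", "square"), ("blue", "triangle"), ("red", "square")]

# Tuples are hashable, so they can be used as dictionary keys.
seen = {}
for position, combo in enumerate(shapes, start=1):
    # enumerate() hands back (count, item), so no manual counter is needed.
    seen.setdefault(combo, []).append(position)

for combo, positions in sorted(seen.items()):
    print(combo, "seen on pass", positions)
```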
Code Development Method
● Start with simplest possible version
● Test and validate
● Iterative improvements
– Make it prettier
– Make it do more
– Make it more general
This is a CSV file
color,size,shape,number
red,big,square,3
blue,big,triangle,5
green,small,square,2
blue,small,triangle,1
red,big,square,7
blue,small,triangle,3
https://c1.staticflickr.com/3/2201/2469586703_cfdaf88195.jpg
http://i239.photobucket.com/albums/ff263/peacelovebones/two-pandas-rolling-1.jpg
CSV DictReader
>>> import csv
>>> import os
>>> with open("simpleCSV.txt") as f:
...     r = csv.DictReader(f)
...     for row in r:
...         print(row)
...
Running DictReader
DictReader is sequential
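One way to see what "sequential" means here, as a sketch: a DictReader walks through the file once, so a second loop over the same reader yields nothing until the file is rewound and a new reader is built. io.StringIO stands in for simpleCSV.txt, and the variable names are assumptions.

```python
# Python 3. io.StringIO stands in for the simpleCSV.txt file.
import csv
import io

SAMPLE = """color,size,shape,number
red,big,square,3
blue,big,triangle,5
green,small,square,2
"""

f = io.StringIO(SAMPLE)
reader = csv.DictReader(f)

first_pass = [dict(row) for row in reader]   # consumes every row
second_pass = [dict(row) for row in reader]  # nothing is left to read

print(len(first_pass), len(second_pass))

# To read the data again, rewind the file and build a new reader.
f.seek(0)
third_pass = [dict(row) for row in csv.DictReader(f)]
print(third_pass[0])
```

Note that every value comes back as a string, including the number column.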
Tabulate All Possible Values
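A minimal sketch of one way to tabulate the values, assuming a dictionary that maps each column name to a set. The name possible_values matches the snippet shown later in the talk; the rest is illustrative.

```python
# Python 3. io.StringIO stands in for the CSV file from the slides.
import csv
import io

SAMPLE = """color,size,shape,number
red,big,square,3
blue,big,triangle,5
green,small,square,2
blue,small,triangle,1
red,big,square,7
blue,small,triangle,3
"""

reader = csv.DictReader(io.StringIO(SAMPLE))

# One set per column; a set silently ignores repeated values.
possible_values = {head: set() for head in reader.fieldnames}
for row in reader:
    for head in reader.fieldnames:
        possible_values[head].add(row[head])

for head in reader.fieldnames:
    print(head, sorted(possible_values[head]))
```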
How many of each?
● It's nice to have a listing that shows the
variety of objects that can appear in each
column.
● Next, we'd like to count how many of each
● And guess what? Python has a special data
structure for that.
collections.Counter
Playing with Counters
Index into Counters
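A short sketch of the Counter behaviour these slides rely on. The sample list is made up; the calls are standard collections.Counter methods.

```python
# Python 3.
from collections import Counter

colors = Counter(["red", "blue", "green", "blue", "red", "blue"])

print(colors["blue"])      # count for one key
print(colors["nope"])      # a missing key reads as 0 instead of raising KeyError

colors["purple"] += 1      # so incrementing a brand-new key just works
colors.update(["green", "blue"])  # update() counts each item of an iterable

print(colors.most_common(1))
```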
Counter + DictReader
Let's use counters to tell us how many of each
value was in each column.
Print number of each value
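The slide's code is not in this transcript, so the following is only a sketch of how the counting and printing might be done. It assumes a single Counter indexed by the value string, which works here because no value appears in more than one column.

```python
# Python 3. io.StringIO stands in for the CSV file from the slides.
import csv
import io
from collections import Counter

SAMPLE = """color,size,shape,number
red,big,square,3
blue,big,triangle,5
green,small,square,2
blue,small,triangle,1
red,big,square,7
blue,small,triangle,3
"""

reader = csv.DictReader(io.StringIO(SAMPLE))
possible_values = {head: set() for head in reader.fieldnames}
count_of_values = Counter()

for row in reader:
    for head in reader.fieldnames:
        field_value = row[head]
        possible_values[head].add(field_value)
        # Index by the whole string; assumes no value appears in two columns.
        count_of_values[field_value] += 1

for head in reader.fieldnames:
    print(head)
    for value in sorted(possible_values[head]):
        print("  {0:8s}: {1:d}".format(value, count_of_values[value]))
```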
Output
color
blue : 3
green : 1
red : 2
shape
square : 3
triangle: 3
number
1 : 1
3 : 2
2 : 1
5 : 1
7 : 1
size
small : 3
big : 3
You might ask, why not this?
for row in r:
    for head in r.fieldnames:
        field_value = row[head]
        possible_values[head].add(field_value)
        #count_of_values.update(row[head])
        count_of_values.update(field_value)
print(count_of_values)
Because
Counter({'e': 13, 'l': 12, 'a': 9, 'r': 9, 'g': 7, 'b': 6, 'i': 6, 's':
6, 'u': 6, 'n': 4, 'm': 3, 'q': 3, 't': 3, 'd': 2, '3': 2, '1': 1, '2':
1, '7': 1, '5': 1})
color
blue : 0
green : 0
red : 0
shape
square : 0
triangle: 0
number
1 : 1
3 : 2
2 : 1
5 : 1
7 : 1
size
small : 0
big : 0
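The character counts above come from how Counter.update() treats its argument: it iterates over it, and iterating over a string yields single characters. That is also why the one-character values in the number column still come out right. A minimal sketch of the difference:

```python
# Python 3.
from collections import Counter

wrong = Counter()
wrong.update("red")        # a string is an iterable of characters
print(wrong)

right = Counter()
right["red"] += 1          # indexing counts the whole string
right["red"] += 1
print(right)
```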
Output
color
blue : 3
green : 1
red : 2
shape
square : 3
triangle: 3
number
1 : 1
3 : 2
2 : 1
5 : 1
7 : 1
size
small : 3
big : 3
How many red squares?
● We can use tuples as an index into the
counter
– (red,square)
– (big,red,square)
– (small,blue,triangle)
– (small,square)
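As a sketch of the idea, a Counter can be indexed by a tuple built from the columns of interest. The rows below are the sample data without the number column, typed in by hand; the name combo_counts is an assumption.

```python
# Python 3. Rows are typed in by hand to keep the sketch short.
from collections import Counter

rows = [
    {"color": "red", "size": "big", "shape": "square"},
    {"color": "blue", "size": "big", "shape": "triangle"},
    {"color": "green", "size": "small", "shape": "square"},
    {"color": "blue", "size": "small", "shape": "triangle"},
    {"color": "red", "size": "big", "shape": "square"},
    {"color": "blue", "size": "small", "shape": "triangle"},
]

combo_counts = Counter()
for row in rows:
    # Each tuple is one key; the order of items inside the tuple matters.
    combo_counts[(row["color"], row["shape"])] += 1
    combo_counts[(row["size"], row["shape"])] += 1

print(combo_counts[("red", "square")])
print(combo_counts[("small", "square")])
```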
Let's use a simpler CSV
color,size,shape
red,big,square
blue,big,triangle
green,small,square
blue,small,triangle
red,big,square
blue,small,triangle
Counting Tuples
trying to use magic update()
>>> import collections
>>> c=collections.Counter([('a,b'),('c,d,e')])
>>> c
Counter({'a,b': 1, 'c,d,e': 1})
>>> c.update(('a','b'))
>>> c
Counter({'a': 1, 'b': 1, 'a,b': 1, 'c,d,e': 1})
>>> c.update((('a','b'),))
>>> c
Counter({'a': 1, ('a', 'b'): 1, 'b': 1, 'a,b': 1, 'c,d,e': 1})
Oh well
>>> c.update([(('a','b'),)])
>>> c
Counter({'a': 2, 'b': 2, (('a', 'b'),): 1, 'c,d,e': 1, 'a,b': 1,
('a', 'b'): 1})
>>> c[('a','b')]
1
>>> c[('a','b')]+=5
>>> c
Counter({('a', 'b'): 6, 'a': 2, 'b': 2, (('a', 'b'),): 1, 'c,d,e':
1, 'a,b': 1})
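Two details from the session above, restated as a sketch: parentheses around 'a,b' do not make a tuple, and a tuple is counted as one item either by indexing with it or by wrapping it in a list before calling update().

```python
# Python 3.
from collections import Counter

print(type(('a,b')))      # parentheses alone give a str
print(type(('a', 'b')))   # the comma between items makes the tuple

c = Counter()
pair = ('a', 'b')
c[pair] += 1              # index with the tuple itself
c.update([pair])          # or wrap it in a list so update() sees one item
print(c)
```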
Combo Count Part 1: Initialize
Combo Count 2: Counting
Combo Count 3: Printing
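The code for these three slides is not in the transcript. The sketch below follows the same three steps under one assumption: each row is counted under every left-to-right prefix of two or more columns. The helper combos_containing() is hypothetical.

```python
# Python 3. A sketch only; the slides' own three-part code is not reproduced.
import csv
import io
from collections import Counter

SAMPLE = """color,size,shape
red,big,square
blue,big,triangle
green,small,square
blue,small,triangle
red,big,square
blue,small,triangle
"""

reader = csv.DictReader(io.StringIO(SAMPLE))
heads = reader.fieldnames

# Part 1: initialize
combo_counts = Counter()

# Part 2: count every left-to-right prefix of two or more columns
for row in reader:
    values = tuple(row[head] for head in heads)
    for width in range(2, len(values) + 1):
        combo_counts[values[:width]] += 1


# Part 3: list the combinations that contain a given value
def combos_containing(value):
    return sorted(combo for combo in combo_counts if value in combo)


for combo in combos_containing("blue"):
    print(combo, combo_counts[combo])
```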
Combo Count Output
color
blue : 3
3 blue in 1 combinations:
('blue', 'big'): 1
('blue', 'small'): 2
3 blue in 2 combinations:
('blue', 'big', 'triangle'): 1
('blue', 'small', 'triangle'): 2
green : 1
1 green in 1 combinations:
('green', 'small'): 1
1 green in 2 combinations:
('green', 'small', 'square'): 1
red : 2
2 red in 1 combinations:
('red', 'big'): 2
2 red in 2 combinations:
('red', 'big', 'square'): 2
shape
square : 3
3 square in 1 combinations:
3 square in 2 combinations:
('red', 'big', 'square'): 2
('green', 'small', 'square'): 1
triangle: 3
3 triangle in 1 combinations:
3 triangle in 2 combinations:
('blue', 'big', 'triangle'): 1
('blue', 'small', 'triangle'): 2
size
small : 3
3 small in 1 combinations:
('blue', 'small'): 2
('green', 'small'): 1
3 small in 2 combinations:
('green', 'small', 'square'): 1
('blue', 'small', 'triangle'): 2
big : 3
3 big in 1 combinations:
('blue', 'big'): 1
('red', 'big'): 2
3 big in 2 combinations:
('red', 'big', 'square'): 2
('blue', 'big', 'triangle'): 1
Well, that's ugly
● We need to make it prettier
● We need to write out to a file
● We need to break things up into Classes
Printing Combination Levels
Number of Squares
Number of Red Squares
Number of Blue Squares
Number of Triangles
Number of Red Triangles
Number of Blue Triangles
Total Red
Total Blue
Indentation per level
● If we're indexing by tuple, then the
indentation level could correspond to the
number of items in the tuple.
● Let's have general methods to format the
indentation level, given the number of
items in the tuple, or input 'level' integer
A class write_indent() method
If the method belongs to a class that holds the counter and a msgs dict,
just pass in the tuple:
def write_indent(self, tup_index):
    '''
    :param tup_index: tuple index into counter
    '''
    indent = ' ' * len(tup_index)
    msg = self.msgs[tup_index]
    sum = self.counts[tup_index]
    # Three values are passed in, so the format string needs three fields.
    indented_msg = '{0:s}{1:s}:{2:d}'.format(
        indent, msg, sum)
    return indented_msg
class-less indent_message()
def indent_message(level, msg, sum,
                   space_per_indent=2, space=' '):
    # No self here: space_per_indent is a plain argument.
    num_spaces = space_per_indent * level
    indent = space * num_spaces
    # We'll want to tune the formatting..
    indented_msg = '{0:s}{1:s}:{2:d}'.format(
        indent, msg, sum)
    return indented_msg
Adjustable field widths
Depending on data, we'll want different
field widths
red squares 5
Blue squares 21
Large Red Squares in the Bronx 987654321
Using format to format a format
string
>>> f='{{0:{0:d}s}}'.format(3)
>>> f
'{0:3s}'
>>> f='{{0:{0:d}s}}{{1:{1:d}d}}'.format(3,5)
>>> f
'{0:3s}{1:5d}'
>>> f='{{0:s}}{{1:{0:d}s}}{{2:{1:d}d}}'.format(3,5)
>>> f
'{0:s}{1:3s}{2:5d}'
Format 3 values
● Our formatting string will print 3 values:
– String of space chars: {0:s}
– Message: {1:[msg_width]s}
– Sum: Right justified {2:[sum_width]d} (integers right-align by default)
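Putting the three fields together, as a sketch: the widths are computed from the data, the format string is built once, and it is then applied to each line. The helper name make_line_format and the indent levels are assumptions; the sample messages are the ones from the field-width slide.

```python
# Python 3. The helper name and the indent levels are assumptions.
def make_line_format(msg_width, sum_width):
    # Doubled braces survive the first format() call as literal braces.
    return '{{0:s}}{{1:{0:d}s}}{{2:{1:d}d}}'.format(msg_width, sum_width)


rows = [
    (0, 'red squares', 5),
    (0, 'Blue squares', 21),
    (1, 'Large Red Squares in the Bronx', 987654321),
]

# Size each column to fit the widest entry, plus one space before the sum.
msg_width = max(len(msg) for _, msg, _ in rows)
sum_width = max(len(str(total)) for _, _, total in rows) + 1
line_format = make_line_format(msg_width, sum_width)

for level, msg, total in rows:
    print(line_format.format('  ' * level, msg, total))
```

str.format also accepts nested fields, for example '{0:{1}d}'.format(5, 4), which avoids the doubled braces when the width can be passed alongside the value.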
Class For Flexible Indentation
Flexible Indent Class Variables
Flexible Indent Method
Testing IndentMessages class
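The class itself is not in the transcript. As a rough sketch of what such a class could look like: IndentMessages is the name used in the talk, while every attribute, parameter, and method name below is a guess.

```python
# Python 3. IndentMessages is the class name used in the talk; every
# attribute and method name below is a guess made for this sketch.
class IndentMessages(object):
    def __init__(self, msg_width=20, sum_width=6,
                 space_per_indent=2, space=' '):
        self.space_per_indent = space_per_indent
        self.space = space
        # Build the line format once from the requested widths.
        self.line_format = '{{0:s}}{{1:{0:d}s}}{{2:{1:d}d}}'.format(
            msg_width, sum_width)

    def indent_message(self, level, msg, total):
        indent = self.space * (self.space_per_indent * level)
        return self.line_format.format(indent, msg, total)


formatter = IndentMessages(msg_width=12, sum_width=4)
print(formatter.indent_message(0, 'squares', 3))
print(formatter.indent_message(1, 'red squares', 2))
```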
SimpleCSVReporter
● Open a CSV File
● Create
– Set of possible values
– Set of possible tuples
– Counter indexed by each value & tuple
● Use IndentMessages to format output lines
SimpleCSVReporter class vars
readCSV() begins
initialize sets..
readCSV() continued:
Loop to collect & sum
Write to Report File
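A compressed sketch of the steps named on these slides, not the original code. SimpleCSVReporter and readCSV() are names from the talk; write_report(), the attribute names, and the prefix-tuple counting are assumptions. A plain format string stands in for IndentMessages.

```python
# Python 3. SimpleCSVReporter and readCSV are names from the talk; the
# bodies and the other names here are assumptions, not the original code.
import csv
import io
from collections import Counter


class SimpleCSVReporter(object):
    def __init__(self):
        self.fieldnames = []
        self.possible_values = {}    # column name -> set of values seen
        self.possible_tuples = set()
        self.counts = Counter()      # indexed by value and by tuple

    def readCSV(self, csv_file):
        reader = csv.DictReader(csv_file)
        self.fieldnames = list(reader.fieldnames)
        self.possible_values = {head: set() for head in self.fieldnames}
        for row in reader:
            values = tuple(row[head] for head in self.fieldnames)
            for head, value in zip(self.fieldnames, values):
                self.possible_values[head].add(value)
                self.counts[value] += 1
            # Count each left-to-right prefix of two or more columns.
            for width in range(2, len(values) + 1):
                self.possible_tuples.add(values[:width])
                self.counts[values[:width]] += 1

    def write_report(self, out_file):
        for head in self.fieldnames:
            out_file.write(head + '\n')
            for value in sorted(self.possible_values[head]):
                out_file.write('  {0:s}: {1:d}\n'.format(
                    value, self.counts[value]))


reporter = SimpleCSVReporter()
reporter.readCSV(io.StringIO("color,shape\nred,square\nblue,square\n"))
report = io.StringIO()
reporter.write_report(report)
print(report.getvalue())
```

Passing open file objects in, rather than file names, keeps the sketch testable with io.StringIO; a real report would pass a file opened for writing.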
Using recursion for limitless
indentation
Recursive print sub-levels
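One way recursion can handle any depth, sketched with a hand-built counter: print the line for a tuple, then recurse into every tuple that extends it by one item. The function name report_lines and the one-item tuple for the top level are assumptions.

```python
# Python 3. Function and variable names are assumptions for this sketch.
from collections import Counter

counts = Counter({
    ('blue',): 3,
    ('blue', 'big'): 1,
    ('blue', 'small'): 2,
    ('blue', 'big', 'triangle'): 1,
    ('blue', 'small', 'triangle'): 2,
})


def report_lines(prefix, counts):
    """Return the line for prefix, then the lines for tuples one item longer."""
    indent = '  ' * (len(prefix) - 1)   # indentation follows tuple length
    lines = ['{0:s}{1:s}: {2:d}'.format(
        indent, ' '.join(prefix), counts[prefix])]
    children = sorted(key for key in counts
                      if len(key) == len(prefix) + 1
                      and key[:len(prefix)] == prefix)
    for child in children:
        lines.extend(report_lines(child, counts))
    return lines


for line in report_lines(('blue',), counts):
    print(line)
```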
Word transform stubs
General method to test
Test with simpler CSV
Output for simpler CSV
A bigger CSV file
"CCN","REPORTDATETIME","SHIFT","OFFENSE","METHOD","BLOCKSITEADDRESS","WARD","ANC","DISTRICT","PSA","NEIGHBORHOODCLUSTER","BUSINESSIMPROVEMENTDISTRICT","VOTING_PRECINCT","START_DATE","END_DATE"
4104147,"4/16/2013 12:00:00 AM","MIDNIGHT","HOMICIDE","KNIFE","1500 - 1599 BLOCK OF 1ST STREET SW",6,"6D","FIRST",105,9,,"Precinct 127","7/27/2004 8:30:00 PM","7/27/2004 8:30:00 PM"
5047867,"6/5/2013 12:00:00 AM","MIDNIGHT","SEX ABUSE","KNIFE","6500 - 6599 BLOCK OF PINEY BRANCH ROAD NW",4,"4B","FOURTH",402,17,,"Precinct 59","4/15/2005 12:30:00 PM",
● From http://data.octo.dc.gov/
Deleted all but 4 columns
"SHIFT","OFFENSE","METHOD","DISTRICT"
"MIDNIGHT","HOMICIDE","KNIFE","FIRST"
"MIDNIGHT","SEX ABUSE","KNIFE","FOURTH"
...
"DAY","THEFT/OTHER","OTHERS","SECOND"
"MIDNIGHT","SEX ABUSE","OTHERS","THIRD"
"MIDNIGHT","SEX ABUSE","OTHERS","THIRD"
"EVENING","BURGLARY","OTHERS","FIFTH"
...
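The talk does not say how the extra columns were removed. One way to do it in Python, as a sketch, is to copy rows through csv.DictWriter with extrasaction='ignore'. The input below is an abbreviated version of the rows above, and io.StringIO stands in for the real files.

```python
# Python 3. io.StringIO stands in for the input and output files, and the
# input is an abbreviated version of the crime data rows.
import csv
import io

FULL = '''"CCN","SHIFT","OFFENSE","METHOD","WARD","DISTRICT"
4104147,"MIDNIGHT","HOMICIDE","KNIFE",6,"FIRST"
5047867,"MIDNIGHT","SEX ABUSE","KNIFE",4,"FOURTH"
'''
KEEP = ["SHIFT", "OFFENSE", "METHOD", "DISTRICT"]

source = io.StringIO(FULL)
target = io.StringIO()

reader = csv.DictReader(source)
# extrasaction='ignore' drops any column not listed in fieldnames.
writer = csv.DictWriter(target, fieldnames=KEEP, extrasaction='ignore',
                        quoting=csv.QUOTE_ALL, lineterminator='\n')
writer.writeheader()
for row in reader:
    writer.writerow(row)

print(target.getvalue())
```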
Method to run crime report
Output - top
Output - bottom
Improvements
● Allow user-specified order for values, e.g.
FIRST, SECOND, THIRD
● Other means of tabulating
● Keeping track of blank values
● Summing counts in columns
● ...
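A possible starting point for the first and third items, as a sketch: sort values by their position in a user-supplied list, and give blank values a visible label. The ordering list, the helper name ordered_values, and the counts are all made up for illustration.

```python
# Python 3. The ordering list, helper name, and counts are made up.
from collections import Counter

DISTRICT_ORDER = ["FIRST", "SECOND", "THIRD", "FOURTH", "FIFTH"]
counts = Counter(["THIRD", "FIRST", "FIFTH", "THIRD", "SECOND", "FOURTH", ""])


def ordered_values(values, preferred_order):
    """Listed values come first, in the given order; the rest sort after them."""
    rank = {value: position for position, value in enumerate(preferred_order)}
    return sorted(values, key=lambda v: (rank.get(v, len(rank)), v))


for value in ordered_values(counts, DISTRICT_ORDER):
    label = value if value else "(blank)"
    print("{0:8s}: {1:d}".format(label, counts[value]))
```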
https://c1.staticflickr.com/3/2201/2469586703_cfdaf88195.jpg
Links
This talk: http://www.slideshare.net/pargery/mnh-csv-python
● https://github.com/pargery/csv_utils2
● Also some notes in http://margerytech.blogspot.com/
Info on Data Structures
● http://rhodesmill.org/brandon/slides/2014-04-pycon/data-structures/
● http://nedbatchelder.com/text/iter.html
DC crime stats
● http://data.octo.dc.gov/
“The data made available here has been modified for use from its original source, which is the Government of the
District of Columbia. Neither the District of Columbia Government nor the Office of the Chief Technology Officer
(OCTO) makes any claims as to the completeness, accuracy or content of any data contained in this application;
makes any representation of any kind, including, but not limited to, warranty of the accuracy or fitness for a
particular use; nor are any such warranties to be implied or inferred with respect to the information or data
furnished herein. The data is subject to change as modifications and updates are complete. It is understood that
the information contained in the web feed is being used at one's own risk.”
