An Introduction To Software
Development Using Python
Spring Semester, 2015
Class #23:
Working With Data
Data Formatting
• In the real world, data comes in many different
shapes, sizes, and encodings.
• This means that you have to know how to
manipulate and transform it into a common format
that will permit efficient processing, sorting, and
storage.
• Python has the tools that will allow you to do all of
this…
Image Credit: publicdomainvectors.org
Your Programming Challenge
• The Florida Polytechnic track team has just been
formed.
• The coach really wants the team to win the state
competition in its first year.
• He’s been recording their training results from the
600m run.
• Now he wants to know the top three fastest times
for each team member.
Image Credit: animals.phillipmartin.info
Here’s What The Data
Looks Like
• James
2-34,3:21,2.34,2.45,3.01,2:01,2:01,3:10,2-22
• Julie
2.59,2.11,2:11,2:23,3-10,2-23,3:10,3.21,3-21
• Mike
2:22,3.01,3:01,3.02,3:02,3.02,3:22,2.49,2:38
• Sara
2:58,2.58,2:39,2-25,2-55,2:54,2.18,2:55,2:55
Image Credit: www.dreamstime.com
1st Step: We Need To Get The Data
• Let’s begin by reading the data from each of
the files into its own list.
• Write a short program to process each file,
creating a list for each athlete’s data, and
display the lists on screen.
• Hint: Try splitting the data on the commas,
and don’t forget to strip any unwanted
whitespace.
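One way to carry out this step is sketched below. The file name james.txt and the helper read_times() are assumptions made for illustration; the slides do not name the data files. So that the sketch runs on its own, it first writes James's line to a temporary file.

```python
import os
import tempfile


def read_times(path):
    """Read the first line of a data file and return its comma-separated values as a list."""
    with open(path) as data_file:
        line = data_file.readline()
    # Strip the trailing newline and surrounding whitespace, then split on commas.
    return line.strip().split(',')


# Demo only: create a stand-in data file (hypothetical name) holding James's times.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, 'james.txt')
with open(demo_path, 'w') as demo_file:
    demo_file.write('2-34,3:21,2.34,2.45,3.01,2:01,2:01,3:10,2-22\n')

james = read_times(demo_path)
print(james)
```

The same helper could be called once per athlete, giving one list per file.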
Image Credit: www.clipartillustration.com
New Python Ideas
• data.strip().split(',')
This is called method chaining.
The first method, strip() , is applied to the line in data, which
removes any unwanted whitespace from the string.
Then, the results of the stripping are processed by the second
method, split(',') , creating a list.
The resulting list is then saved in the variable. In this way, the
methods are chained together to produce the required result.
It helps if you read method chains from left to right.
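A small sketch of the chain, using James's line of data as it might arrive from a file (the trailing newline is an assumption). It compares the chained call with the same two steps written out separately.

```python
# A raw line as it might be read from a file, with a trailing newline.
data = '2-34,3:21,2.34,2.45,3.01,2:01,2:01,3:10,2-22\n'

# The two steps written separately...
stripped = data.strip()           # remove leading/trailing whitespace
two_step = stripped.split(',')    # break the string into a list at each comma

# ...and the same work as a single method chain, read left to right.
chained = data.strip().split(',')

print(chained)
print(chained == two_step)
```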
Image Credit: www.clipartpanda.com
Time To Do Some Sorting
• In-Place Sorting
– Takes your data, arranges it in the order you specify, and then replaces your
original data with the sorted version.
– The original ordering is lost. With lists, the sort() method provides in-place
sorting.
– Example - original list: [1,3,4,6,2,5]
list after sorting: [1,2,3,4,5,6]
• Copy Sorting
– Takes your data, arranges it in the order you specify, and then returns a sorted
copy of your original data.
– Your original data’s ordering is maintained and only the copy is sorted. In
Python, the built-in sorted() function supports copied sorting.
– Example - original list: [1,3,4,6,2,5]
list after sorting: [1,3,4,6,2,5]
new list: [1,2,3,4,5,6]
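The two kinds of sorting can be tried side by side with the example list from this slide. The variable names are chosen for illustration only.

```python
# Copy sorting: sorted() returns a new list and leaves the original alone.
original = [1, 3, 4, 6, 2, 5]
new_list = sorted(original)
print(original)   # ordering unchanged
print(new_list)   # sorted copy

# In-place sorting: sort() reorders the list itself and returns None.
in_place = [1, 3, 4, 6, 2, 5]
result = in_place.sort()
print(in_place)   # the original ordering is gone
print(result)     # None, so do not write data = data.sort()
```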
Image Credit: www.picturesof.net
What’s Our Problem?
• “-”, “.”, and “:” all have different ASCII values.
• This means that they are screwing up our
sort.
• Sara’s data:
['2:58', '2.58', '2:39', '2-25', '2-55', '2:54', '2.18', '2:55', '2:55']
• Python sorts the strings, and when it comes
to strings, a dash comes before a period,
which itself comes before a colon.
• Nonuniformity in the coach’s data is causing
the sort to put the times in the wrong order.
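To see the effect, here is a short sketch that sorts Sara's raw data and prints the character codes of the three separators. ord() is the built-in that returns a character's code point.

```python
sara = ['2:58', '2.58', '2:39', '2-25', '2-55', '2:54', '2.18', '2:55', '2:55']

# The separators have different character codes, so they sort differently.
codes = {separator: ord(separator) for separator in ['-', '.', ':']}
print(codes)

# Strings are compared character by character, so every dash time comes first,
# then every period time, then every colon time.
messy_sort = sorted(sara)
print(messy_sort)
```

The times end up grouped by separator rather than ordered by value.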
Fixing The Coach’s Mistakes
• Let’s create a function called sanitize() , which
takes as input a string from each of the
athlete’s lists.
• The function then processes the string to
replace any dashes or colons found with a
period and returns the sanitized string.
• Note: if the string already contains a
period, there’s no need to sanitize it.
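A minimal sketch of such a function follows. It is one possible implementation, not necessarily the one used in class: it relies on the string method replace(), which returns the string unchanged when there is nothing to replace, so a time that already uses a period passes straight through.

```python
def sanitize(time_string):
    """Return the time with any dash or colon replaced by a period."""
    # replace() leaves the string as it is if the character is not found.
    return time_string.replace('-', '.').replace(':', '.')


print(sanitize('2-34'))
print(sanitize('3:21'))
print(sanitize('2.45'))
```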
Image Credit: www.dreamstime.com
Code Problem: Lots and Lots of
Duplication
• Your code creates four lists to hold the data as read
from the data files.
• Then your code creates another four lists to hold the
sanitized data.
• And, of course, you’re stepping through lists all over
the place…
• There has to be a better way to write code like this.
Image Credit: www.canstockphoto.com
Transforming Lists
• Transforming lists is such a common requirement
that Python provides a tool to make the
transformation as painless as possible.
• This tool goes by the rather unwieldy name of
list comprehension.
• List comprehensions are designed to reduce the
amount of code you need to write when
transforming one list into another.
Image Credit: www.fotosearch.com
Steps In Transforming A List
• Consider what you need to do when you transform one list
into another. Four things have to happen. You need to:
1. Create a new list to hold the transformed data.
2. Iterate over each data item in the original list.
3. With each iteration, perform the transformation.
4. Append the transformed data to the new list.
clean_sarah = []                             # ❶
for runTime in sarah:                        # ❷
    clean_sarah.append(sanitize(runTime))    # ❸ ❹
Image Credit: www.cakechooser.com
List Comprehension
• Here’s the same functionality as a list comprehension, which
involves creating a new list by specifying the transformation
that is to be applied to each of the data items within an
existing list.
clean_sarah = [sanitize(runTime) for runTime in sarah]
Create new list … by applying a transformation … to each data item … within an existing list
Note that the transformation has been reduced to a single line
of code. Additionally, there’s no need to specify the use of the append()
method, as this action is implied within the list comprehension.
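The loop and the comprehension can be checked against each other. In this sketch the sanitize() body is an assumed implementation and sarah holds Sara's raw times; both versions should produce the same list.

```python
def sanitize(time_string):
    # Assumed implementation: swap dashes and colons for periods.
    return time_string.replace('-', '.').replace(':', '.')


sarah = ['2:58', '2.58', '2:39', '2-25', '2-55', '2:54', '2.18', '2:55', '2:55']

# Long form: create, iterate, transform, append.
loop_version = []
for runTime in sarah:
    loop_version.append(sanitize(runTime))

# Short form: the same four steps as a list comprehension.
clean_sarah = [sanitize(runTime) for runTime in sarah]

print(clean_sarah)
print(clean_sarah == loop_version)
```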
Image Credit: www.clipartpanda.com
Congratulations!
• You’ve written a program that reads the
Coach’s data from his data files, stores his raw
data in lists, sanitizes the data to a uniform
format, and then sorts and displays the
coach’s data on screen. And all in ~25 lines of
code.
• It’s probably safe to show
the coach your output now.
Image Credit: vector-magz.com
Ooops – Forgot Why We Were
Doing All Of This: Top 3 Times
• We forgot to worry about what we were
actually supposed to be doing: producing the
three fastest times for each athlete.
• Oh, of course, there’s no place for any
duplicated times in our output.
Image Credit: www.clipartpanda.com
Two Ways To Access The
Time Values That We Want
• Standard Notation
– Specify each list item individually
• sara[0]
• sara[1]
• sara[2]
• List Slice
– sara[0:3]
– Access list items from index 0 up to, but not including, index 3.
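Both notations return the same three values. The sketch below assumes sara already holds sanitized times and sorts them first, so the first three items are the fastest.

```python
# Sara's times after sanitizing (assumed already done), sorted fastest first.
sara = sorted(['2.58', '2.58', '2.39', '2.25', '2.55', '2.54', '2.18', '2.55', '2.55'])

# Standard notation: one index per item.
one_by_one = [sara[0], sara[1], sara[2]]

# List slice: items at index 0, 1 and 2 in one expression.
sliced = sara[0:3]

print(one_by_one)
print(sliced)
```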
Image Credit: www.canstockphoto.com
The Problem With Duplicates
• Do we have a duplicate problem?
• Processing a list to remove duplicates is one area where a list
comprehension can’t help you, because duplicate removal is not a
transformation; it’s more of a filter.
• And a duplicate removal filter needs to examine the list being created as it
is being created, which is not possible with a list comprehension.
• To meet this new requirement, you’ll need to revert to regular list iteration
code.
James
2-34,3:21,2.34,2.45,3.01,2:01,2:01,3:10,2-22
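A sketch of the regular iteration approach applied to James's data. It assumes the times have already been sanitized, and the name unique_james is chosen for illustration. Each time is appended only if it is not already in the list being built.

```python
# James's times after sanitizing (assumed already done), sorted fastest first.
james = sorted(['2.34', '3.21', '2.34', '2.45', '3.01', '2.01', '2.01', '3.10', '2.22'])

unique_james = []
for each_time in james:
    # Check the list being built before adding to it.
    if each_time not in unique_james:
        unique_james.append(each_time)

print(unique_james)
print(unique_james[0:3])
```

Without this step, the slice [0:3] of the sorted list would report 2.01 twice.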
Image Credit: www.mycutegraphics.com
Remove Duplicates With Sets
• The overriding characteristics of sets in Python are that the data items in a
set are unordered and duplicates are not allowed.
• If you try to add a data item to a set that already contains the data item,
Python simply ignores it.
• It is also possible to create and populate a set in one step. You can provide
a list of data values between curly braces or specify an existing list as an
argument to set():
distances = {10.6, 11, 8, 10.6, "two", 7}    # duplicates will be ignored
• Any duplicates in the james list will be ignored:
distances = set(james)
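Both ways of building a set can be tried in a few lines. This sketch uses the values from the slide plus James's raw times; the name unique_james is an assumption for illustration.

```python
# Curly braces: 10.6 appears twice in the literal but is stored once.
distances = {10.6, 11, 8, 10.6, "two", 7}
print(len(distances))

# Adding an item that is already present is silently ignored.
distances.add(11)
print(len(distances))

# set() with an existing list: the repeated '2:01' is dropped.
james = ['2-34', '3:21', '2.34', '2.45', '3.01', '2:01', '2:01', '3:10', '2-22']
unique_james = set(james)
print(len(james), len(unique_james))
```

Because sets are unordered, printing one directly may show the items in any order.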
Image Credit: www.pinterest.com
What Do We Do Now?
• To extract the data you need, replace all of
that list iteration code in your current program
with four calls to:
sorted(set(...))[0:3]
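Putting the pieces together, the four calls might look like the sketch below. The data is typed in directly instead of being read from files, and the sanitize() body and the top_* names are assumptions. Sorting the strings works here only because every sanitized time has the same one-digit, period, two-digit shape.

```python
def sanitize(time_string):
    # Assumed implementation: swap dashes and colons for periods.
    return time_string.replace('-', '.').replace(':', '.')


james = '2-34,3:21,2.34,2.45,3.01,2:01,2:01,3:10,2-22'.strip().split(',')
julie = '2.59,2.11,2:11,2:23,3-10,2-23,3:10,3.21,3-21'.strip().split(',')
mike = '2:22,3.01,3:01,3.02,3:02,3.02,3:22,2.49,2:38'.strip().split(',')
sara = '2:58,2.58,2:39,2-25,2-55,2:54,2.18,2:55,2:55'.strip().split(',')

# Sanitize with a comprehension, drop duplicates with set(),
# sort the result, then slice off the three fastest times.
top_james = sorted(set([sanitize(t) for t in james]))[0:3]
top_julie = sorted(set([sanitize(t) for t in julie]))[0:3]
top_mike = sorted(set([sanitize(t) for t in mike]))[0:3]
top_sara = sorted(set([sanitize(t) for t in sara]))[0:3]

print(top_james)
print(top_julie)
print(top_mike)
print(top_sara)
```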
Image Credit: www.fotosearch.com
What’s In Your Python Toolbox?
print() math strings I/O IF/Else elif While For
Dictionary Lists And/Or/Not Functions Files Exception Sets
What We Covered Today
1. Read in data
2. Sorted it
3. Fixed coach’s mistakes
4. Transformed the list
5. Used List Comprehension
6. Used sets to get rid of
duplicates
Image Credit: http://www.tswdj.com/blog/2011/05/17/the-grooms-checklist/
What We’ll Be Covering Next Time
1. External Libraries
2. Data wrangling
Image Credit: http://merchantblog.thefind.com/2011/01/merchant-newsletter/resolve-to-take-advantage-of-these-5-e-commerce-trends/attachment/crystal-ball-fullsize/

More Related Content

What's hot

Text Analysis: Latent Topics and Annotated Documents
Text Analysis: Latent Topics and Annotated DocumentsText Analysis: Latent Topics and Annotated Documents
Text Analysis: Latent Topics and Annotated DocumentsNelson Auner
 
R programming Fundamentals
R programming  FundamentalsR programming  Fundamentals
R programming FundamentalsRagia Ibrahim
 
Introduction of data structure
Introduction of data structureIntroduction of data structure
Introduction of data structureeShikshak
 
R Programming Language
R Programming LanguageR Programming Language
R Programming LanguageNareshKarela1
 
Searching Techniques and Analysis
Searching Techniques and AnalysisSearching Techniques and Analysis
Searching Techniques and AnalysisAkashBorse2
 
Introduction to the language R
Introduction to the language RIntroduction to the language R
Introduction to the language Rfbenault
 
R programming & Machine Learning
R programming & Machine LearningR programming & Machine Learning
R programming & Machine LearningAmanBhalla14
 
Presentation R basic teaching module
Presentation R basic teaching modulePresentation R basic teaching module
Presentation R basic teaching moduleSander Timmer
 
Datastructures using c++
Datastructures using c++Datastructures using c++
Datastructures using c++Gopi Nath
 
ICDE 2015 - LDV: Light-weight Database Virtualization
ICDE 2015 - LDV: Light-weight Database VirtualizationICDE 2015 - LDV: Light-weight Database Virtualization
ICDE 2015 - LDV: Light-weight Database VirtualizationBoris Glavic
 
Data structure lecture 2 (pdf)
Data structure lecture 2 (pdf)Data structure lecture 2 (pdf)
Data structure lecture 2 (pdf)Abbott
 
Data Wrangling and Visualization Using Python
Data Wrangling and Visualization Using PythonData Wrangling and Visualization Using Python
Data Wrangling and Visualization Using PythonMOHITKUMAR1379
 
Data structure lecture 2
Data structure lecture 2Data structure lecture 2
Data structure lecture 2Abbott
 
Tree representation in map reduce world
Tree representation  in map reduce worldTree representation  in map reduce world
Tree representation in map reduce worldYu Liu
 
RDataMining slides-text-mining-with-r
RDataMining slides-text-mining-with-rRDataMining slides-text-mining-with-r
RDataMining slides-text-mining-with-rYanchang Zhao
 
A Hybrid Algorithm Using Apriori Growth and Fp-Split Tree For Web Usage Mining
A Hybrid Algorithm Using Apriori Growth and Fp-Split Tree For Web Usage Mining A Hybrid Algorithm Using Apriori Growth and Fp-Split Tree For Web Usage Mining
A Hybrid Algorithm Using Apriori Growth and Fp-Split Tree For Web Usage Mining iosrjce
 

What's hot (20)

Text Analysis: Latent Topics and Annotated Documents
Text Analysis: Latent Topics and Annotated DocumentsText Analysis: Latent Topics and Annotated Documents
Text Analysis: Latent Topics and Annotated Documents
 
R - the language
R - the languageR - the language
R - the language
 
R programming Fundamentals
R programming  FundamentalsR programming  Fundamentals
R programming Fundamentals
 
Introduction of data structure
Introduction of data structureIntroduction of data structure
Introduction of data structure
 
R Programming Language
R Programming LanguageR Programming Language
R Programming Language
 
Searching Techniques and Analysis
Searching Techniques and AnalysisSearching Techniques and Analysis
Searching Techniques and Analysis
 
Introduction to the language R
Introduction to the language RIntroduction to the language R
Introduction to the language R
 
Data structure
Data structureData structure
Data structure
 
R programming & Machine Learning
R programming & Machine LearningR programming & Machine Learning
R programming & Machine Learning
 
Presentation R basic teaching module
Presentation R basic teaching modulePresentation R basic teaching module
Presentation R basic teaching module
 
Datastructures using c++
Datastructures using c++Datastructures using c++
Datastructures using c++
 
ICDE 2015 - LDV: Light-weight Database Virtualization
ICDE 2015 - LDV: Light-weight Database VirtualizationICDE 2015 - LDV: Light-weight Database Virtualization
ICDE 2015 - LDV: Light-weight Database Virtualization
 
Data structure lecture 2 (pdf)
Data structure lecture 2 (pdf)Data structure lecture 2 (pdf)
Data structure lecture 2 (pdf)
 
Data Wrangling and Visualization Using Python
Data Wrangling and Visualization Using PythonData Wrangling and Visualization Using Python
Data Wrangling and Visualization Using Python
 
Fp growth
Fp growthFp growth
Fp growth
 
An Intoduction to R
An Intoduction to RAn Intoduction to R
An Intoduction to R
 
Data structure lecture 2
Data structure lecture 2Data structure lecture 2
Data structure lecture 2
 
Tree representation in map reduce world
Tree representation  in map reduce worldTree representation  in map reduce world
Tree representation in map reduce world
 
RDataMining slides-text-mining-with-r
RDataMining slides-text-mining-with-rRDataMining slides-text-mining-with-r
RDataMining slides-text-mining-with-r
 
A Hybrid Algorithm Using Apriori Growth and Fp-Split Tree For Web Usage Mining
A Hybrid Algorithm Using Apriori Growth and Fp-Split Tree For Web Usage Mining A Hybrid Algorithm Using Apriori Growth and Fp-Split Tree For Web Usage Mining
A Hybrid Algorithm Using Apriori Growth and Fp-Split Tree For Web Usage Mining
 

Viewers also liked

Viewers also liked (12)

Bubble sorting lab manual
Bubble sorting lab manualBubble sorting lab manual
Bubble sorting lab manual
 
Matlab
MatlabMatlab
Matlab
 
PANCASILA (makalah falsafah pancasila)
PANCASILA (makalah falsafah pancasila) PANCASILA (makalah falsafah pancasila)
PANCASILA (makalah falsafah pancasila)
 
Bin Sorting And Bubble Sort By Luisito G. Trinidad
Bin Sorting And Bubble Sort By Luisito G. TrinidadBin Sorting And Bubble Sort By Luisito G. Trinidad
Bin Sorting And Bubble Sort By Luisito G. Trinidad
 
Sorting bubble-sort anim
Sorting   bubble-sort animSorting   bubble-sort anim
Sorting bubble-sort anim
 
Matlab basic and image
Matlab basic and imageMatlab basic and image
Matlab basic and image
 
PKI dan Kekejaman Terhadap Ulama
PKI dan Kekejaman Terhadap UlamaPKI dan Kekejaman Terhadap Ulama
PKI dan Kekejaman Terhadap Ulama
 
TNI AD Mengenai Komunis
TNI AD Mengenai KomunisTNI AD Mengenai Komunis
TNI AD Mengenai Komunis
 
Matlab Basic Tutorial
Matlab Basic TutorialMatlab Basic Tutorial
Matlab Basic Tutorial
 
Sorting algorithms
Sorting algorithmsSorting algorithms
Sorting algorithms
 
Sorting Algorithms
Sorting AlgorithmsSorting Algorithms
Sorting Algorithms
 
How to Become a Thought Leader in Your Niche
How to Become a Thought Leader in Your NicheHow to Become a Thought Leader in Your Niche
How to Become a Thought Leader in Your Niche
 

Similar to An Introduction To Python - Working With Data

An Introduction To Python - Final Exam Review
An Introduction To Python - Final Exam ReviewAn Introduction To Python - Final Exam Review
An Introduction To Python - Final Exam ReviewBlue Elephant Consulting
 
Data structure and algorithm.
Data structure and algorithm. Data structure and algorithm.
Data structure and algorithm. Abdul salam
 
Analysis of Algorithms_RR.pptx
Analysis of Algorithms_RR.pptxAnalysis of Algorithms_RR.pptx
Analysis of Algorithms_RR.pptxKarthikR780430
 
Introduction to Artificial Intelligence...pptx
Introduction to Artificial Intelligence...pptxIntroduction to Artificial Intelligence...pptx
Introduction to Artificial Intelligence...pptxMMCOE, Karvenagar, Pune
 
Postgresql Database Administration Basic - Day2
Postgresql  Database Administration Basic  - Day2Postgresql  Database Administration Basic  - Day2
Postgresql Database Administration Basic - Day2PoguttuezhiniVP
 
Algorithms and Data Structures
Algorithms and Data StructuresAlgorithms and Data Structures
Algorithms and Data Structuressonykhan3
 
DutchMLSchool. Automating Decision Making
DutchMLSchool. Automating Decision MakingDutchMLSchool. Automating Decision Making
DutchMLSchool. Automating Decision MakingBigML, Inc
 
Data Structure # vpmp polytechnic
Data Structure # vpmp polytechnicData Structure # vpmp polytechnic
Data Structure # vpmp polytechniclavparmar007
 
Pycon2015 scope
Pycon2015 scopePycon2015 scope
Pycon2015 scopearthi v
 
ADS Introduction
ADS IntroductionADS Introduction
ADS IntroductionNagendraK18
 
Basics in algorithms and data structure
Basics in algorithms and data structure Basics in algorithms and data structure
Basics in algorithms and data structure Eman magdy
 
b,Sc it data structure.ppt
b,Sc it data structure.pptb,Sc it data structure.ppt
b,Sc it data structure.pptclassall
 
intership summary
intership summaryintership summary
intership summaryJunting Ma
 
Optimising Queries - Series 1 Query Optimiser Architecture
Optimising Queries - Series 1 Query Optimiser ArchitectureOptimising Queries - Series 1 Query Optimiser Architecture
Optimising Queries - Series 1 Query Optimiser ArchitectureDAGEOP LTD
 
K-Means Algorithm Implementation In python
K-Means Algorithm Implementation In pythonK-Means Algorithm Implementation In python
K-Means Algorithm Implementation In pythonAfzal Ahmad
 

Similar to An Introduction To Python - Working With Data (20)

An Introduction To Python - Final Exam Review
An Introduction To Python - Final Exam ReviewAn Introduction To Python - Final Exam Review
An Introduction To Python - Final Exam Review
 
Data structure and algorithm.
Data structure and algorithm. Data structure and algorithm.
Data structure and algorithm.
 
Intro_2.ppt
Intro_2.pptIntro_2.ppt
Intro_2.ppt
 
Intro.ppt
Intro.pptIntro.ppt
Intro.ppt
 
Intro.ppt
Intro.pptIntro.ppt
Intro.ppt
 
Python ml
Python mlPython ml
Python ml
 
Analysis of Algorithms_RR.pptx
Analysis of Algorithms_RR.pptxAnalysis of Algorithms_RR.pptx
Analysis of Algorithms_RR.pptx
 
Introduction to Artificial Intelligence...pptx
Introduction to Artificial Intelligence...pptxIntroduction to Artificial Intelligence...pptx
Introduction to Artificial Intelligence...pptx
 
Postgresql Database Administration Basic - Day2
Postgresql  Database Administration Basic  - Day2Postgresql  Database Administration Basic  - Day2
Postgresql Database Administration Basic - Day2
 
Algorithms and Data Structures
Algorithms and Data StructuresAlgorithms and Data Structures
Algorithms and Data Structures
 
DutchMLSchool. Automating Decision Making
DutchMLSchool. Automating Decision MakingDutchMLSchool. Automating Decision Making
DutchMLSchool. Automating Decision Making
 
Data Structure # vpmp polytechnic
Data Structure # vpmp polytechnicData Structure # vpmp polytechnic
Data Structure # vpmp polytechnic
 
Pycon2015 scope
Pycon2015 scopePycon2015 scope
Pycon2015 scope
 
Data Exploration in R.pptx
Data Exploration in R.pptxData Exploration in R.pptx
Data Exploration in R.pptx
 
ADS Introduction
ADS IntroductionADS Introduction
ADS Introduction
 
Basics in algorithms and data structure
Basics in algorithms and data structure Basics in algorithms and data structure
Basics in algorithms and data structure
 
b,Sc it data structure.ppt
b,Sc it data structure.pptb,Sc it data structure.ppt
b,Sc it data structure.ppt
 
intership summary
intership summaryintership summary
intership summary
 
Optimising Queries - Series 1 Query Optimiser Architecture
Optimising Queries - Series 1 Query Optimiser ArchitectureOptimising Queries - Series 1 Query Optimiser Architecture
Optimising Queries - Series 1 Query Optimiser Architecture
 
K-Means Algorithm Implementation In python
K-Means Algorithm Implementation In pythonK-Means Algorithm Implementation In python
K-Means Algorithm Implementation In python
 

Recently uploaded

80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...Nguyen Thanh Tu Collection
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxDenish Jangid
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxEsquimalt MFRC
 
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxHMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxmarlenawright1
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jisc
 
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Pooja Bhuva
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Jisc
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...Nguyen Thanh Tu Collection
 
How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17Celine George
 
dusjagr & nano talk on open tools for agriculture research and learning
dusjagr & nano talk on open tools for agriculture research and learningdusjagr & nano talk on open tools for agriculture research and learning
dusjagr & nano talk on open tools for agriculture research and learningMarc Dusseiller Dusjagr
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024Elizabeth Walsh
 
Graduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - EnglishGraduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - Englishneillewis46
 
Basic Intentional Injuries Health Education
Basic Intentional Injuries Health EducationBasic Intentional Injuries Health Education
Basic Intentional Injuries Health EducationNeilDeclaro1
 
Spellings Wk 4 and Wk 5 for Grade 4 at CAPS
Spellings Wk 4 and Wk 5 for Grade 4 at CAPSSpellings Wk 4 and Wk 5 for Grade 4 at CAPS
Spellings Wk 4 and Wk 5 for Grade 4 at CAPSAnaAcapella
 
Philosophy of china and it's charactistics
Philosophy of china and it's charactisticsPhilosophy of china and it's charactistics
Philosophy of china and it's charactisticshameyhk98
 
How to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxHow to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxCeline George
 
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Pooja Bhuva
 
Interdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptxInterdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptxPooja Bhuva
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.christianmathematics
 

Recently uploaded (20)

80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
 
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxHMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)
 
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17
 
dusjagr & nano talk on open tools for agriculture research and learning
dusjagr & nano talk on open tools for agriculture research and learningdusjagr & nano talk on open tools for agriculture research and learning
dusjagr & nano talk on open tools for agriculture research and learning
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024
 
Graduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - EnglishGraduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - English
 
Basic Intentional Injuries Health Education
Basic Intentional Injuries Health EducationBasic Intentional Injuries Health Education
Basic Intentional Injuries Health Education
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
Spellings Wk 4 and Wk 5 for Grade 4 at CAPS
Spellings Wk 4 and Wk 5 for Grade 4 at CAPSSpellings Wk 4 and Wk 5 for Grade 4 at CAPS
Spellings Wk 4 and Wk 5 for Grade 4 at CAPS
 
Philosophy of china and it's charactistics
Philosophy of china and it's charactisticsPhilosophy of china and it's charactistics
Philosophy of china and it's charactistics
 
How to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxHow to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptx
 
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
 
Interdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptxInterdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptx
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 

An Introduction To Python - Working With Data

  • 1. An Introduction To Software Development Using Python Spring Semester, 2015 Class #23: Working With Data
  • 2. Data Formatting • In the real world, data comes in many different shapes, sizes, and encodings. • This means that you have to know how to manipulate and transform it into a common format that will permit efficient processing, sorting, and storage. • Python has the tools that will allow you to do all of this… Image Credit: publicdomainvectors.org
  • 3. Your Programming Challenge • The Florida Polytechnic track team has just been formed. • The coach really wants the team to win the state competition in its first year. • He’s been recording their training results from the 600m run. • Now he wants to know the top three fastest times for each team member. Image Credit: animals.phillipmartin.info
  • 4. Here’s What The Data Looks Like • James 2-34,3:21,2.34,2.45,3.01,2:01,2:01,3:10,2-22 • Julie 2.59,2.11,2:11,2:23,3-10,2-23,3:10,3.21,3-21 • Mike 2:22,3.01,3:01,3.02,3:02,3.02,3:22,2.49,2:38 • Sara 2:58,2.58,2:39,2-25,2-55,2:54,2.18,2:55,2:55 Image Credit: www.dreamstime.com
  • 5. 1st Step: We Need To Get The Data • Let’s begin by reading the data from each of the files into its own list. • Write a short program to process each file, creating a list for each athlete’s data, and display the lists on screen. • Hint: Try splitting the data on the commas, and don’t forget to strip any unwanted whitespace. 1 Image Credit: www.clipartillustration.com
  • 6. New Python Ideas • data.strip().split(',') This is called method chaining. The first method, strip() , is applied to the line in data, which removes any unwanted whitespace from the string. Then, the results of the stripping are processed by the second method, split(',') , creating a list. The resulting list is then saved in the variable. In this way, the methods are chained together to produce the required result. It helps if you read method chains from left to right. Image Credit: www.clipartpanda.com
  • 7. Time To Do Some Sorting • In-Place Sorting – Takes your data, arranges it in the order you specify, and then replaces your original data with the sorted version. – The original ordering is lost. With lists, the sort() method provides in-place sorting – Example - original list: [1,3,4,6,2,5] list after sorting: [1,2,3,4,5,6] • Copy Sorting – Takes your data, arranges it in the order you specify, and then returns a sorted copy of your original data. – Your original data’s ordering is maintained and only the copy is sorted. In Python, the sorted() method supports copied sorting. – Example - original list: [1,3,4,6,2,5] list after sorting: [1,3,4,6,2,5] new list: [1,2,3,4,5,6] 2 Image Credit: www.picturesof.net
  • 8. What’s Our Problem? • “-”, “.”, and “:” all have different ASCII values. • This means that they are screwing up our sort. • Sara’s data: ['2:58', '2.58', '2:39’, '2-25', '2-55', '2:54’, '2.18', '2:55', '2:55'] • Python sorts the strings, and when it comes to strings, a dash comes before a period, which itself comes before a colon. • Nonuniformity in the coach’s data is causing the sort to fail.
  • 9. Fixing The Coach’s Mistakes • Let’s create a function called sanitize() , which takes as input a single time string from an athlete’s list. • The function then processes the string to replace any dashes or colons found with a period and returns the sanitized string. • Note: if the string already contains a period, there’s no need to sanitize it. Image Credit: www.dreamstime.com
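One possible implementation of sanitize() is sketched below. The slides describe the function's behavior but not its body, so the use of str.replace() here is an assumption rather than the course's own code.

```python
def sanitize(time_string):
    """Return time_string with any dash or colon replaced by a period.

    A string that already uses a period contains neither character,
    so it passes through unchanged.
    """
    return time_string.replace("-", ".").replace(":", ".")


print(sanitize("2-34"))
print(sanitize("3:21"))
print(sanitize("2.45"))
```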
  • 10. Code Problem: Lots and Lots of Duplication • Your code creates four lists to hold the data as read from the data files. • Then your code creates another four lists to hold the sanitized data. • And, of course, you’re stepping through lists all over the place… • There has to be a better way to write code like this. Image Credit: www.canstockphoto.com
  • 11. Transforming Lists • Transforming lists is such a common requirement that Python provides a tool to make the transformation as painless as possible. • This tool goes by the rather unwieldy name of list comprehension. • List comprehensions are designed to reduce the amount of code you need to write when transforming one list into another. Image Credit: www.fotosearch.com
  • 12. Steps In Transforming A List • Consider what you need to do when you transform one list into another. Four things have to happen. You need to: 1. Create a new list to hold the transformed data. 2. Iterate over each data item in the original list. 3. With each iteration, perform the transformation. 4. Append the transformed data to the new list. Code: clean_sarah = [] (❶) for runTime in sarah: (❷) clean_sarah.append(sanitize(runTime)) (❸ ❹) Image Credit: www.cakechooser.com
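The four steps can be laid out as a runnable sketch. The names sarah, clean_sarah, and runTime follow the slide. The body of sanitize() is an assumed implementation, since the slides only describe what it should do.

```python
def sanitize(time_string):
    # Assumed implementation: swap dashes and colons for periods.
    return time_string.replace("-", ".").replace(":", ".")


sarah = ['2:58', '2.58', '2:39', '2-25', '2-55', '2:54', '2.18', '2:55', '2:55']

clean_sarah = []                      # 1. create a new list
for runTime in sarah:                 # 2. iterate over the original list
    clean_time = sanitize(runTime)    # 3. perform the transformation
    clean_sarah.append(clean_time)    # 4. append the result to the new list

print(clean_sarah)
```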
  • 13. List Comprehension • Here’s the same functionality as a list comprehension, which involves creating a new list by specifying the transformation that is to be applied to each of the data items within an existing list. clean_sarah = [sanitize(runTime) for runTime in sarah] Reading the annotations on the code: create a new list (clean_sarah = [...]) … by applying a transformation (sanitize(runTime)) … to each data item (for runTime) … within an existing list (in sarah). Note that the transformation has been reduced to a single line of code. Additionally, there’s no need to specify the use of the append() method, as this action is implied within the list comprehension. Image Credit: www.clipartpanda.com
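A runnable sketch of the comprehension follows, again with an assumed body for sanitize(). Because the comprehension is an expression, it can be passed straight to sorted().

```python
def sanitize(time_string):
    # Assumed implementation: swap dashes and colons for periods.
    return time_string.replace("-", ".").replace(":", ".")


sarah = ['2:58', '2.58', '2:39', '2-25', '2-55', '2:54', '2.18', '2:55', '2:55']

# The whole transformation in a single line.
clean_sarah = [sanitize(runTime) for runTime in sarah]
print(clean_sarah)

# With uniform separators, the string sort now matches time order.
sorted_sarah = sorted([sanitize(runTime) for runTime in sarah])
print(sorted_sarah)
```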
  • 14. Congratulations! • You’ve written a program that reads the Coach’s data from his data files, stores his raw data in lists, sanitizes the data to a uniform format, and then sorts and displays the coach’s data on screen. And all in ~25 lines of code. • It’s probably safe to show the coach your output now. Image Credit: vector-magz.com
  • 15. Oops – Forgot Why We Were Doing All Of This: Top 3 Times • We forgot to worry about what we were actually supposed to be doing: producing the three fastest times for each athlete. • Oh, and of course, there’s no place for any duplicated times in our output. Image Credit: www.clipartpanda.com
  • 16. Two Ways To Access The Time Values That We Want • Standard Notation – Specify each list item individually • sara[0] • sara[1] • sara[2] • List Slice – sara[0:3] – Access list items from index 0 up to, but not including, index 3. Image Credit: www.canstockphoto.com
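Both notations can be tried in a brief sketch. It assumes sara already holds Sara's sanitized, sorted times.

```python
# Sara's times after sanitizing and sorting (assumed to be done already).
sara = ['2.18', '2.25', '2.39', '2.54', '2.55', '2.55', '2.55', '2.58', '2.58']

# Standard notation: one index per item.
one_by_one = [sara[0], sara[1], sara[2]]

# List slice: indexes 0, 1, and 2; index 3 is excluded.
top_three = sara[0:3]

# The start index may be left out when it is 0.
also_top_three = sara[:3]

print(top_three)
```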
  • 17. The Problem With Duplicates • Do we have a duplicate problem? • Processing a list to remove duplicates is one area where a list comprehension can’t help you, because duplicate removal is not a transformation; it’s more of a filter. • And a duplicate removal filter needs to examine the list being created as it is being created, which is not possible with a list comprehension. • To meet this new requirement, you’ll need to revert to regular list iteration code. • Example, James’s data (note the repeated 2:01): 2-34,3:21,2.34,2.45,3.01,2:01,2:01,3:10,2-22 Image Credit: www.mycutegraphics.com
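The regular iteration approach might look like the following sketch, which uses James's data. The body of sanitize() and the name unique_james are assumptions made for illustration.

```python
def sanitize(time_string):
    # Assumed implementation: swap dashes and colons for periods.
    return time_string.replace("-", ".").replace(":", ".")


james = ['2-34', '3:21', '2.34', '2.45', '3.01', '2:01', '2:01', '3:10', '2-22']
sorted_james = sorted([sanitize(each_time) for each_time in james])

# Build the new list one item at a time, checking it as it grows.
unique_james = []
for each_time in sorted_james:
    if each_time not in unique_james:
        unique_james.append(each_time)

print(sorted_james[0:3])  # the duplicate time appears twice
print(unique_james[0:3])  # three distinct times
```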
  • 18. Remove Duplicates With Sets • The overriding characteristics of sets in Python are that the data items in a set are unordered and duplicates are not allowed. • If you try to add a data item to a set that already contains the data item, Python simply ignores it. • It is also possible to create and populate a set in one step. You can provide a list of data values between curly braces or specify an existing list as an argument to the set() built-in function. • Between curly braces (the duplicate 10.6 will be ignored): distances = {10.6, 11, 8, 10.6, "two", 7} • From an existing list (any duplicates in the james list will be ignored): distances = set(james) Image Credit: www.pinterest.com
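The set behavior described above can be checked in a short sketch. The names unique_james and times are illustrative.

```python
# Literal form: the second 10.6 is silently dropped.
distances = {10.6, 11, 8, 10.6, "two", 7}
print(len(distances))

# Building a set from an existing list drops the repeated '2:01'.
james = ['2-34', '3:21', '2.34', '2.45', '3.01', '2:01', '2:01', '3:10', '2-22']
unique_james = set(james)
print(len(james), len(unique_james))

# Adding an item that is already present has no effect.
times = set()
times.add("2.01")
times.add("2.01")
print(times)
```

Since sets are unordered, printing one may show the items in any order. Sorting has to happen after the duplicates are removed.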
  • 19. What Do We Do Now? • To extract the data you need, replace all of that list iteration code in your current program with four calls to: sorted(set(...))[0:3] Image Credit: www.fotosearch.com
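One plausible way to write the four calls is sketched below. The slide leaves the argument to set() elided, so passing a list comprehension over sanitize() is an assumption, as is the body of sanitize() itself. The lists are written out inline here rather than read from files.

```python
def sanitize(time_string):
    # Assumed implementation: swap dashes and colons for periods.
    return time_string.replace("-", ".").replace(":", ".")


james = "2-34,3:21,2.34,2.45,3.01,2:01,2:01,3:10,2-22".split(",")
julie = "2.59,2.11,2:11,2:23,3-10,2-23,3:10,3.21,3-21".split(",")
mike = "2:22,3.01,3:01,3.02,3:02,3.02,3:22,2.49,2:38".split(",")
sara = "2:58,2.58,2:39,2-25,2-55,2:54,2.18,2:55,2:55".split(",")

# Sanitize, drop duplicates, sort, then keep the first three.
top_james = sorted(set([sanitize(t) for t in james]))[0:3]
top_julie = sorted(set([sanitize(t) for t in julie]))[0:3]
top_mike = sorted(set([sanitize(t) for t in mike]))[0:3]
top_sara = sorted(set([sanitize(t) for t in sara]))[0:3]

print(top_james)
print(top_julie)
print(top_mike)
print(top_sara)
```

The string sort gives the right order here only because every sanitized time has the same shape: one digit, a period, then two digits.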
  • 20. What’s In Your Python Toolbox? print(), math, strings, I/O, If/Else, elif, While, For, Dictionary, Lists, And/Or/Not, Functions, Files, Exception, Sets
  • 21. What We Covered Today 1. Read in data 2. Sorted it 3. Fixed coach’s mistakes 4. Transformed the list 5. Used List Comprehension 6. Used sets to get rid of duplicates Image Credit: http://www.tswdj.com/blog/2011/05/17/the-grooms-checklist/
  • 22. What We’ll Be Covering Next Time 1. External Libraries 2. Data wrangling Image Credit: http://merchantblog.thefind.com/2011/01/merchant-newsletter/resolve-to-take-advantage-of-these-5-e-commerce-trends/attachment/crystal-ball-fullsize/

Editor's Notes

  1. New name for the class. I know what this means. Technical professionals are who get hired. This means much more than just having a narrow vertical knowledge of some subject area. It means that you know how to produce an outcome that I value. I’m willing to pay you to do that.