SlideShare a Scribd company logo
1 of 3
Download to read offline
Comma Separated Files
This lesson focuses on CSV type les. It gives a complete explanation about how to read data from CSV les using
the Pandas library of Python.
W E ' L L C O V E R T H E F O L L O W I N G
• Introduction to CSV le
• Reading CSV le with Pandas
Introduction to CSV le #
Comma-separated files (CSV) are common in machine learning. These files
have a row of data per line of the file and each line is a comma-separated list
in which each element is a column. Pandas makes it easy to read this data.
Reading CSV le with Pandas #
The documentation can be found here. Before reading a CSV file, there are
three parameters that should be known:
sep - this defaults to a comma, but we can specify anything we want. For
example, CSV format is poor if some of your columns contain commas. A
better option might be a |.
header - which row (if any) have the column names.
names - column names to use.
If your CSV is well-formatted where the first row is the column names, then
the default parameters should work well.
It is important to note that while it might sound simple to read in a CSV file
without Pandas, CSV files are often very messy and reading them
appropriately can often consist of handling many edge cases. Pandas module
handles many of those edge cases right out of the box and has many
parameters that you can change to handle messier CSV files.
Let’s see an example with code.
import pandas as pd
# Define the column names as a list
names = ['age', 'workclass', 'fnlwgt', 'education', 'educationnum', 'maritalstatus', 'occupat
'sex', 'capitalgain', 'capitalloss', 'hoursperweek', 'nativecountry', 'label']
# Read in the CSV file from the webpage using the defined column names
df = pd.read_csv("adult.data", header=None, names=names)
print(df.head())
Let’s deconstruct the code. First, we have a CSV of data that lives at this page.
Go to the page, click on Data Folder and then select adult.data file. It will
download the small dataset for you. Once downloaded, open it in Excel or any
text editor. You will see rows of data with each column separated by commas.
You will notice that this CSV doesn’t have column names. Fortunately, we
know what the names should be and supplied them to the names parameter at
line 2. Since Pandas assumes the first row is the header (columns),
header=None was used at line 7, so it would read the first row as data.
The data is read into a Pandas dataframe.
A Pandas dataframe is very similar to a matrix of data, but with some
additional benefits (usually at the cost of performance). For example, you get
named columns and rows as well as the ability to store different types of data
in each column. For example, one column could be integers and another text.
You might imagine a dataframe as an Excel sheet of data. For example, if you
had a sheet with a column of dates and a column of temperatures on those
dates, you could easily represent it as a dataframe.
We can see the first 5 rows of data by calling the head() function on the
dataframe. In the next section on describing data, we will go more in-depth on
how to use a dataframe.
Next, let’s look at reading in JSON files.
Dealing with files in python specially CSV files

More Related Content

Similar to Dealing with files in python specially CSV files

Analytics with SAS
Analytics with SASAnalytics with SAS
Analytics with SASEdureka!
 
Reading_csv.pptx
Reading_csv.pptxReading_csv.pptx
Reading_csv.pptxOpOp39
 
HBase Schema Design - HBase-Con 2012
HBase Schema Design - HBase-Con 2012HBase Schema Design - HBase-Con 2012
HBase Schema Design - HBase-Con 2012Ian Varley
 
HBaseCon 2012 | HBase Schema Design - Ian Varley, Salesforce
HBaseCon 2012 | HBase Schema Design - Ian Varley, SalesforceHBaseCon 2012 | HBase Schema Design - Ian Varley, Salesforce
HBaseCon 2012 | HBase Schema Design - Ian Varley, SalesforceCloudera, Inc.
 
Base and Advanced SAS Training Contents
Base and Advanced SAS Training ContentsBase and Advanced SAS Training Contents
Base and Advanced SAS Training ContentsGaurav Jain
 
White paper on cassandra
White paper on cassandraWhite paper on cassandra
White paper on cassandraNavanit Katiyar
 
1.CSV stands for commma seperated values which are applied to move.pdf
1.CSV stands for commma seperated values which are applied to move.pdf1.CSV stands for commma seperated values which are applied to move.pdf
1.CSV stands for commma seperated values which are applied to move.pdfanilart346
 
Bdam presentation on parquet
Bdam presentation on parquetBdam presentation on parquet
Bdam presentation on parquetManpreet Khurana
 
Using Cassandra with your Web Application
Using Cassandra with your Web ApplicationUsing Cassandra with your Web Application
Using Cassandra with your Web Applicationsupertom
 
Introduction to SAS
Introduction to SASIntroduction to SAS
Introduction to SASImam Jaffer
 
Learning Cassandra NoSQL
Learning Cassandra NoSQLLearning Cassandra NoSQL
Learning Cassandra NoSQLPankaj Khattar
 
Unit I - introduction to r language 2.pptx
Unit I - introduction to r language 2.pptxUnit I - introduction to r language 2.pptx
Unit I - introduction to r language 2.pptxSreeLaya9
 
Data Wrangling and Visualization Using Python
Data Wrangling and Visualization Using PythonData Wrangling and Visualization Using Python
Data Wrangling and Visualization Using PythonMOHITKUMAR1379
 
Vancouver AWS Meetup Slides 11-20-2018 Apache Spark with Amazon EMR
Vancouver AWS Meetup Slides 11-20-2018 Apache Spark with Amazon EMRVancouver AWS Meetup Slides 11-20-2018 Apache Spark with Amazon EMR
Vancouver AWS Meetup Slides 11-20-2018 Apache Spark with Amazon EMRAllice Shandler
 

Similar to Dealing with files in python specially CSV files (20)

Analytics with SAS
Analytics with SASAnalytics with SAS
Analytics with SAS
 
The Bund language
The Bund languageThe Bund language
The Bund language
 
Reading_csv.pptx
Reading_csv.pptxReading_csv.pptx
Reading_csv.pptx
 
CSV_FILES.pptx
CSV_FILES.pptxCSV_FILES.pptx
CSV_FILES.pptx
 
HBase Schema Design - HBase-Con 2012
HBase Schema Design - HBase-Con 2012HBase Schema Design - HBase-Con 2012
HBase Schema Design - HBase-Con 2012
 
HBaseCon 2012 | HBase Schema Design - Ian Varley, Salesforce
HBaseCon 2012 | HBase Schema Design - Ian Varley, SalesforceHBaseCon 2012 | HBase Schema Design - Ian Varley, Salesforce
HBaseCon 2012 | HBase Schema Design - Ian Varley, Salesforce
 
Base and Advanced SAS Training Contents
Base and Advanced SAS Training ContentsBase and Advanced SAS Training Contents
Base and Advanced SAS Training Contents
 
White paper on cassandra
White paper on cassandraWhite paper on cassandra
White paper on cassandra
 
1.CSV stands for commma seperated values which are applied to move.pdf
1.CSV stands for commma seperated values which are applied to move.pdf1.CSV stands for commma seperated values which are applied to move.pdf
1.CSV stands for commma seperated values which are applied to move.pdf
 
Bdam presentation on parquet
Bdam presentation on parquetBdam presentation on parquet
Bdam presentation on parquet
 
Using Cassandra with your Web Application
Using Cassandra with your Web ApplicationUsing Cassandra with your Web Application
Using Cassandra with your Web Application
 
Introduction to SAS
Introduction to SASIntroduction to SAS
Introduction to SAS
 
Learning Cassandra NoSQL
Learning Cassandra NoSQLLearning Cassandra NoSQL
Learning Cassandra NoSQL
 
Unit I - introduction to r language 2.pptx
Unit I - introduction to r language 2.pptxUnit I - introduction to r language 2.pptx
Unit I - introduction to r language 2.pptx
 
Pandas.pptx
Pandas.pptxPandas.pptx
Pandas.pptx
 
Stata tutorial university of princeton
Stata tutorial university of princetonStata tutorial university of princeton
Stata tutorial university of princeton
 
Unit 3
Unit 3Unit 3
Unit 3
 
Data Management in Python
Data Management in PythonData Management in Python
Data Management in Python
 
Data Wrangling and Visualization Using Python
Data Wrangling and Visualization Using PythonData Wrangling and Visualization Using Python
Data Wrangling and Visualization Using Python
 
Vancouver AWS Meetup Slides 11-20-2018 Apache Spark with Amazon EMR
Vancouver AWS Meetup Slides 11-20-2018 Apache Spark with Amazon EMRVancouver AWS Meetup Slides 11-20-2018 Apache Spark with Amazon EMR
Vancouver AWS Meetup Slides 11-20-2018 Apache Spark with Amazon EMR
 

Recently uploaded

ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomnelietumpap1
 
Types of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxTypes of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxEyham Joco
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPCeline George
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfUjwalaBharambe
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Celine George
 
Planning a health career 4th Quarter.pptx
Planning a health career 4th Quarter.pptxPlanning a health career 4th Quarter.pptx
Planning a health career 4th Quarter.pptxLigayaBacuel1
 
Romantic Opera MUSIC FOR GRADE NINE pptx
Romantic Opera MUSIC FOR GRADE NINE pptxRomantic Opera MUSIC FOR GRADE NINE pptx
Romantic Opera MUSIC FOR GRADE NINE pptxsqpmdrvczh
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxRaymartEstabillo3
 
Quarter 4 Peace-education.pptx Catch Up Friday
Quarter 4 Peace-education.pptx Catch Up FridayQuarter 4 Peace-education.pptx Catch Up Friday
Quarter 4 Peace-education.pptx Catch Up FridayMakMakNepo
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Jisc
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxAnupkumar Sharma
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Celine George
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Mark Reed
 

Recently uploaded (20)

ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choom
 
Raw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptxRaw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptx
 
Types of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxTypes of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptx
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERP
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
Planning a health career 4th Quarter.pptx
Planning a health career 4th Quarter.pptxPlanning a health career 4th Quarter.pptx
Planning a health career 4th Quarter.pptx
 
Romantic Opera MUSIC FOR GRADE NINE pptx
Romantic Opera MUSIC FOR GRADE NINE pptxRomantic Opera MUSIC FOR GRADE NINE pptx
Romantic Opera MUSIC FOR GRADE NINE pptx
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
 
Quarter 4 Peace-education.pptx Catch Up Friday
Quarter 4 Peace-education.pptx Catch Up FridayQuarter 4 Peace-education.pptx Catch Up Friday
Quarter 4 Peace-education.pptx Catch Up Friday
 
9953330565 Low Rate Call Girls In Rohini Delhi NCR
9953330565 Low Rate Call Girls In Rohini  Delhi NCR9953330565 Low Rate Call Girls In Rohini  Delhi NCR
9953330565 Low Rate Call Girls In Rohini Delhi NCR
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
 

Dealing with files in python specially CSV files

  • 1. Comma Separated Files This lesson focuses on CSV type les. It gives a complete explanation about how to read data from CSV les using the Pandas library of Python. W E ' L L C O V E R T H E F O L L O W I N G • Introduction to CSV le • Reading CSV le with Pandas Introduction to CSV le # Comma-separated files (CSV) are common in machine learning. These files have a row of data per line of the file and each line is a comma-separated list in which each element is a column. Pandas makes it easy to read this data. Reading CSV le with Pandas # The documentation can be found here. Before reading a CSV file, there are three parameters that should be known: sep - this defaults to a comma, but we can specify anything we want. For example, CSV format is poor if some of your columns contain commas. A better option might be a |. header - which row (if any) have the column names. names - column names to use. If your CSV is well-formatted where the first row is the column names, then the default parameters should work well. It is important to note that while it might sound simple to read in a CSV file without Pandas, CSV files are often very messy and reading them appropriately can often consist of handling many edge cases. Pandas module handles many of those edge cases right out of the box and has many
  • 2. parameters that you can change to handle messier CSV files. Let’s see an example with code. import pandas as pd # Define the column names as a list names = ['age', 'workclass', 'fnlwgt', 'education', 'educationnum', 'maritalstatus', 'occupat 'sex', 'capitalgain', 'capitalloss', 'hoursperweek', 'nativecountry', 'label'] # Read in the CSV file from the webpage using the defined column names df = pd.read_csv("adult.data", header=None, names=names) print(df.head()) Let’s deconstruct the code. First, we have a CSV of data that lives at this page. Go to the page, click on Data Folder and then select adult.data file. It will download the small dataset for you. Once downloaded, open it in Excel or any text editor. You will see rows of data with each column separated by commas. You will notice that this CSV doesn’t have column names. Fortunately, we know what the names should be and supplied them to the names parameter at line 2. Since Pandas assumes the first row is the header (columns), header=None was used at line 7, so it would read the first row as data. The data is read into a Pandas dataframe. A Pandas dataframe is very similar to a matrix of data, but with some additional benefits (usually at the cost of performance). For example, you get named columns and rows as well as the ability to store different types of data in each column. For example, one column could be integers and another text. You might imagine a dataframe as an Excel sheet of data. For example, if you had a sheet with a column of dates and a column of temperatures on those dates, you could easily represent it as a dataframe. We can see the first 5 rows of data by calling the head() function on the dataframe. In the next section on describing data, we will go more in-depth on how to use a dataframe. Next, let’s look at reading in JSON files.