DATA SCIENCE WITH
PYTHON
PANDAS
Enrollment: 2302031030074
D2D BTECH-IT4
Kevin Patel
BATCH-1
OVERVIEW
➢ Series
➢ DataFrame
➢ Pandas for Time Series
➢ Merging, Joining, Concatenating
➢ Importing data
➢ A simple example
> the python commands will be written here
# this is a comment
SET IT UP!
➢ Open a Terminal
➢ Start the Jupyter notebook server (the old command was ipython notebook)
➢ Open the notebook web page (localhost:8888)
➢ Open 'tutorial_pandas.ipynb'
$ jupyter notebook
PANDAS LIBRARY
The Pandas library provides useful functions to:
➢ Represent and manage data structures
➢ Ease data processing
➢ Manage (Time) Series with built-in functions
It is built on numpy and uses matplotlib for plotting (scipy for some optional functions)
Manual PDF ONLINE
> import pandas as pd
# to import the pandas library
> pd.__version__
# get the version of the library (0.16 when these slides were written)
SERIES: DATA STRUCTURE
➢ One-dimensional data structure
➢ Indexing
· automatic
· manual
· ! labels need not be unique !
> data = [1,2,3,4,5]
> s = pd.Series(data)
> s
> s.index
> s = pd.Series(data, index = ['a','b','c','d','d'])
> s['d']
> s.iloc[[4]]
# select by position; plain s[[4]] relies on a deprecated positional fallback
# try with: s = pd.Series(data, index = [1,2,3,4,4])
> s.index = [1,2,3,4,5]
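A minimal sketch of what the duplicated label does, using the same data as above: selecting a repeated label returns a Series rather than a single value, a unique label returns a scalar, and .iloc always selects by position. The variable names are illustrative only.

```python
import pandas as pd

data = [1, 2, 3, 4, 5]
# the label 'd' appears twice: pandas accepts non-unique index labels
s = pd.Series(data, index=['a', 'b', 'c', 'd', 'd'])

by_label = s['d']         # repeated label -> a Series with both matching rows
one_label = s['a']        # unique label -> a single scalar
by_position = s.iloc[4]   # position-based -> the fifth element
print(by_label.tolist(), one_label, by_position, s.index.is_unique)
```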
SERIES: BASIC OPERATIONS
➢ Mathematically, Series are vectors
➢ Compatible with numpy functions
➢ Some basic functions available as pandas methods
➢ Plotting (based on matplotlib)
> import numpy as np
# import numpy to get some mathematical functions
> random_data = np.random.uniform(size=10)
> s = pd.Series(random_data)
> s+1
# try other mathematical functions: s**2, s*2, np.exp(s), …
> s.apply(np.log)
> s.mean()
# try other built-in methods: type s. and press Tab to list them
> s.plot()
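A deterministic version of the exercises above, with fixed values (chosen here for illustration) instead of random ones so the results are easy to check: arithmetic and numpy functions act element-wise and return a new Series with the same index, while methods such as mean() reduce it to a single number.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 4.0, 1.0])

plus_one = s + 1            # element-wise, like a vector
squared = s ** 2
logs = np.log2(s)           # numpy ufuncs accept a Series and return a Series
roots = s.apply(np.sqrt)    # apply() runs the function over the Series values
avg = s.mean()              # built-in reduction -> a single number
```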
DATAFRAME: DATA STRUCTURE
➢ Two-dimensional data structure
➢ A dictionary of Series, with shared index
→ each column is a Series
➢ Rows and columns are both labelled (labels need not be unique)
> s1 = pd.Series([1,2,3,4,5], index = list('abcde'))
> data = {'one':s1**s1, 'two':s1+1}
> df = pd.DataFrame(data)
> df.columns
> df.index
# .index and .columns: read them to see the labels, or assign a list to rename them
> s2 = pd.Series([1,2,3,4,10], index = list('edcbh'))
> df['three'] = s2
# s2 is aligned on the labels: row 'a' gets NaN and 'h' is dropped; try changing s2 indexes
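A sketch reproducing the slide's example, to make the label alignment visible: the new column is filled by matching index labels, not by position, so 'a' has no value in s2 and becomes NaN, while s2's extra label 'h' is silently dropped.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4, 5], index=list('abcde'))
df = pd.DataFrame({'one': s1 ** s1, 'two': s1 + 1})

# s2 shares labels b..e with df; 'h' has no matching row and 'a' has no value
s2 = pd.Series([1, 2, 3, 4, 10], index=list('edcbh'))
df['three'] = s2   # matched by label, not by position

print(df)
```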
DATAFRAME: ACCESSING VALUES - 1
➢ keep calm
➢ select columns and rows to obtain Series
➢ query function to select rows
> data = np.random.randn(5,2)
> df = pd.DataFrame(data, index = list('abcde'),
columns = ['one','two'])
> col = df.one
> row = df.xs('b')
# type(col) and type(row) are both Series, which you already know how to handle
> df.query('one > 0')
> df.index = [1,2,3,4,5]
> df.query('1 < index < 4')
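The same selections on fixed values (assumed here for illustration, replacing np.random.randn) so the results can be checked. df.loc['b'] is shown as the label-based equivalent of df.xs('b') for a simple, single-level index.

```python
import pandas as pd

# fixed values (assumed for illustration) instead of np.random.randn(5, 2)
df = pd.DataFrame({'one': [0.5, -1.0, 2.0, -0.3, 1.2],
                   'two': [1.0, 2.0, 3.0, 4.0, 5.0]},
                  index=list('abcde'))

col = df.one             # a column: Series indexed by a..e
row = df.xs('b')         # a row: Series indexed by the column names
row_loc = df.loc['b']    # label-based equivalent for a simple index
positive = df.query('one > 0')        # rows where column 'one' is positive

df.index = [1, 2, 3, 4, 5]
middle = df.query('1 < index < 4')    # rows labelled 2 and 3
```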
DATAFRAME: ACCESSING VALUES - 2
➢ … madness continues
➢ loc: access by label
works on rows, AND on columns
(older pandas also had .ix, removed in version 1.0)
➢ iloc: access by position
➢ you can extract Series
➢ ! define a strategy, and be careful with indexes !
> data = np.random.randn(5,2)
> df = pd.DataFrame(data, index = list('abcde'),
columns = ['one','two'])
> df.loc['a']
# try df.loc[['a', 'b'], 'one'], types?
> df.iloc[1,1]
# try df.iloc[1:,1], types?
> df.loc[df.index[1:], 'one']
# rows by position, column by label (what df.ix[1:, 'one'] used to do)
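A sketch of the label vs position distinction, on fixed values assumed for illustration. One detail that often surprises: a label slice with .loc includes its end label, while a position slice with .iloc excludes its end position, as in plain Python.

```python
import pandas as pd

# fixed values assumed for illustration
df = pd.DataFrame({'one': [10, 20, 30, 40, 50],
                   'two': [1, 2, 3, 4, 5]},
                  index=list('abcde'))

row_a = df.loc['a']                    # row by label -> Series
two_rows = df.loc[['a', 'b'], 'one']   # list of labels + column label -> Series
cell = df.iloc[1, 1]                   # row 1, column 1 by position -> scalar
tail_two = df.iloc[1:, 1]              # rows 1.., column 1 -> Series

label_slice = df.loc['b':'d', 'one']   # label slice: end label 'd' INCLUDED
pos_slice = df.iloc[1:3, 0]            # position slice: end position 3 EXCLUDED

# mixing: rows by position, column by label
mixed = df.loc[df.index[1:], 'one']
```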
DATAFRAME: BASIC OPERATIONS
➢ DataFrames can be treated as matrices
➢ Compatible with numpy functions
➢ Some basic functions available as pandas methods
· axis = 0: column-wise (one result per column)
· axis = 1: row-wise (one result per row)
➢ df.apply() applies a function to each column (or row)
➢ Plotting (based on matplotlib)
> df_copy = df
# not a copy: both names point to the same object! Use df_copy = df.copy()
> df * df
> np.exp(df)
> df.mean()
# try df.mean(axis = 1)
# try type(df.mean())
> df.apply(np.mean)
> df.plot()
# try df.transpose().plot()
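A small check of the axis convention and of the copy warning, on a fixed 3×2 DataFrame assumed for illustration: axis=0 gives one result per column, axis=1 one per row, and plain assignment only creates a second name for the same object.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'one': [1.0, 2.0, 3.0], 'two': [4.0, 5.0, 6.0]})

col_means = df.mean()          # axis=0: one value per column
row_means = df.mean(axis=1)    # axis=1: one value per row
via_apply = df.apply(np.mean)  # apply() passes each column to np.mean

alias = df                     # NOT a copy: a second name for the same object
independent = df.copy()        # a real copy
df.loc[0, 'one'] = 100.0       # change the original...
# ...the alias sees the change, the copy does not
```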
PANDAS FOR TIME SERIES
➢ Widely used in financial data analysis; here we use it for signals
➢ TimeSeries: a Series whose index holds timestamps
➢ Pandas functions for Time Series (here)
➢ Useful to select a portion of the signal (windowing)
· query method: not available on Series → convert to a DataFrame
> times = np.arange(0, 60, 0.5)
> data = np.random.randn(len(times))
> ts = pd.Series(data, index = times)
> ts.plot()
> epoch = ts[(ts.index > 10) & (ts.index <=20)]
# ts.plot()
# epoch.plot()
> ts_df = pd.DataFrame(ts)
> ts_df.query('10 < index <=20')
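The two windowing approaches above select the same samples. A sketch with a ramp signal (values 0, 1, 2, … assumed so the selection is easy to verify) and a named column, since query() works on DataFrames only:

```python
import numpy as np
import pandas as pd

times = np.arange(0, 60, 0.5)              # 120 samples, 0.0 .. 59.5 s
ts = pd.Series(np.arange(len(times), dtype=float), index=times)

# boolean mask on the index: samples with 10 < t <= 20
epoch = ts[(ts.index > 10) & (ts.index <= 20)]

# same selection through query(), which needs a DataFrame
ts_df = ts.to_frame(name='signal')
epoch_df = ts_df.query('10 < index <= 20')
```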
A FEW NOTES ABOUT TIMESTAMPS
➢ Absolute timestamps VS Relative timestamps
· Absolute timestamps are important for synchronization
➢ Unix timestamps VS date/time representation (converter)
· Unix timestamp: reference for signal processing
· Unix time 0 = 1st January 1970, 00:00:00 UTC
· date/time: easier to understand
· unix timestamp: easier to select/manage
➢ Pandas functions to manage Timestamps
> import datetime
> import time
> now_dt = datetime.datetime.now()
# time.ctime() gives the same moment as a readable string, not a datetime
> now_ut = time.time()
# find out how to convert datetime <--> timestamp
> ts.index = ts.index + now_ut
> ts.index = pd.to_datetime(ts.index, unit = 's')
# ts[(ts.index > -write date time here-)]
> ts.plot()
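One possible answer to the conversion exercise, as a sketch: the start time 1_700_000_000 is a hypothetical value chosen for illustration, and utc=True is used so the time zone is explicit. Relative seconds become absolute datetimes, and a datetime converts back to a Unix timestamp with .timestamp().

```python
import datetime
import numpy as np
import pandas as pd

# relative times: seconds from the start of a (hypothetical) recording
times = np.arange(0, 5, 1.0)
ts = pd.Series([0.1, 0.2, 0.3, 0.4, 0.5], index=times)

# hypothetical absolute start time as a Unix timestamp (seconds since 1970-01-01 UTC)
start_ut = 1_700_000_000.0
start_dt = datetime.datetime.fromtimestamp(start_ut, tz=datetime.timezone.utc)

# relative -> absolute: shift by the start time, then convert seconds to datetimes
ts.index = pd.to_datetime(ts.index + start_ut, unit='s', utc=True)

# datetime -> Unix timestamp again
back_to_ut = ts.index[0].timestamp()

# with a datetime index, selection can use readable date/time strings
late = ts[ts.index > pd.Timestamp('2023-11-14 22:13:22', tz='UTC')]
```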
MERGE, JOIN, CONCATENATE
➢ Simple examples here (concatenate, append)
➢ SQL-like functions (join, merge)
➢ Refer to chapter 17 of Pandas Manual
➢ Cookbooks here
> df1 = pd.DataFrame(np.random.randn(6, 3),
columns=['A', 'B', 'C'])
> df2 = pd.DataFrame(np.random.randn(6, 3),
columns=['D', 'E', 'F'])
> df3 = df1.copy()
> df = pd.concat([df1, df2])
# older pandas also offered df1.append(df2) (DataFrame.append was removed in pandas 2.0)
# try df = pd.concat([df1, df3])
# try df = pd.concat([df1, df3], ignore_index = True)
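A sketch covering both families on tiny DataFrames assumed for illustration: pd.concat stacks rows (or columns with axis=1), and pd.merge performs an SQL-like join on a key column. The key and column names are hypothetical.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df3 = df1.copy()

stacked = pd.concat([df1, df3])                        # index 0, 1, 0, 1 (kept as is)
renumbered = pd.concat([df1, df3], ignore_index=True)  # index 0..3
side_by_side = pd.concat([df1, pd.DataFrame({'C': [5, 6]})], axis=1)

# SQL-like join on a key column
left = pd.DataFrame({'key': ['x', 'y', 'z'], 'left_val': [1, 2, 3]})
right = pd.DataFrame({'key': ['x', 'z', 'w'], 'right_val': [10, 30, 40]})
inner = pd.merge(left, right, on='key', how='inner')  # keys in both: x, z
outer = pd.merge(left, right, on='key', how='outer')  # all keys, NaN where missing
```

With how='inner' only keys present on both sides survive; how='outer' keeps every key and fills the gaps with NaN.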
IMPORTING DATA
➢ data_df = pd.read_table(FILE,
sep = ',',
skiprows = 5,
header = 0,
usecols = [0,1,3],
index_col = 0,
nrows=10)
> FILE = '/path/to/sample_datafile.txt'
> data_df = pd.read_table(...)
# try header = 0, names = ['col1','col2','col3'],
# and adjust skiprows
# try nrows=None
> data_df.plot()
> data = pd.read_table(FILE, sep = ',',
skiprows=[0,1,2,3,4,5,7], header=2, index_col=0)
# empirical solution
> data.plot()
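To try the read_table parameters without a real file, a sketch can feed an in-memory text through io.StringIO. The file contents below are hypothetical. Note that header is a row number (counted after the skipped rows), not True/False.

```python
import io
import pandas as pd

# hypothetical file contents: 2 comment lines, a header row, then data
raw = """# recorded by device X
# sampling rate 2 Hz
time,signal,temperature,quality
0.0,1.5,20.1,ok
0.5,1.7,20.1,ok
1.0,1.6,20.2,bad
"""

data_df = pd.read_table(io.StringIO(raw),
                        sep=',',
                        skiprows=2,          # skip the two comment lines
                        header=0,            # first remaining row holds the names
                        usecols=[0, 1, 3],   # keep time, signal, quality
                        index_col=0,         # use 'time' as the index
                        nrows=2)             # read only the first 2 data rows
```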
SIMPLE FEATURE EXTRACTION EXAMPLE
> import pandas as pd
> WINLEN = 1                       # length of window
> WINSTEP = 0.5                    # shifting step
> data = pd.read_table(..., usecols=[0,1])   # import data
> t_start = data.index[0]          # start of first window
> t_end = t_start + WINLEN         # end of first window
> feat_rows = []                   # collect one features row per window
> while (t_end < data.index[-1]):  # cycle over the windows
>     data_curr = data.query(str(t_start)+'<=index<'+str(t_end))
      # extract portion of the signal
>     mean_ = data_curr.mean().iloc[0]   # extract mean; why .iloc[0]?
>     sd_ = data_curr.std().iloc[0]      # extract …
>     feat_rows.append(pd.DataFrame({'mean':mean_, 'sd':sd_},
                                    index=[t_start]))   # merge features
>     t_start = t_start + WINSTEP        # shift the window,
>     t_end = t_end + WINSTEP            # otherwise the loop never ends
> feat_df = pd.concat(feat_rows)         # build the features df once
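A self-contained version of the same sliding-window loop, with a synthetic ramp signal standing in for the imported file (the 4 Hz sampling and the values are assumptions for illustration). It selects each window with a boolean mask instead of a query string; both pick the same samples, and the mask avoids turning floats into text.

```python
import numpy as np
import pandas as pd

WINLEN = 1.0    # window length (s)
WINSTEP = 0.5   # shift between consecutive windows (s)

# synthetic signal standing in for the imported file: 4 Hz sampling, 3 s long
times = np.arange(0, 3, 0.25)
data = pd.DataFrame({'signal': np.arange(len(times), dtype=float)}, index=times)

rows = []
t_start = data.index[0]
t_end = t_start + WINLEN
while t_end < data.index[-1]:
    # boolean mask selects samples with t_start <= t < t_end
    window = data[(data.index >= t_start) & (data.index < t_end)]
    rows.append({'t_start': t_start,
                 'mean': window['signal'].mean(),
                 'sd': window['signal'].std()})
    t_start += WINSTEP
    t_end += WINSTEP

feat_df = pd.DataFrame(rows).set_index('t_start')
```

Collecting plain rows in a list and building the DataFrame once at the end avoids copying a growing DataFrame on every iteration. As in the slide, the loop condition stops before a window would reach past the last sample.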

Python Panda Library for python programming.ppt


Editor's Notes

  1. Press Shift+Tab inside a function call for info about the function (press it twice for more help)