The document provides information about Pandas, a Python library used for data analysis and manipulation. It discusses how to import Pandas, the benefits of using Pandas, basic data structures in Pandas like Series and DataFrames, and examples of creating Series objects from different types of data like lists, dictionaries, and NumPy arrays. It also covers Series object attributes and methods for accessing, slicing, and modifying Series values and indices.
1. PANDAS
To work with pandas, the library must first be
installed from the command prompt by using the
command pip install pandas
The following statement is then written to import
the pandas library:
import pandas as p
2. PANDAS
Benefits:-
1. It offers the facility of reading and
writing different data formats.
2. It extracts data from bulky data sets
and can even combine multiple datasets.
3. It reshapes data into different forms.
3. PANDAS
4. It supports time series
functionality, which helps in analysing
historical data to predict future values.
5. Data visualization is possible by
importing the matplotlib library.
4. PANDAS
Data structure:- It is a way of storing and
organizing data.
E.g.:- arrays, stacks
The two basic data structures of pandas are
1. Series 2. DataFrame
5. PANDAS
A Series is a one-dimensional,
homogeneous data structure whose
values can be changed but whose size
cannot be changed.
A DataFrame is a two-dimensional,
heterogeneous data structure whose
values can be changed and whose size is
mutable, which means adding/dropping
elements from a DataFrame is possible.
6. PANDAS
The Series() function imported from
pandas is used to create a 1-D labelled array.
Its first two parameters are both
optional. By default the first one is data
and the second one is index.
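As a minimal sketch of the two parameters described above (the values and labels are made up for illustration), a Series can be created with the default index or with an explicit one:

```python
import pandas as p

# Only data given: pandas assigns the default index 0, 1, 2
s1 = p.Series([10, 20, 30])
print(s1)

# Data and index given: each value gets the label at the same position
s2 = p.Series([10, 20, 30], ['a', 'b', 'c'])
print(s2)
```

The number of index labels has to match the number of values.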
25. PANDAS
• linspace() of numpy is called here with three
parameters: start, stop and the number of values. It
generates that many equally spaced values from
start to stop, both included.
• The decimal values are retrieved in the form
of a NumPy array.
• When printed, the values of the array are
separated by spaces, not commas.
26. PANDAS
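import pandas as p
import numpy as np
# 5 equally spaced values from 0 to 20 (both ends included)
na=np.linspace(0,20,5)
print(na)
# build a Series from the array; the index defaults to 0..4
s=p.Series(na)
print(s)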
28. PANDAS
• tile() function of numpy repeats the given
data as many times as the number provided
as the second parameter to tile().
• It retrieves the values in the form of a NumPy array.
• When printed, the values of the array are
separated by spaces, not commas.
29. PANDAS
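import pandas as p
import numpy as np
# repeat the pair (10, 20) three times: 10 20 10 20 10 20
na=np.tile((10,20),3)
print(na)
# build a Series from the repeated values
s=p.Series(na)
print(s)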
31. PANDAS
If the keyword arguments data and
index are not used, then by default the
first parameter will be accepted as
data and the second parameter will be
accepted as index.
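The rule above can be checked with a short sketch (the sample values and labels are illustrative): the positional form and the keyword form build the same Series, and keyword arguments may be written in any order.

```python
import pandas as p

# Positional: the first argument is data, the second is index
s1 = p.Series([1, 2, 3], ['x', 'y', 'z'])

# Keywords: the order in which they are written does not matter
s2 = p.Series(index=['x', 'y', 'z'], data=[1, 2, 3])

print(s1.equals(s2))  # True: same values, same index
```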
37. PANDAS
A dictionary can also be passed as a parameter
to Series(). The keys become the index of the series
and the values of the dictionary become the values of
the series. In current versions of pandas the key-value
pairs appear in the same order as given in the
dictionary; older versions did not guarantee this.
import pandas as ps
d={1:"one",2:"two"}
s=ps.Series(d)
print(s)
38. PANDAS
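NaN (np.nan) is the legal empty value of
numpy which can be used in the
place of a missing value; None is converted
to NaN in a numeric series. The datatype
by default is then float64, because NaN
cannot be stored in the int64 data type.
import pandas as ps
import numpy as np
s=ps.Series((1,np.nan,79))
print(s)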
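A small sketch of the dtype rule above, using made-up values: the same numbers give an integer dtype without a missing value and float64 once NaN or None is present.

```python
import pandas as ps
import numpy as np

s_int = ps.Series([1, 2, 79])        # no missing value: integer dtype
s_nan = ps.Series([1, np.nan, 79])   # one missing value
s_none = ps.Series([1, None, 79])    # None is converted to NaN

print(s_int.dtype)
print(s_nan.dtype)    # float64
print(s_none.dtype)   # float64
```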
42. PANDAS
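import pandas as ps
# data 0..4 with index 10..14, both built with list comprehensions
s=ps.Series(data=[x for x in range(0,5)],index=[x for x in range(10,15)])
print(s)
# range objects can be passed directly as data and index
s=ps.Series(range(0,5),index=range(0,5))
print(s)
# positional form: data 10, 9, 8, 7, 6 with index 0..4
s=ps.Series(range(10,5,-1),range(0,5))
print(s)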
46. PANDAS
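import pandas as ps
import numpy as np
a=np.arange(0,8)
# keyword form: index is a, data is a with every value doubled
s=ps.Series(index=a,data=a*2)
print(s)
a=np.arange(0,8)
# positional form: data is a, index is a with every value doubled
s=ps.Series(a,a*2)
print(s)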
47. PANDAS
If a*2 is written for data, the values
are doubled only when a is a numpy
array. It does not work with a list or
tuple, where *2 repeats the elements
instead.
If the data keyword is not mentioned and
the programmer writes a*2 as the
second parameter, then the index will
be doubled.
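The difference between the two meanings of *2 can be seen in a minimal sketch (the variable names s_arr and s_list are illustrative):

```python
import pandas as ps
import numpy as np

a = np.arange(0, 4)    # NumPy array holding 0, 1, 2, 3
l = [0, 1, 2, 3]       # plain Python list with the same values

s_arr = ps.Series(a * 2)    # element-wise: 0, 2, 4, 6
s_list = ps.Series(l * 2)   # repetition: 0, 1, 2, 3, 0, 1, 2, 3

print(s_arr)
print(s_list)
```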
49. If a list or tuple is multiplied by a number n with the
“*” operator (for example 2*l or l*2), then the data will be repeated n times
import pandas as ps
import numpy as np
l=[10,20,30]
obj=ps.Series(data=(2*l),dtype=np.float64)
print(obj)
51. PANDAS
Indices need not be unique in a pandas
Series object. If the programmer
accesses a repeated index, then all
the values stored under that index
are returned.
52. PANDAS
import pandas as ps
import numpy as np
l=[11,22,None,33]
o=ps.Series(l,index=('a','q','a','q'),dtype=np.float64)
print(o['q'])
print(o)
54. PANDAS
Series object attributes give
information about the object.
Seriesobject.index-the index
Seriesobject.values-the values
Seriesobject.dtype-the data type
Seriesobject.shape-returns a
tuple of the shape of the data
55. PANDAS
Seriesobject.nbytes-number of bytes
Seriesobject.ndim-number of dimensions
Seriesobject.size-returns the number of
elements in the underlying data.
Seriesobject.itemsize-size of the dtype of the
item (no longer available on a Series in current
pandas versions; Seriesobject.dtype.itemsize
gives the same value)
Seriesobject.hasnans-True if there is any NaN
value, otherwise False
Seriesobject.empty-returns True if the series
is empty, otherwise returns False
59. PANDAS
Object.index:-gives the index of the series (an attribute, so no parentheses)
Object.values:-returns the values of the series as an array
Object.dtype:-returns the dtype object of the data
Object.shape:-returns the tuple of the shape of the
data. It tells how big it is, including missing or empty
values (NaN)
Object.nbytes:-returns the number of bytes in the
underlying data.
Object.ndim:-returns the number of dimensions of the
underlying data.
60. PANDAS
• Object.size:-returns the number of elements
of the data.
• Object.itemsize:-returns the size of the dtype
of the item of the underlying data (in current
pandas versions use Object.dtype.itemsize)
• Object.hasnans:-returns True if there are any
NaN values, otherwise returns False
• Object.count():-gives the count of non-NaN
values
• len(Object):-gives the total number of values
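The attributes listed above can be tried on a small series. This is a sketch with made-up values. Note that itemsize is read from the dtype here, which is an assumption about the pandas version in use (older versions also offered it directly on the Series).

```python
import pandas as ps
import numpy as np

o = ps.Series([11, 22, np.nan, 33], index=['a', 'b', 'c', 'd'])

print(o.index)            # the index labels
print(o.values)           # the values as a NumPy array
print(o.dtype)            # float64, because of the NaN
print(o.shape)            # (4,)
print(o.nbytes)           # 32: four float64 values of 8 bytes each
print(o.ndim)             # 1
print(o.size)             # 4
print(o.dtype.itemsize)   # 8
print(o.hasnans)          # True
print(o.empty)            # False
print(o.count())          # 3: NaN is not counted
print(len(o))             # 4: NaN is counted
```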
61. PANDAS
import pandas as ps
import numpy as np
l=[11,22,None,33,12.5,99,"amma",'a']
i=(0,1,2,3)
o=ps.Series(l,(2*i))
print(o)
print(o.count())
print(len(o))
66. PANDAS
import pandas as ps
import numpy as np
l=[11,22,None,33,12.5,99,"amma",'a']
i=(0,1,2,3,4,5,6,7)
o=ps.Series(l,i)
print(o)
print(o[3])
67. PANDAS
Slices can be extracted from a series
object to retrieve subsets.
A slice is written as object[start:end:step].
Integer start and end values represent
positions, not index labels, and the element
at the end position is excluded. Both may be
omitted. The third parameter is the step,
that is, how many positions to move each time.
Slicing takes place position-wise and not
index-wise in a series object.
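A minimal sketch of position-wise slicing, with made-up values and letter labels so that positions and labels cannot be confused. It uses .iloc, which is always positional; this is an addition to the object[start:end] notation used on these slides.

```python
import pandas as ps

o = ps.Series([10, 20, 30, 40, 50, 60],
              index=['a', 'b', 'c', 'd', 'e', 'f'])

# Positions 1, 2 and 3 (the end position 4 is excluded)
print(o.iloc[1:4])

# Every second element from position 0 up to, not including, position 6
print(o.iloc[0:6:2])

# Start and end may be omitted: the last two elements
print(o.iloc[-2:])
```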
68. PANDAS
The values of a Series object can also
be changed by assignment.
Operators can be used on the values
of a Series object.
69. PANDAS
import pandas as ps
import numpy as np
l=[11,22,None,33,12.5,99,"amma",'a']
i=(0,1,2,3,4,5,6,7)
o=ps.Series(l,i)
print(o)
print(o[2:6:1])
print(o[0:2]*2)
print(o)
o[0]=100
print(o)
71. PANDAS
0 11
1 22
2 None
3 33
4 12.5
5 99
6 amma
7 a
dtype: object
0 100
1 22
2 None
3 33
4 12.5
5 99
6 amma
7 a
dtype: object
72. PANDAS
Index of the series can be changed in
the following manner.
o.index=[11,22,33,44,55,66,77,88]
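As a short sketch with made-up values, assigning to the index attribute replaces all labels at once; the new list must contain exactly as many labels as the series has values.

```python
import pandas as ps

o = ps.Series([5, 6, 7])
print(o.index)             # default index 0, 1, 2

# Replace the whole index; three values need three labels
o.index = ['x', 'y', 'z']
print(o)
print(o['y'])              # 6
```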
73. PANDAS
The series object.head(n) returns
the first n rows, counted from the
beginning.
The series object.tail(n) returns the
last n rows, counted from the end.
If n is omitted, 5 rows are returned.
74. PANDAS
import pandas as ps
import numpy as np
n=int(input("enter the no"))
l=[11,22,None,33,12.5,99,"amma",'a']
i=(0,1,2,3,4,5,6,7)
o=ps.Series(l,i)
print(o)
print(o.head(n))
print(o.tail(n))
76. PANDAS
OUTPUT:-
0 11
1 22
2 None
3 33
4 12.5
5 99
6 amma
7 a
dtype: object
0 11
1 22
dtype: object
6 amma
7 a
dtype: object
82. PANDAS
Both the values and the index of a series can be sorted in the
following manner:
import pandas as p
import numpy as n
content=[90,70,80,60,50]
ind=('a','d','z','k','c')
o=p.Series(data=content, index=ind)
print(o)
print(o.sort_values())
print(o.sort_values(ascending=False))
print(o.sort_index())
print(o.sort_index(ascending=False))
83. PANDAS
NUMPY ARRAY VS SERIES OBJECT
NUMPY ARRAY: VECTORIZED OPERATIONS CAN BE
DONE ONLY IF THE SHAPES OF THE TWO
ARRAYS ARE THE SAME (OR BROADCAST-COMPATIBLE).
SERIES OBJECT: OPERATIONS ARE
DONE ON MATCHING INDEX LABELS;
WHERE A LABEL HAS NO MATCH, NAN IS RETURNED.
NUMPY ARRAY: INDEXES ARE ALWAYS
NUMERIC, STARTING FROM 0.
SERIES OBJECT: THE INDEX CAN
BE NUMBERS, LETTERS,
STRINGS, ETC.
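The difference in how operations are matched can be sketched as follows (sample values and labels are illustrative): NumPy pairs elements by position, while a Series pairs them by index label and returns NaN where a label is missing on one side.

```python
import numpy as np
import pandas as p

# NumPy arrays: element-wise by position, shapes must be compatible
a1 = np.array([1, 2, 3])
a2 = np.array([10, 20, 30])
print(a1 + a2)             # [11 22 33]

# Series: values are paired by index label, not by position
s1 = p.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = p.Series([10, 20, 30], index=['b', 'c', 'd'])
total = s1 + s2
print(total)               # 'a' and 'd' have no partner, so they are NaN
```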