phgv.pptx.pptx

PANDAS
To work with pandas, first install the
library from the command prompt:
pip install pandas
Then import the library at the top of the
program:
import pandas as p

PANDAS
Benefits:-
1. It offers the facility of reading and
writing different data formats.
2. It extracts data from bulky data sets
and can even combine multiple datasets.
3. It reshapes data into different forms.

PANDAS
4. It supports time series
functionality, which helps in analysing
historical data to forecast future values.
5. Data visualization is possible by
importing the matplotlib library.
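
Benefits 1 and 2 can be sketched as below. This is only a minimal illustration: the names and marks are made-up data, not taken from the slides. It writes a DataFrame as CSV text, reads it back, and then combines two datasets.

```python
import io
import pandas as p

# Hypothetical marks data, invented for illustration
first = p.DataFrame({"name": ["asha", "ravi"], "marks": [80, 65]})
second = p.DataFrame({"name": ["meena"], "marks": [90]})

# Benefit 1: write to CSV text and read it back
buffer = io.StringIO()
first.to_csv(buffer, index=False)
buffer.seek(0)
restored = p.read_csv(buffer)

# Benefit 2: combine multiple datasets into one
combined = p.concat([first, second], ignore_index=True)
print(restored)
print(combined)
```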

PANDAS
Data structure :- It is a way of storing and
organizing data.
E.g. :- arrays, stacks
The two basic data structures of pandas are
1. Series 2. DataFrame

PANDAS
A Series is a one-dimensional,
homogeneous data structure whose
values can be changed but whose size
cannot be changed.
A DataFrame is a two-dimensional,
heterogeneous data structure whose
values can be changed and whose size is
mutable, which means adding/dropping
elements from a DataFrame is possible.
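
The difference can be seen in a small sketch. The column names and values here are assumptions chosen only for illustration.

```python
import pandas as p

# A Series holds one dtype for all of its values
s = p.Series([10, 20, 30])
s[0] = 99                      # values can be changed in place
print(s.dtype)                 # a single integer dtype

# A DataFrame can hold a different dtype in each column
df = p.DataFrame({"name": ["a", "b"], "marks": [50, 60]})
df["grade"] = ["C", "B"]       # adding a column changes the size
df = df.drop(columns="marks")  # dropping a column changes it again
print(df.dtypes)
print(df.shape)
```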

PANDAS
The Series() function of pandas is
used to create a one-dimensional
labelled array.
Its parameters are all optional. By
default the first one is data and the
second one is index.

PANDAS
import pandas as p
s=p.Series()
print(s)
print(type(s))

PANDAS
OUTPUT (pandas 2.0 and later show dtype: object here)
Series([], dtype: float64)
<class 'pandas.core.series.Series'>

PANDAS
import pandas as p
L=[10,20,30,40,50]
print(L)
s=p.Series(L)
print(s)
print(type(s))

PANDAS
Output
[10, 20, 30, 40, 50]
0 10
1 20
2 30
3 40
4 50
dtype: int64

PANDAS
import pandas as p
L=[10,20,30,40,50]
ind=['a','b','c','d','e']
print(L)
s=p.Series(L,index=ind)
print(s)
print(type(s))

PANDAS
Output
[10, 20, 30, 40, 50]
a 10
b 20
c 30
d 40
e 50
dtype: int64

PANDAS
import pandas as p
L=[10,20,30,40,50]
ind=('a','b','c','d','e')
print(L)
s=p.Series(L,index=ind)
print(s)
print(type(s))

PANDAS
[10, 20, 30, 40, 50]
a 10
b 20
c 30
d 40
e 50
dtype: int64

PANDAS
import pandas as p
d={1:"ONE",2:"TWO",3:"THREE"}
print(d)
s=p.Series(d)
print(s)
print(type(s))

PANDAS
Output
{1: 'ONE', 2: 'TWO', 3: 'THREE'}
1 ONE
2 TWO
3 THREE
dtype: object

PANDAS
import pandas as p
st=" i am good"
s=p.Series(st)
print(s)
print(type(s))
st=st.split()
s=p.Series(st)
print(s)

PANDAS
OUTPUT
0 i am good
dtype: object
0 i
1 am
2 good
dtype: object

PANDAS
import pandas as p
import numpy as ny
na=ny.arange(0,5,2)
s=p.Series(na)
print(s)

PANDAS
Output
0 0
1 2
2 4
dtype: int32

PANDAS
import pandas as p
import numpy as np
na=np.arange(10,0,-3)
s=p.Series(na)
print(s)

PANDAS
Output
0 10
1 7
2 4
3 1
dtype: int32

PANDAS
import pandas as p
import numpy as np
s=p.Series(np.arange(1,10,1.5))
print(s)

PANDAS
Output
0 1.0
1 2.5
2 4.0
3 5.5
4 7.0
5 8.5
dtype: float64

PANDAS
• linspace() of numpy takes three parameters:
start, stop and the number of values. It
generates that many equally spaced values from
start to stop, with both ends included.
• The values are returned as a numpy array of
decimal (float) values, not as a list.
• A numpy array is printed without comma
separators.

PANDAS
import pandas as p
import numpy as np
na=np.linspace(0,20,5)
print(na)
s=p.Series(na)
print(s)

PANDAS
OUTPUT
[ 0. 5. 10. 15. 20.]
0 0.0
1 5.0
2 10.0
3 15.0
4 20.0
dtype: float64

PANDAS
• tile() function of numpy repeats the given
data as many times as the number provided as
the second parameter.
• It returns the values as a numpy array, not
as a list.
• No comma separates the values when the
array is printed.

PANDAS
import pandas as p
import numpy as np
na=np.tile((10,20),3)
print(na)
s=p.Series(na)
print(s)

PANDAS
OUTPUT
[10 20 10 20 10 20]
0 10
1 20
2 10
3 20
4 10
5 20
dtype: int32

PANDAS
If the keyword arguments data and
index are not used, then by default the
first parameter is accepted as
data and the second parameter is
accepted as index.

PANDAS
import pandas as ps
l=[10,20,30,40,50]
i=[0,1,2,3,4]
s=ps.Series(index=l,data=i)
print(s)
s=ps.Series(data=l,index=i)
print(s)
s=ps.Series(l,i)
print(s)

PANDAS
10 0
20 1
30 2
40 3
50 4
dtype: int64
0 10
1 20
2 30
3 40
4 50
dtype: int64
0 10
1 20
2 30
3 40
4 50
dtype: int64

PANDAS
If a scalar value is passed as data, the
value is repeated once for each label
in the index provided.

PANDAS
import pandas as ps
s=ps.Series(1000,index=[1,2,3,4,5])
print(s)
s=ps.Series(500,index=range(1,5,2))
print(s)

PANDAS
1 1000
2 1000
3 1000
4 1000
5 1000
dtype: int64
1 500
3 500
dtype: int64

PANDAS
A dictionary can also be passed as a parameter
to Series(). The keys become the index of the
series and the values of the dictionary become
the values of the series. In current versions of
Python and pandas the key-value pairs appear
in the same order as given in the dictionary;
older versions did not guarantee this order.
import pandas as ps
d={1:"one",2:"two"}
s=ps.Series(d)
print(s)
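
A short sketch of how a dictionary maps to a Series, assuming a recent pandas and Python version. The dictionary contents are illustrative. Passing index together with a dictionary is not shown in the slides; it selects the keys in the given order.

```python
import pandas as ps

d = {3: "three", 1: "one", 2: "two"}
s = ps.Series(d)
print(s)                  # index follows the dictionary's insertion order: 3, 1, 2

# Passing index as well selects keys in that order; a missing key gives NaN
t = ps.Series(d, index=[1, 2, 4])
print(t)
```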

PANDAS
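NaN (np.nan) is the legal empty value of
numpy which can be used in the
place of a missing value; None is
converted to NaN in numeric data. NaN is
a float value, so an integer series that
contains NaN is stored as float64 and
not as int64.
import pandas as ps
import numpy as np
s=ps.Series((1,np.nan,79))
print(s)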

PANDAS
Output:-
0 1.0
1 NaN
2 79.0
dtype: float64
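
The effect of a missing value can be checked with a small sketch. The isna() method is an addition not used in the slides; count() and len() are covered later in the deck.

```python
import pandas as ps

s = ps.Series([1, None, 79])      # None in numeric data is stored as NaN
print(s.dtype)                    # float64, because NaN is a float value
print(s.isna())                   # True only at the missing position
print(s.count(), len(s))          # count() skips NaN, len() does not
```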

PANDAS
import pandas as ps
s=ps.Series(data=range(1,5),index=range(11,15))
print(s)
s=ps.Series(range(0,5),index=[x for x in "abcde"])
print(s)

PANDAS
11 1
12 2
13 3
14 4
dtype: int64
a 0
b 1
c 2
d 3
e 4
dtype: int64

PANDAS
import pandas as ps
s=ps.Series(data=[x for x in range(0,5)],index=[x for x in range(10,15)])
print(s)
s=ps.Series(range(0,5),index=range(0,5))
print(s)
s=ps.Series(range(10,5,-1),range(0,5))
print(s)

PANDAS
10 0
11 1
12 2
13 3
14 4
dtype: int64
0 0
1 1
2 2
3 3
4 4
dtype: int64
0 10
1 9
2 8
3 7
4 6
dtype: int64

PANDAS
import pandas as ps
import numpy as np
mon=["jan","feb","mar","april","may"]
s=ps.Series(data=[10,20,30,40,50],index=mon,dtype=np.int64)
print(s)

PANDAS
jan 10
feb 20
mar 30
april 40
may 50
dtype: int64

PANDAS
import pandas as ps
import numpy as np
a=np.arange(0,8)
s=ps.Series(index=a,data=a*2)
print(s)
a=np.arange(0,8)
s=ps.Series(a,a*2)
print(s)

PANDAS
If a numpy array is multiplied by 2
(a*2), every value in it is doubled.
This works only with numpy arrays, not
with lists or tuples.
If the data keyword is not mentioned and
the programmer writes a*2 as the
second parameter, it is taken as the
index, so the index labels are doubled.

PANDAS
0 0
1 2
2 4
3 6
4 8
5 10
6 12
7 14
dtype: int32
0 0
2 1
4 2
6 3
8 4
10 5
12 6
14 7
dtype: int32

If a list or tuple is multiplied by 2 with the
"*" operator (2*l or l*2), its elements are
repeated twice instead of being doubled.
import pandas as ps
import numpy as np
l=[10,20,30]
obj=ps.Series(data=(2*l),dtype=np.float64)
print(obj)

PANDAS
0 10.0
1 20.0
2 30.0
3 10.0
4 20.0
5 30.0
dtype: float64
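
The two meanings of "*" can be put side by side in a short sketch. The variable names repeated and doubled are chosen here for illustration.

```python
import numpy as np
import pandas as ps

l = [10, 20, 30]
a = np.array(l)

repeated = ps.Series(2 * l)   # list * 2 repeats the elements
doubled = ps.Series(a * 2)    # array * 2 doubles each value

print(repeated.tolist())      # [10, 20, 30, 10, 20, 30]
print(doubled.tolist())       # [20, 40, 60]
```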

PANDAS
Index labels need not be unique in a
pandas Series object. If the programmer
accesses a repeated label, all the
values stored under that label are
returned.

PANDAS
import pandas as ps
import numpy as np
l=[11,22,None,33]
o=ps.Series(l,index=('a','q','a','q'),dtype=np.float64)
print(o['q'])
print(o)

PANDAS
q 22.0
q 33.0
dtype: float64
a 11.0
q 22.0
a NaN
q 33.0
dtype: float64
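
Whether an index has repeated labels can be checked before accessing it. The sketch below uses index.is_unique, which is not mentioned in the slides, on illustrative data.

```python
import pandas as ps

o = ps.Series([11, 22, 33], index=["a", "q", "a"])
print(o.index.is_unique)   # False, because 'a' appears twice
print(o["a"])              # a Series holding both 'a' values
print(o["q"])              # a single value, since 'q' appears once
```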

PANDAS
Series object attributes give
information about the object.
Seriesobject.index - index
Seriesobject.values - values
Seriesobject.dtype - data type
Seriesobject.shape - returns a
tuple of the shape of data

PANDAS
Seriesobject.nbytes - number of bytes
Seriesobject.ndim - number of dimensions
Seriesobject.size - returns the number of
elements in the underlying data.
Seriesobject.itemsize - size of the dtype of the
item (old pandas versions only; removed from
recent versions)
Seriesobject.hasnans - whether there is any NaN
value or not
Seriesobject.empty - returns True if the series
is empty, otherwise returns False

PANDAS
import pandas as ps
import numpy as np
l=[11,22,None,33,12.5,99,"amma",'a']
i=(0,1,2,3)
o=ps.Series(l,(2*i))
print(o)
print(o.index)
print(o.values)
print(o.dtype)
print(o.shape)
print(o.ndim)
print(o.nbytes)
print(o.hasnans)
print(o.empty)

PANDAS
0 11
1 22
2 None
3 33
0 12.5
1 99
2 amma
3 a
dtype: object

PANDAS
Int64Index([0, 1, 2, 3, 0, 1, 2, 3], dtype='int64')
[11 22 None 33 12.5 99 'amma' 'a']
object
(8,)
1
64
True
False

PANDAS
Object.index :- gives the index of the series
Object.values :- returns the values of the series as an array
Object.dtype :- returns the dtype object of the data
Object.shape :- returns the tuple of the shape of the
data. It tells how big it is, including missing or empty
values (NaN)
Object.nbytes :- returns the number of bytes in the
underlying data.
Object.ndim :- returns the number of dimensions of the
underlying data.

PANDAS
• Object.size :- returns the number of elements
of the data.
• Object.itemsize :- returns the size of the dtype
of the item of the underlying data (old pandas
versions only; removed from recent versions)
• Object.hasnans :- returns True if there are any
NaN values, otherwise returns False
• Object.count() :- gives the count of non-NaN
values
• len(object) :- gives the total number of values

PANDAS
import pandas as ps
import numpy as np
l=[11,22,None,33,12.5,99,"amma",'a']
i=(0,1,2,3)
o=ps.Series(l,(2*i))
print(o)
print(o.count())
print(len(o))

PANDAS
0 11
1 22
2 None
3 33
0 12.5
1 99
2 amma
3 a
dtype: object
7
8

PANDAS
import pandas as ps
import numpy as np
l=[11,22,None,33,12.5,99,"amma",'a']
i=(0,1,2,3)
o=ps.Series(l,(2*i))
print(o)
print(o.shape)
print(o.ndim)
print(o.size)
print(o.nbytes)

PANDAS
0 11
1 22
2 None
3 33
0 12.5
1 99
2 amma
3 a
dtype: object
(8,)
1
8
64

PANDAS
A value of a Series object can be
accessed by using its index value, as
shown in the following program.

PANDAS
import pandas as ps
import numpy as np
l=[11,22,None,33,12.5,99,"amma",'a']
i=(0,1,2,3,4,5,6,7)
o=ps.Series(l,i)
print(o)
print(o[3])

PANDAS
Slices can be extracted from a series
object to retrieve subsets.
A slice is written as [start:end:step].
Start and end represent positions, not
index labels, and the value at the end
position is not included. The third
parameter is the step between positions.
Slicing takes place position-wise and not
index-wise in a series object.
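
Position-wise slicing is easiest to see when the index labels are not numbers. The sketch below uses .iloc and .loc, which are not covered in the slides, to make the two kinds of slicing explicit. The data is illustrative.

```python
import pandas as ps

o = ps.Series([11, 22, 33, 44, 55], index=["a", "b", "c", "d", "e"])

by_position = o.iloc[1:4]       # positions 1, 2, 3 (end position excluded)
every_other = o.iloc[0:5:2]     # third value is the step: positions 0, 2, 4
by_label = o.loc["b":"d"]       # labels 'b' to 'd' (end label included)

print(by_position.tolist())     # [22, 33, 44]
print(every_other.tolist())     # [11, 33, 55]
print(by_label.tolist())        # [22, 33, 44]
```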

PANDAS
A value of a Series object can also be
changed by assignment.
Operators can be used on the values
of a Series object.

PANDAS
import pandas as ps
import numpy as np
l=[11,22,None,33,12.5,99,"amma",'a']
i=(0,1,2,3,4,5,6,7)
o=ps.Series(l,i)
print(o)
print(o[2:6:1])
print(o[0:2]*2)
print(o)
o[0]=100
print(o)

PANDAS
0 11
1 22
2 None
3 33
4 12.5
5 99
6 amma
7 a
dtype: object
2 None
3 33
4 12.5
5 99
dtype: object
0 22
1 44
dtype: object

PANDAS
0 11
1 22
2 None
3 33
4 12.5
5 99
6 amma
7 a
dtype: object
0 100
1 22
2 None
3 33
4 12.5
5 99
6 amma
7 a
dtype: object

PANDAS
Index of the series can be changed in
the following manner.
o.index=[11,22,33,44,55,66,77,88]
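
A small sketch of changing the index, on illustrative data. The new list must have as many labels as the series has values; pandas raises a ValueError otherwise.

```python
import pandas as ps

o = ps.Series([11, 22, 33])
print(o.index.tolist())      # [0, 1, 2]

o.index = ["x", "y", "z"]    # new labels, same number as values
print(o["y"])                # 22

# A list of the wrong length is rejected
try:
    o.index = ["only", "two"]
except ValueError as err:
    print("ValueError:", err)
```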

PANDAS
Seriesobject.head(n) returns the first
n rows of the series.
Seriesobject.tail(n) returns the last
n rows of the series.
If n is not given, 5 rows are returned.

PANDAS
import pandas as ps
import numpy as np
n=int(input("enter the no"))
l=[11,22,None,33,12.5,99,"amma",'a']
i=(0,1,2,3,4,5,6,7)
o=ps.Series(l,i)
print(o)
print(o.head(n))
print(o.tail(n))

PANDAS
OUTPUT (when 2 is entered):-
0 11
1 22
2 None
3 33
4 12.5
5 99
6 amma
7 a
dtype: object
0 11
1 22
dtype: object
6 amma
7 a
dtype: object

PANDAS
Arithmetic operators like ‘+’, ‘-’, ‘*’, ‘/’
can be used on series objects in the
following manner.

PANDAS
import pandas as ps
import numpy as np
l1=[11,22,33,12.5,99]
i1=(0,1,2,3,4)
l2=[1,2,3,4,5]
i2=[0,1,2,3,4]
o1=ps.Series(l1,i1)
print(o1)
o2=ps.Series(l2,i2)
print(o2)
o3=o1+o2
print(o3)
o3=o1/o2
print(o3)
o3=o1-o2
print(o3)
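
The slides do not show the output of this program. The sketch below repeats the same data and prints the results as lists; the names total, quotient and difference are chosen here for readability.

```python
import pandas as ps

o1 = ps.Series([11, 22, 33, 12.5, 99], (0, 1, 2, 3, 4))
o2 = ps.Series([1, 2, 3, 4, 5], [0, 1, 2, 3, 4])

total = o1 + o2
quotient = o1 / o2
difference = o1 - o2
print(total.tolist())        # [12.0, 24.0, 36.0, 16.5, 104.0]
print(quotient.tolist())     # [11.0, 11.0, 11.0, 3.125, 19.8]
print(difference.tolist())   # [10.0, 20.0, 30.0, 8.5, 94.0]
```

The results are float values because o1 contains 12.5, which makes the whole series float64.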

PANDAS
Data of the series object can be
filtered in the following manner.

PANDAS
import pandas as p
import numpy as n
content=[90,70,80,60,50]
print(content)
o=p.Series(content)
print(o)
print(o>60)
print(o[o>60])

PANDAS
[90, 70, 80, 60, 50]
0 90
1 70
2 80
3 60
4 50
dtype: int64
0 True
1 True
2 True
3 False
4 False
dtype: bool
0 90
1 70
2 80
dtype: int64

PANDAS
Both values and index of the series can be sorted in the
following manner
import pandas as p
import numpy as n
content=[90,70,80,60,50]
ind=('a','d','z','k','c')
o=p.Series(data=content, index=ind)
print(o)
print(o.sort_values())
print(o.sort_values(ascending=False))
print(o.sort_index())
print(o.sort_index(ascending=False))
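
The slides do not show the output of the sorting program. The sketch below uses the same data to show the order produced, and that sorting returns a new series while the original stays unchanged.

```python
import pandas as p

o = p.Series(data=[90, 70, 80, 60, 50], index=("a", "d", "z", "k", "c"))

print(o.sort_values().index.tolist())   # ['c', 'k', 'd', 'z', 'a']
print(o.sort_index().tolist())          # [90, 50, 70, 60, 80]
print(o.index.tolist())                 # ['a', 'd', 'z', 'k', 'c'], original unchanged
```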

PANDAS
NUMPY ARRAY vs SERIES OBJECT
IN NUMPY ARRAYS VECTORIZED
OPERATIONS CAN BE DONE IF THE
SHAPES OF THE TWO ARRAYS ARE THE
SAME (OR CAN BE BROADCAST).
IN SERIES OBJECTS OPERATIONS ARE
DONE ON MATCHING INDEX LABELS,
OTHERWISE NAN IS RETURNED.
IN ARRAYS INDEXES ARE ALWAYS
NUMBERS STARTING FROM 0.
IN SERIES OBJECTS THE INDEX CAN
BE NUMBERS, LETTERS, STRINGS
ETC.
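
The comparison can be tried with a short sketch on illustrative data. Two series with partly different labels are added, and then two arrays of different lengths.

```python
import numpy as np
import pandas as ps

s1 = ps.Series([1, 2, 3], index=["a", "b", "c"])
s2 = ps.Series([10, 20, 30], index=["b", "c", "d"])
total = s1 + s2               # aligned on labels: a and d have no partner
print(total)

a1 = np.array([1, 2, 3])
a2 = np.array([10, 20])
try:
    a1 + a2                   # shapes (3,) and (2,) do not match
except ValueError as err:
    print("ValueError:", err)
```

Labels 'a' and 'd' each exist in only one series, so their result is NaN, while the arrays cannot be added at all.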
