Python in Data Science
Professor Sajjad Abdullah
Government Graduate College, Sadiqabad
Features of Python
• Mature programming language, suitable for both new and experienced programmers
• Easy for beginners to learn, with code that is easy to read
• One of the most flexible programming languages
• Programs typically need fewer lines of code than in many other languages
• Supports both procedural and object-oriented programming
• Widely used by data scientists
• Large number of scientific and general-purpose libraries
• Large and vibrant community
Features of Python
• Free
• Open Source
• Cross-platform (Windows, Linux, macOS, Raspberry Pi)
• Exception Handling
• Automatic Memory Management
• Used in many fields (desktop apps, web apps, mobile apps,
game apps, embedded apps, machine learning, data analysis,
scripting)
• Increasing demand for Python developers
Fundamental Python Libraries for Data
Scientists
• NumPy
• SciPy
• Pandas
• Scikit-Learn
• Python itself can be downloaded from https://www.python.org/downloads
Numeric and Scientific Computation: NumPy
and SciPy
• NumPy is the cornerstone toolbox for scientific computing with
Python.
• Support for multidimensional arrays.
• Basic operations and useful linear algebra functions on
multidimensional arrays.
• SciPy provides a collection of numerical algorithms and domain-specific
toolboxes (signal processing, optimization, statistics, etc.)
• Matplotlib is a separate plotting library that is commonly used together
with NumPy and SciPy for data visualization.
Scikit-Learn: Machine Learning in Python
• A machine learning library built on NumPy, SciPy, and
Matplotlib.
• Data analysis
• Classification
• Regression
• Clustering
• Dimensionality reduction
• Model selection
• Preprocessing
Pandas: Python Data Analysis Library
• High-performance data structures and data analysis tools.
• DataFrame object (can be thought of as a spreadsheet-like table)
• Tools for reshaping and transforming datasets
• High-performance functions for aggregating, merging, and joining
datasets.
• Pandas also has tools for importing and exporting data in different
formats (CSV, text, Excel, SQL databases, HDF5)
• Pandas offers handling of missing data and intelligent data alignment.
• Pandas provides a convenient Matplotlib interface.
Python Language Core
Basic Output
• print("Hello World")
Comments
• A comment starts with # and continues to the end of the line.
Assignment Statement
• Variables are created by assignment statements.
• shares = 150
• price = 3 + 5.0 / 8.0
• value = shares * price
• print("answer=", value)
• Augmented assignment example
• a += v  # same as a = a + v
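The assignment statements above can be put together as a short script. This is a minimal sketch; the starting values of a and v are assumptions chosen only for illustration.

```python
# Variables are created the first time a value is assigned to them.
shares = 150
price = 3 + 5.0 / 8.0      # division happens before addition: 3 + 0.625
value = shares * price

print("answer=", value)    # prints: answer= 543.75

# Augmented assignment: a += v is shorthand for a = a + v.
a = 10   # illustrative starting value (assumption)
v = 5    # illustrative increment (assumption)
a += v
print(a)                   # prints: 15
```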
Rules for Variable Naming
• A name must start with a letter or an underscore.
• A name may contain only letters, digits, and underscores.
• Names are case-sensitive.
• Variables do not need to be declared before they are assigned.
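One way to try these rules out is with the built-in str.isidentifier() method and the standard keyword module. The candidate names below are hypothetical examples. Note that isidentifier() also accepts non-ASCII letters, and that reserved keywords such as class cannot be used as names even though they look like valid identifiers.

```python
import keyword

# Hypothetical candidate names, chosen only for illustration.
candidates = ["total", "_count", "price2", "2price", "my-var", "class"]

results = {}
for name in candidates:
    # A usable variable name is a valid identifier that is not a reserved keyword.
    results[name] = name.isidentifier() and not keyword.iskeyword(name)
    print(name, "->", results[name])

# Names are case-sensitive: age and Age are two different variables.
age = 20
Age = 30
print(age, Age)   # prints: 20 30
```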
Assigning values
• Multiple values to multiple variables
• x, y, z = "Orange", "Banana", "Cherry"
• Single value to multiple variables
• x = y = z = "Orange"
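Both forms of assignment can be checked in a few lines. In this sketch the names p, q, and r are assumptions used to keep the two cases apart.

```python
# Multiple values to multiple variables: the values are assigned in order.
x, y, z = "Orange", "Banana", "Cherry"
print(x, y, z)        # prints: Orange Banana Cherry

# Single value to multiple variables: all three names refer to the same object.
p = q = r = "Orange"
print(p, q, r)        # prints: Orange Orange Orange
print(p is q is r)    # prints: True
```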
Data Types in Python
• A data type defines the operations that can be performed on data.
• Built-in data types in Python are:
• Text - str
• Numeric - int, float, complex
• Sequence - list, tuple, range
• Mapping - dict
• Set - set, frozenset
• Boolean - bool
• Binary - bytes, bytearray, memoryview
Text Data Type - String Literals
• Enclosed in single or double quotation marks
• print('Hello') and print("Hello") are equivalent.
• Can be assigned to variables
• Multiline strings are enclosed in triple quotes (''' or """).
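The points above can be seen in a short example. The variable names and the text of the multiline string are assumptions for illustration.

```python
# Single and double quotes produce the same string.
single = 'Hello'
double = "Hello"
print(single == double)   # prints: True

# Triple quotes let a string literal span several lines.
multiline = """First line
Second line"""
print(multiline)
print(len(multiline.splitlines()))   # prints: 2
```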
Numeric Data Types
• x = 1    # int
• y = 2.8  # float
• z = 1j   # complex
• The type() function returns the data type of a value.
• print(type(x))
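Running type() on each of the three values shows the numeric types. The isinstance() check at the end is an addition to the slide, included as a common way to test a type in code.

```python
x = 1      # int
y = 2.8    # float
z = 1j     # complex

print(type(x))   # prints: <class 'int'>
print(type(y))   # prints: <class 'float'>
print(type(z))   # prints: <class 'complex'>

# isinstance() checks whether a value belongs to a given type.
print(isinstance(y, float))   # prints: True
```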
Sequence Data Types
• A list is an ordered collection of items: its length is not fixed, it can
be changed, it can store data of any type, it allows duplicate values, and
subsets (slices) can be taken from it.
• thislist = ["apple", "banana", "cherry"]
• print(thislist)
• print(thislist[1])  # second item, because indexing starts at 0
• Negative indexing: -1 is the last item, -2 is the second-last item, and so on
• print(thislist[2:5])  # items from index 2 up to, but not including, 5; a slice stops at the end of the list
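Indexing and slicing are easier to follow on a slightly longer list. The list fruits below is an assumption, extended beyond the three items on the slide so that the slice has more to show.

```python
# A longer list than the one on the slide (assumption, for illustration).
fruits = ["apple", "banana", "cherry", "orange", "kiwi", "mango"]

print(fruits[1])      # prints: banana
print(fruits[-1])     # prints: mango
print(fruits[-2])     # prints: kiwi
print(fruits[2:5])    # prints: ['cherry', 'orange', 'kiwi']

# Lists can be changed in place.
fruits[0] = "apricot"
fruits.append("grape")
print(len(fruits))    # prints: 7

# A slice that runs past the end simply stops at the last item.
short = ["apple", "banana", "cherry"]
print(short[2:5])     # prints: ['cherry']
```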
Sequence Data Types
• A tuple is an ordered collection of items: its length is fixed, it cannot
be changed, it can store data of any type, and it allows duplicate values.
• thistuple = ("apple", "banana", "cherry")
• print(thistuple[1])  # items are accessed with square brackets
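The difference from a list shows up as soon as an item is assigned. This sketch reads an item with square brackets and then catches the TypeError that Python raises when code tries to change a tuple. The flag changed and the replacement value are assumptions for illustration.

```python
thistuple = ("apple", "banana", "cherry")
print(thistuple[1])      # prints: banana
print(len(thistuple))    # prints: 3

# Tuples cannot be changed: item assignment raises TypeError.
try:
    thistuple[1] = "mango"
except TypeError as error:
    changed = False
    print("Cannot change a tuple:", error)
else:
    changed = True
```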
Mapping Data Types
• A dict is a collection of key-value pairs: items are accessed by key rather
than by position, insertion order is preserved (Python 3.7 and later), its
length is not fixed, it can be changed, and values can be of any type.
• thisdict = {"brand": "Ford", "model": "Mustang", "year": 1964}
• print(thisdict)
• x = thisdict["model"]
• print(x)
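Looking up, changing, adding, and removing entries all go through the key. The record below is hypothetical and is used only to illustrate the operations.

```python
# A hypothetical record, chosen only for illustration.
stock = {"symbol": "ABC", "shares": 150, "price": 3.625}

print(stock["shares"])             # prints: 150

stock["shares"] = 200              # change an existing value
stock["currency"] = "PKR"          # add a new key-value pair
del stock["price"]                 # remove a key-value pair

print(list(stock.keys()))          # prints: ['symbol', 'shares', 'currency']

# get() returns a default instead of raising KeyError for a missing key.
print(stock.get("price", "n/a"))   # prints: n/a
```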
Set Types
• set
• frozenset
set data type
• A set is an unordered, unindexed collection of unique elements: indexing
and slicing are not possible, elements can be of any hashable type,
elements can be added or removed, duplicates are not allowed, and
set operations (union, intersection, difference) are supported.
• thisset = {"apple", "banana", "cherry"}
• thisset.add("orange")
• thisset.update(["orange", "mango", "grapes"])
• len(thisset)
• thisset.remove("banana") or thisset.discard("banana"); remove() raises KeyError if the item is missing, discard() does not
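The set operations listed above can be run in sequence to see how duplicates are handled. The sets a and b at the end are assumptions added to illustrate union, intersection, and difference; sorted() is used only to print the results in a predictable order.

```python
thisset = {"apple", "banana", "cherry"}
thisset.add("orange")
thisset.update(["orange", "mango", "grapes"])   # "orange" is already present
print(len(thisset))          # prints: 6

thisset.remove("banana")     # would raise KeyError if "banana" were missing
thisset.discard("banana")    # no error, even though "banana" is gone
print(len(thisset))          # prints: 5

# Set operations on two small illustrative sets (assumption).
a = {1, 2, 3}
b = {3, 4}
print(sorted(a | b))   # union, prints: [1, 2, 3, 4]
print(sorted(a & b))   # intersection, prints: [3]
print(sorted(a - b))   # difference, prints: [1, 2]
```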
frozenset data type
• A frozenset is like a set, but elements cannot be added or removed after it is created.
• cities = frozenset(["Islamabad", "Karachi", "Lahore"])
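A frozenset has no add() or remove() method, so attempting to modify it raises AttributeError. Set operations still work and return a new frozenset. The extra city name and the flag added are assumptions for illustration.

```python
cities = frozenset(["Islamabad", "Karachi", "Lahore"])
print("Lahore" in cities)    # prints: True
print(len(cities))           # prints: 3

# frozenset has no add() method, so this raises AttributeError.
try:
    cities.add("Multan")
except AttributeError as error:
    added = False
    print("Cannot modify a frozenset:", error)
else:
    added = True

# A union builds a new frozenset and leaves the original unchanged.
more_cities = cities | {"Multan"}
print(len(more_cities))      # prints: 4
print(len(cities))           # prints: 3
```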
bool data type
• Stores True or False.
• a = 10 > 9  # a is True
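Comparisons produce bool values, which can then be combined with and, or, and not. The second variable b is an assumption added for illustration.

```python
a = 10 > 9
print(a)          # prints: True
print(type(a))    # prints: <class 'bool'>

b = 10 < 9        # illustrative second comparison (assumption)
print(b)          # prints: False

# Boolean values can be combined with and, or, and not.
print(a and b)    # prints: False
print(a or b)     # prints: True
print(not a)      # prints: False
```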
