This document provides an overview of Python for data science. It describes key features of Python, an easy-to-read, flexible language suitable for both new and experienced programmers. It also covers fundamental Python libraries for data scientists (NumPy, SciPy, Pandas, and Scikit-Learn), which support tasks such as numeric computing, data analysis, and machine learning. The document concludes with an introduction to Python's core concepts, such as variables, data types, operators, and control flow statements.
1. Python in Data Science
Professor Sajjad Abdullah
Government Graduate College, Sadiqabad
2. Features of Python
• Mature programming language (suitable for both new and experienced programmers)
• Easy for beginners to learn, with easy-to-read code
• One of the most flexible programming languages
• Fewer lines of code
• Can be used both as Procedural and Object Oriented
• Used by data scientists
• Large number of Scientific and other Libraries
• Large and vibrant community
3. Features of Python
• Free
• Open Source
• Cross-platform (Windows, Linux, macOS, Raspberry Pi)
• Exception Handling
• Automatic Memory Management
• Used in different fields (desktop apps, web apps, mobile apps,
game apps, embedded apps, machine learning, data analysis,
scripting)
• Increasing demand for Python developers
4. Fundamental Python Libraries for Data Scientists
• NumPy
• SciPy
• Pandas
• Scikit-Learn
• Python itself can be downloaded from https://www.python.org/downloads
5. Numeric and Scientific Computation: NumPy and SciPy
• NumPy is the cornerstone toolbox for scientific computing with
Python.
• Support for multidimensional arrays.
• Basic operations and useful linear algebra functions on
multidimensional arrays.
• SciPy provides a collection of numerical algorithms and domain-specific
toolboxes (signal processing, optimization, statistics, etc.).
• Matplotlib, a separate library in the SciPy ecosystem, is used for data visualization.
6. Scikit-Learn: Machine Learning in Python
• A machine learning library built on NumPy, SciPy, and
Matplotlib.
• Data analysis
• Classification
• Regression
• Clustering
• Dimensionality reduction
• Model selection
• Preprocessing
7. Pandas: Python Data Analysis Library
• High-performance data structures and data analysis tools.
• DataFrame object (can be seen as a spreadsheet)
• Transform any dataset
• High-performance functions for aggregating, merging, and joining
datasets.
• Pandas also has tools for importing and exporting data in different
formats (CSV, TXT, Excel, SQL databases, HDF5).
• Pandas offers handling of missing data and intelligent data alignment.
• Pandas provides a convenient Matplotlib interface.
11. Assignment Statement
• Variables are created with an assignment statement.
• shares = 150
• price = 3 + 5.0 / 8.0
• value = shares * price
• print("answer =", value)
• Augmented assignment example
• a += v
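The statements above can be run together as a short script. The sketch below reuses the slide's `shares`, `price`, and `value`; the starting values of `a` and `v` are assumptions chosen only for illustration.

```python
# Plain assignment creates a variable and binds it to a value.
shares = 150
price = 3 + 5.0 / 8.0      # division is evaluated before addition: 3 + 0.625
value = shares * price
print("answer =", value)   # answer = 543.75

# Augmented assignment: a += v is shorthand for a = a + v.
a = 10   # illustrative starting value (assumption)
v = 5    # illustrative increment (assumption)
a += v
print(a)  # 15
```

The other arithmetic operators have augmented forms as well, for example `-=`, `*=`, and `/=`.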
12. Rules for Variable Naming
• Must start with a letter or an underscore.
• Can contain only alphanumeric characters and underscores.
• Names are case sensitive.
• Variables do not need to be declared; they are created on first assignment.
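These rules can be checked programmatically. The sketch below uses a hypothetical helper, `is_valid_name`, built on the standard `str.isidentifier()` method and the `keyword` module; it also rejects reserved words such as `class`, which satisfy the character rules but cannot be used as variable names.

```python
import keyword

def is_valid_name(name):
    """Return True if name can be used as a variable name (hypothetical helper)."""
    # str.isidentifier() checks the character rules; reserved keywords
    # pass that check, so they are excluded separately.
    return name.isidentifier() and not keyword.iskeyword(name)

for candidate in ["total", "_count", "value2", "2value", "my-var", "class"]:
    print(candidate, is_valid_name(candidate))

# Names are case sensitive: these are two different variables.
age = 30
Age = 40
print(age, Age)  # 30 40
```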
13. Assigning values
• Multiple values to multiple variables
• x, y, z = "Orange", "Banana", "Cherry"
• Single value to multiple variables
• x = y = z = "Orange"
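Both forms can be tried as follows. This is a minimal sketch; the names `p`, `q`, `r`, `first`, and `second` are assumptions added for illustration. It also shows one consequence worth knowing: when a single value is assigned to several variables, all of them refer to the same object.

```python
# Multiple values to multiple variables: the right-hand side is unpacked in order.
x, y, z = "Orange", "Banana", "Cherry"
print(x, y, z)   # Orange Banana Cherry

# The same mechanism swaps two variables without a temporary variable.
x, y = y, x
print(x, y)      # Banana Orange

# Single value to multiple variables: all names refer to one object.
p = q = r = "Orange"
print(p is q and q is r)   # True

# With a mutable value, a change made through one name is visible through the other.
first = second = []
first.append(1)
print(second)    # [1]
```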
14. Data Types in Python
• A data type defines the operations that can be performed on data.
• Built-in data types in Python are:
• Text - str
• Numeric - int, float, complex
• Sequence - list, tuple, range
• Mapping - dict
• Set - set, frozenset
• Boolean - bool
• Binary - bytes, bytearray, memoryview
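As a quick check of the list above, the sketch below creates one value of each built-in type and prints its type name. The sample values themselves are arbitrary assumptions.

```python
# One sample value for each built-in type named above (values are arbitrary).
samples = [
    "hello",              # str
    42,                   # int
    3.14,                 # float
    2 + 1j,               # complex
    [1, 2],               # list
    (1, 2),               # tuple
    range(3),             # range
    {"a": 1},             # dict
    {1, 2},               # set
    frozenset({1, 2}),    # frozenset
    True,                 # bool
    b"abc",               # bytes
    bytearray(3),         # bytearray
    memoryview(b"abc"),   # memoryview
]

type_names = [type(s).__name__ for s in samples]
for name in type_names:
    print(name)
```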
15. Text Data Type - String Literals
• Enclosed within single or double quotation marks
• print('Hello') and print("Hello") are equivalent.
• Can be assigned to variables
• Multiline strings are enclosed within triple quotes.
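A short sketch of the three quoting styles follows. The variable names and sample text are assumptions chosen for illustration.

```python
# Single and double quotes produce the same string.
single = 'Hello'
double = "Hello"
print(single == double)   # True

# Using one quote style lets the other appear inside the string unescaped.
quote = "It's easy to read"
print(quote)

# Triple quotes let a string span several lines; the line breaks are kept.
multiline = """Python
for
data science"""
print(multiline)
print(len(multiline.splitlines()))   # 3
```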
16. Numeric Data Types
• x=1 #int
• y=2.8 #float
• z=1j #complex
• The type() function returns the data type of a variable.
• print(type(x))
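The numeric types can be explored a little further, as in the sketch below. It reuses the slide's `x`, `y`, and `z`; the name `total` is an assumption added for illustration.

```python
x = 1      # int
y = 2.8    # float
z = 1j     # complex

print(type(x), type(y), type(z))

# Mixing int and float in arithmetic produces a float.
total = x + y
print(type(total).__name__)   # float

# Conversions between numeric types are explicit.
print(int(y))     # 2 (the fractional part is dropped)
print(float(x))   # 1.0

# A complex number exposes its real and imaginary parts as floats.
print(z.real, z.imag)   # 0.0 1.0
```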
17. Sequence Data Types
• A list is an ordered collection of items: its length is not fixed, it can be
changed, it can store data of any type, it allows duplicate values, and
subsets (slices) can be created.
• thislist = ["apple", "banana", "cherry"]
• print(thislist)
• print(thislist[1])
• Negative indexing: -1 is the last item, -2 is the second-to-last item, and so on.
• print(thislist[2:5])  # prints the items at indexes 2 to 4; with only three items, this is just "cherry"
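Indexing and slicing are easier to see on a longer list. In the sketch below, the three extra fruit names are assumptions added so that the slice `2:5` returns three items.

```python
# "orange", "kiwi", and "mango" are extra items added for illustration.
thislist = ["apple", "banana", "cherry", "orange", "kiwi", "mango"]

print(thislist[1])     # banana
print(thislist[-1])    # mango (last item)
print(thislist[-2])    # kiwi (second-to-last item)
print(thislist[2:5])   # ['cherry', 'orange', 'kiwi'] (index 5 is excluded)

# Lists can be changed in place and may hold duplicates.
thislist[0] = "apricot"
thislist.append("banana")
print(len(thislist))   # 7
```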
18. Sequence Data Types
• A tuple is an ordered collection of items: its length is fixed, it cannot be
changed, it can store data of any type, and it allows duplicate values.
• thistuple = ("apple", "banana", "cherry")
• print(thistuple[1])
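The difference from a list shows up when an item is modified. The sketch below indexes a tuple with square brackets and then attempts an assignment, which raises `TypeError`; the name `longer` and the value "orange" are assumptions added for illustration.

```python
thistuple = ("apple", "banana", "cherry")
print(thistuple[1])   # banana

# Tuples cannot be changed: item assignment raises TypeError.
try:
    thistuple[1] = "mango"
except TypeError as err:
    print("cannot modify a tuple:", err)

# To get a different tuple, build a new one from the old one.
longer = thistuple + ("orange",)
print(longer)   # ('apple', 'banana', 'cherry', 'orange')
```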
19. Mapping Data Types
• A dict is a collection of key-value pairs: items are indexed by key rather
than by position, its length is not fixed, it can be changed, and it can store
values of any type. Since Python 3.7, a dict preserves insertion order.
• thisdict = {"brand": "Ford", "model": "Mustang", "year": 1964}
• print(thisdict)
• x = thisdict["model"]
• print(x)
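Common dictionary operations can be sketched as follows. The "color" key and the changed year are assumptions added for illustration.

```python
thisdict = {"model": "Mustang", "year": 1964}

# Look up by key; get() returns a default instead of raising KeyError.
print(thisdict["model"])              # Mustang
print(thisdict.get("color", "n/a"))   # n/a

# Add a new key and overwrite an existing one.
thisdict["color"] = "red"   # illustrative key (assumption)
thisdict["year"] = 1965

# Iterate over key-value pairs.
for key, val in thisdict.items():
    print(key, val)

# Remove a key.
del thisdict["color"]
print(len(thisdict))   # 2
```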
21. Set Data Type
• A set is an unordered, unindexed collection of unique elements: indexing
and slicing are not possible, elements can be of any hashable type (for
example, numbers, strings, and tuples, but not lists), elements can be added
or removed, duplicates are not allowed, and set operations (union,
difference) are supported.
• thisset = {"apple", "banana", "cherry"}
• thisset.add("orange")
• thisset.update(["orange", "mango", "grapes"])
• len(thisset)
• thisset.remove("banana") raises KeyError if the element is missing; thisset.discard("banana") does not.
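The set operations mentioned above can be tried as in the sketch below. The second set, `other`, and its contents are assumptions added for illustration.

```python
thisset = {"apple", "banana", "cherry"}
thisset.add("orange")
thisset.update(["orange", "mango", "grapes"])   # "orange" is already present
print(len(thisset))   # 6, because duplicates are ignored

thisset.remove("banana")    # would raise KeyError if "banana" were missing
thisset.discard("banana")   # no error even though "banana" is already gone

other = {"mango", "kiwi"}   # illustrative second set (assumption)
union = thisset | other
common = thisset & other
difference = thisset - other
print(sorted(union))
print(sorted(common))       # ['mango']
print(sorted(difference))   # ['apple', 'cherry', 'grapes', 'orange']
```

The results are sorted before printing only because a set has no defined order.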
22. frozenset data type
• Just like a set, but elements cannot be added or removed after creation.
• cities = frozenset(["Islamabad", "Karachi", "Lahore"])
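The sketch below shows what immutability means in practice. The city "Multan" and the names `bigger` and `labels` are assumptions added for illustration.

```python
cities = frozenset(["Islamabad", "Karachi", "Lahore"])
print("Karachi" in cities)   # True

# A frozenset has no add() or remove() methods.
try:
    cities.add("Multan")
except AttributeError as err:
    print("frozenset cannot be modified:", err)

# Set operations still work; they return a new frozenset.
bigger = cities | {"Multan"}
print(type(bigger).__name__, len(bigger))   # frozenset 4

# Because it is immutable, a frozenset can be used as a dictionary key.
labels = {cities: "original cities"}
print(labels[cities])   # original cities
```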