- 1. Python For Data Science
- 2. Contents INTRODUCTION: Introduction to Data Science with Python, Installing Python, Programming PYTHON BASICS: (basic syntax, data structures, data objects, math, comparison operators, condition statements, loops, list, tuple, set, dicts, functions) NUMPY PACKAGE: Array, selecting data, slicing, array manipulation, stacking, splitting arrays PANDAS PACKAGE: overview, series, and data frame, data manipulation PYTHON ADVANCED: (treating missing values, removing duplicates, grouping, data munging with pandas, histogram) PYTHON ADVANCED: visualization with matplotlib EDA: data cleaning, data wrangling
- 3. What is Python? It is used for: • web development (server-side), • software development, • mathematics, • system scripting. What can Python do? • Python can be used on a server to create web applications. • Python can be used alongside software to create workflows. • Python can connect to database systems. It can also read and modify files. • Python can be used to handle big data and perform complex mathematics. • Python can be used for rapid prototyping, or for production-ready software development.
- 5. Python basics Basic Syntax Python syntax is highly readable. Statements in Python typically end with a new line. A backslash (\) is used to denote line continuation; expressions inside parentheses (), brackets [], or braces {} can also span several lines. Python uses indentation to indicate a block of code and raises an error if the indentation is missing or inconsistent. All the contiguous lines indented with the same number of spaces form a block. A semicolon ( ; ) allows multiple statements on a single line. A group of individual statements that makes up a single code block is called a suite.
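The syntax rules above can be seen in a few lines. This is a minimal sketch; all variable names are made up for illustration.

```python
# Each statement normally ends at the end of its line.
total = 1 + 2 + 3

# A backslash continues a statement onto the next line.
long_sum = 1 + 2 + \
    3 + 4

# An expression inside parentheses can span lines without a backslash.
bracketed_sum = (1 + 2 +
                 3 + 4)

# A semicolon separates several statements on one line.
a = 5; b = 10

# The two indented lines form one block (suite) that belongs to the if statement.
if a < b:
    smaller = a
    larger = b

print(total, long_sum, bracketed_sum, smaller, larger)  # 6 10 10 5 10
```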
- 6. Python Comments Python allows in-code documentation by using comments. We can comment a portion of the code in two ways. 1. Starting a line with a #: If a # is used at the beginning of a line, Python treats the rest of the line as a comment. Example
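A short illustration of # comments, using hypothetical variable names:

```python
# This whole line is a comment and is ignored by the interpreter.
radius = 2  # A comment can also follow code on the same line.
area = 3.14159 * radius ** 2
print(area)
```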
- 7. 2. Using docstrings: Python docstrings provide extended documentation capabilities. A docstring can span a single line or multiple lines. The text is started and ended with triple quotes. A triple-quoted string placed first in a function, class, or module is stored as its documentation; elsewhere, it is sometimes used as a multi-line comment. Example:
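A minimal sketch of single-line and multi-line docstrings. The function names are invented for this example; `__doc__` is the standard attribute in which Python stores a docstring.

```python
def circle_area(radius):
    """Return the area of a circle with the given radius."""
    return 3.14159 * radius ** 2


def rectangle_perimeter(width, height):
    """
    Return the perimeter of a rectangle.

    A docstring may span several lines between the triple quotes.
    """
    return 2 * (width + height)


print(circle_area.__doc__)
print(rectangle_perimeter(2, 3))  # 10
```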
- 8. Python Data Types The following are the standard collection data types in Python: List, Tuple, Set, Dictionary
- 9. LIST A list is a compound data type. It contains items separated by commas and enclosed within square brackets ([]). Items belonging to a list can be of different data types. A list is ordered and changeable. Lists allow duplicate data. A list can be created by using the list constructor list(). The append() method is used to add an item to the end of the list. The remove() method is used to remove the first occurrence of a specific item from the list. The built-in len() function is used to get a count of elements in the list.
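The list operations described above, as a small sketch with made-up data:

```python
# Lists are ordered, changeable, and allow duplicates.
fruits = ["apple", "banana", "cherry", "banana"]

# The list() constructor builds a list from another iterable; items may differ in type.
mixed = list((1, "two", 3.0))

fruits.append("mango")   # add an item to the end
fruits.remove("banana")  # remove the first matching item only
fruits[0] = "apricot"    # lists are changeable in place

print(fruits)       # ['apricot', 'cherry', 'banana', 'mango']
print(len(fruits))  # 4
print(mixed)        # [1, 'two', 3.0]
```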
- 10. LIST
- 11. Tuples •A tuple is an ordered and unchangeable collection. •In Python, tuples are written with round brackets. •Items in the tuple are separated by commas. •The tuple() constructor can be used to make a tuple. •The len() function returns the number of items in a tuple.
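A brief sketch of tuple creation and immutability; the names and values are illustrative only.

```python
point = (3, 4)                             # round brackets, items separated by commas
colors = tuple(["red", "green", "blue"])   # tuple() constructor
single = (5,)                              # a one-item tuple needs a trailing comma

print(point[0], len(colors))  # 3 3

# Tuples are unchangeable: assigning to an item raises TypeError.
try:
    point[0] = 10
except TypeError as error:
    print("Tuples cannot be changed:", error)
```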
- 12. list
- 13. Dictionaries •A dictionary is a collection of key-value pairs; since Python 3.7 it preserves insertion order (older versions were unordered). •It is changeable and indexed using the key. •Dictionaries are enclosed by curly braces ({ }), and values can be assigned and accessed using square brackets ([]). •The len() function returns the number of items. •The dict() constructor can be used to make a dictionary. •We can add an item to the dictionary by using a new key and assigning a value to it. •Elements are stored in a dictionary as key-value pairs, and each key is unique. •We can remove an item from a dictionary using the del statement.
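The dictionary operations listed above can be sketched as follows (keys and values are invented for the example):

```python
ages = {"alice": 30, "bob": 25}        # curly braces with key: value pairs
scores = dict(math=90, physics=85)     # dict() constructor

ages["carol"] = 41   # a new key adds an item
ages["bob"] = 26     # an existing key is overwritten, so keys stay unique
del ages["alice"]    # del removes an item by key

print(ages)                       # {'bob': 26, 'carol': 41}
print(len(ages), scores["math"])  # 2 90
```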
- 14. Dictionaries
- 15. Sets A set is an unordered collection. It is iterable, mutable, and has no duplicate elements. Sets are enclosed by curly braces ({ }). A set can be created using the set() constructor. Elements can be added to a set using the add() method. A frozen set is an immutable object which can be created using the frozenset() constructor.
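A small sketch of sets and frozen sets, with illustrative values:

```python
numbers = {1, 2, 2, 3}   # the duplicate 2 is dropped
letters = set("hello")   # set() constructor: one entry per distinct letter
numbers.add(4)

print(numbers == {1, 2, 3, 4})  # True
print(len(letters))             # 4

# A frozenset is immutable, so it has no add() method.
frozen = frozenset(numbers)
try:
    frozen.add(5)
except AttributeError as error:
    print("Frozen sets cannot be changed:", error)
```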
- 16. Sets
- 17. Python Operators Operators are the constructs used to perform operations on variables and values. Operator Types Arithmetic Operators Comparison or Relational Operators Assignment Operators Logical Operators
- 18. Arithmetic Operators Arithmetic Operators are used with numeric values to perform common mathematical operations. Operators are : + Addition - Subtraction / Division * Multiplication % Modulus ** Exponentiation // Floor Division
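Each arithmetic operator applied to the same pair of illustrative values:

```python
a, b = 7, 2

print(a + b)   # 9    addition
print(a - b)   # 5    subtraction
print(a * b)   # 14   multiplication
print(a / b)   # 3.5  division (always returns a float in Python 3)
print(a % b)   # 1    modulus (remainder)
print(a ** b)  # 49   exponentiation
print(a // b)  # 3    floor division
```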
- 20. Comparison or Relational Operators Comparison operators are used to compare two values. Operators are: == Equal != Not Equal > Greater than < Less than >= Greater than or equal to <= Less than or equal to. The <> form of Not Equal exists only in Python 2; Python 3 accepts only !=.
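Every comparison evaluates to a Boolean. A quick sketch with illustrative values:

```python
x, y = 5, 8

print(x == y)  # False
print(x != y)  # True
print(x > y)   # False
print(x < y)   # True
print(x >= 5)  # True
print(y <= 5)  # False
```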
- 21. Comparison or Relational Operators
- 22. Assignment Operator Assignment operators are used to assign values to variables. Operators are: = assigns a value to a variable += adds the right operand to the left operand and assigns the result to the left operand -= subtracts the right operand from the left operand and assigns the result to the left operand *= multiplies the left operand by the right operand and assigns the result to the left operand /= divides the left operand by the right operand and assigns the result to the left operand %= computes the remainder when the left operand is divided by the right operand and assigns the result to the left operand.
- 23. Assignment Operator //= divides the left operand by the right operand and assigns the floored result to the left operand. **= raises the left operand to the power of the right operand and assigns the result to the left operand. &= performs bitwise AND on the operands and assigns the result to the left operand |= performs bitwise OR on the operands and assigns the result to the left operand ^= performs bitwise XOR on the operands and assigns the result to the left operand. >>= performs a bitwise right shift and assigns the result to the left operand <<= performs a bitwise left shift and assigns the result to the left operand
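The compound assignment operators can be traced step by step. This is a sketch with arbitrary starting values; the comment on each line gives the value after that statement.

```python
n = 10
n += 5    # 15
n -= 3    # 12
n *= 2    # 24
n /= 4    # 6.0 (true division produces a float)
n %= 4    # 2.0
n **= 3   # 8.0
n //= 3   # 2.0

bits = 0b1100    # 12
bits &= 0b1010   # 8   bitwise AND
bits |= 0b0001   # 9   bitwise OR
bits ^= 0b0011   # 10  bitwise XOR
bits <<= 2       # 40  left shift
bits >>= 1       # 20  right shift

print(n, bits)  # 2.0 20
```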
- 25. Logical Operators These operators are used to combine conditional statements. Operators are: and - returns True if both statements are true; or - returns True if either statement is true; not - reverses the result
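A minimal sketch of the three logical operators, using made-up variables:

```python
age = 25
has_ticket = True

print(age >= 18 and has_ticket)  # True: both conditions hold
print(age < 18 or has_ticket)    # True: at least one condition holds
print(not has_ticket)            # False: not reverses the result
```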
- 26. LOOPS AND CONDITIONS Conditional Constructs Conditional constructs are used to perform different computations or actions depending on whether a condition evaluates to true or false. The conditions usually use comparisons and arithmetic expressions with variables. These expressions are evaluated to the Boolean values True or False. The statements used for decision making are called conditional statements, alternatively known as conditional expressions or constructs. Types of Conditional Statements The conditional constructs in Python are: If statement, If .. else statement, If .. elif .. else statement, Nested if statement
- 27. If statement The if statement in Python is made up of three main components: the if KEYWORD itself, an EXPRESSION that is tested for its truth value, and a CODE SUITE to execute if the expression evaluates to nonzero or true.
- 28. if .. else statement Like other languages, Python features an else statement that can be paired with an if statement. The else statement identifies a block of code to be executed if the conditional expression of the if statement resolves to a false Boolean value.
- 29. If .. elif .. else statement (Chained conditions) elif is the Python else-if statement. It allows one to check multiple expressions for truth value and execute a block of code as soon as one of the conditions evaluates to be true. Like the else statement, the elif statement is optional. Unlike else, there can be an arbitrary number of elif statements following an if.
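A chained condition in practice. The thresholds and labels below are arbitrary choices for illustration; only the first branch whose condition is true is executed.

```python
temperature = 18

if temperature > 30:
    label = "hot"
elif temperature > 15:
    label = "mild"   # this branch runs: 18 is not > 30 but is > 15
else:
    label = "cold"

print(label)  # mild
```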
- 30. Nested If Statements In Python one if condition can also be nested within another if condition. Indentation is the way to figure out the level of nesting
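A short sketch of nesting, where the indentation level shows which if each else belongs to (the value is illustrative):

```python
number = 12

if number > 0:
    # This inner if/else is only reached when number is positive.
    if number % 2 == 0:
        kind = "positive even"
    else:
        kind = "positive odd"
else:
    kind = "not positive"

print(kind)  # positive even
```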
- 31. Continue statement Whenever a continue statement is encountered in Python, the rest of the current iteration is skipped and control jumps back to the top of the loop. It can be used with both while and for loops. The while loop is conditional and the for loop is iterative, so after continue the while condition is tested again, or the for loop moves on to its next item, before the next iteration begins; if nothing is left to iterate, the loop terminates normally. Output: Current variable value : 6 Current variable value : 4 Current variable value : 3 Current variable value : 2 Current variable value : 1 Current variable value : 0 Good bye!
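The code that produced the output above is not part of the slide text. One loop that prints exactly those lines, assuming a starting value of 7 and a skipped value of 5 (both are assumptions), might look like this; the `values` and `odd_values` lists are added only to make the result easy to inspect.

```python
var = 7       # assumed starting value
values = []   # hypothetical helper list recording what was printed

while var > 0:
    var = var - 1
    if var == 5:
        continue  # skip the print below and re-test the while condition
    print("Current variable value :", var)
    values.append(var)
print("Good bye!")

# continue works the same way in a for loop: here even numbers are skipped.
odd_values = []
for n in range(5):
    if n % 2 == 0:
        continue
    odd_values.append(n)
print(odd_values)  # [1, 3]
```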
- 32. Functions Functions are used to structure programs and are useful for reusing code in more than one section of a program. They increase the reusability of code and remove redundancy. Syntax: def function_name(parameters): function body (statements) The function body consists of indented statements. To end the function body, the indentation has to end. Every time a function is called, the function body is executed. The parameters in the function definition are optional. A function may have a return statement that returns a result. Once the return statement is executed in the function body, the function ends.
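A minimal sketch of defining and calling functions. The function names are hypothetical; the second function shows that parameters are optional and that a function without a return statement returns None.

```python
def add_numbers(first, second=0):
    # The indented lines form the function body.
    result = first + second
    return result  # the function ends here


def greet():
    # No parameters and no return statement.
    print("Hello!")


print(add_numbers(3, 4))  # 7
print(add_numbers(5))     # 5, because second defaults to 0
greeting_result = greet()
print(greeting_result)    # None
```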
- 34. Numpy package
- 35. Creating Arrays from Python Lists First, we can use np.array to create arrays from Python lists: In[8]: # integer array: np.array([1, 4, 2, 5, 3]) Out[8]: array([1, 4, 2, 5, 3]) Remember that unlike Python lists, NumPy is constrained to arrays that all contain the same type. If types do not match, NumPy will upcast if possible (here, integers are upcast to floating point): In[9]: np.array([3.14, 4, 2, 3]) Out[9]: array([ 3.14, 4. , 2. , 3. ]) If we want to explicitly set the data type of the resulting array, we can use the dtype keyword: In[10]: np.array([1, 2, 3, 4], dtype='float32') Out[10]: array([ 1., 2., 3., 4.], dtype=float32) Finally, unlike Python lists, NumPy arrays can explicitly be multidimensional; here’s one way of initializing a multidimensional array using a list of lists: In[11]: # nested lists result in multidimensional arrays np.array([range(i, i + 3) for i in [2, 4, 6]]) Out[11]: array([[2, 3, 4], [4, 5, 6], [6, 7, 8]]) The inner lists are treated as rows of the resulting two-dimensional array.
- 36. NumPy Array Attributes First, let’s discuss some useful array attributes. We’ll start by defining three random arrays: a one-dimensional, two-dimensional, and three-dimensional array. We’ll use NumPy’s random number generator, which we will seed with a set value in order to ensure that the same random arrays are generated each time this code is run: In[1]: import numpy as np np.random.seed(0) # seed for reproducibility x1 = np.random.randint(10, size=6) # One-dimensional array x2 = np.random.randint(10, size=(3, 4)) # Two-dimensional array x3 = np.random.randint(10, size=(3, 4, 5)) # Three-dimensional array Each array has attributes ndim (the number of dimensions), shape (the size of each dimension), and size (the total size of the array): In[2]: print("x3 ndim: ", x3.ndim) print("x3 shape:", x3.shape) print("x3 size: ", x3.size) x3 ndim: 3 x3 shape: (3, 4, 5) x3 size: 60
- 37. Array Slicing: Accessing Subarrays Just as we can use square brackets to access individual array elements, we can also use them to access subarrays with the slice notation, marked by the colon (:) character. The NumPy slicing syntax follows that of the standard Python list; to access a slice of an array x, use this: x[start:stop:step] If any of these are unspecified, they default to the values start=0, stop=size of dimension, step=1. We’ll take a look at accessing subarrays in one dimension and in multiple dimensions. One-dimensional subarrays In[16]: x = np.arange(10) x Out[16]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) In[17]: x[:5] # first five elements Out[17]: array([0, 1, 2, 3, 4]) In[18]: x[5:] # elements from index 5 onward Out[18]: array([5, 6, 7, 8, 9]) In[19]: x[4:7] # middle subarray Out[19]: array([4, 5, 6])
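Slicing in multiple dimensions works the same way, with one slice per dimension separated by commas. The sketch below uses a small made-up 3×4 array rather than any array from the slides.

```python
import numpy as np

grid = np.arange(12).reshape((3, 4))
print(grid)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

top_left = grid[:2, :3]       # first two rows, first three columns
every_other = grid[:, ::2]    # all rows, every second column
first_column = grid[:, 0]     # a single column as a one-dimensional array
reversed_grid = grid[::-1, ::-1]  # both dimensions reversed

print(top_left)      # [[0 1 2]
                     #  [4 5 6]]
print(first_column)  # [0 4 8]
```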
- 38. Reshaping of Arrays Another useful type of operation is reshaping of arrays. The most flexible way of doing this is with the reshape() method. For example, if you want to put the numbers 1 through 9 in a 3×3 grid, you can do the following: grid = np.arange(1, 10).reshape((3, 3)) print(grid) [[1 2 3] [4 5 6] [7 8 9]]
- 39. Splitting of arrays The opposite of concatenation is splitting, which is implemented by the functions np.split, np.hsplit, and np.vsplit. For each of these, we can pass a list of indices giving the split points: x = [1, 2, 3, 99, 99, 3, 2, 1] x1, x2, x3 = np.split(x, [3, 5]) print(x1, x2, x3) [1 2 3] [99 99] [3 2 1] Notice that N split points lead to N + 1 subarrays. The related functions np.hsplit and np.vsplit are similar: grid = np.arange(16).reshape((4, 4)) grid array([[ 0, 1, 2, 3], [ 4, 5, 6, 7], [ 8, 9, 10, 11], [12, 13, 14, 15]]) In[52]: upper, lower = np.vsplit(grid, [2]) print(upper) print(lower) [[0 1 2 3] [4 5 6 7]]
- 40. [[ 8 9 10 11] [12 13 14 15]] In[53]: left, right = np.hsplit(grid, [2]) print(left) print(right) [[ 0 1] [ 4 5] [ 8 9] [12 13]] [[ 2 3] [ 6 7] [10 11] [14 15]] Similarly, np.dsplit will split arrays along the third axis.
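Splitting is described as the opposite of concatenation, which these slides do not show. A brief sketch of the joining direction with np.concatenate, np.vstack, and np.hstack follows; the arrays are made up for illustration.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
joined = np.concatenate([x, y])   # join one-dimensional arrays end to end
print(joined)                     # [1 2 3 3 2 1]

grid = np.array([[9, 8, 7],
                 [6, 5, 4]])
stacked_rows = np.vstack([x, grid])   # stack vertically: x becomes a new first row
print(stacked_rows.shape)             # (3, 3)

column = np.array([[99],
                   [99]])
stacked_cols = np.hstack([grid, column])  # stack horizontally: adds a column
print(stacked_cols.shape)                 # (2, 4)
```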
- 41. Pandas At the very basic level, Pandas objects can be thought of as enhanced versions of NumPy structured arrays in which the rows and columns are identified with labels rather than simple integer indices. As we will see during the course of this chapter, Pandas provides a host of useful tools, methods, and functionality on top of the basic data structures, but nearly everything that follows will require an understanding of what these structures are. Thus, before we go any further, let’s introduce these three fundamental Pandas data structures: the Series, DataFrame, and Index. We will start our code sessions with the standard NumPy and Pandas imports: import numpy as np import pandas as pd
- 42. Pandas series
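A minimal sketch of the three structures named above (Series, DataFrame, and Index), using invented labels and values:

```python
import pandas as pd

# A Series is a one-dimensional array of values with an attached index of labels.
data = pd.Series([0.25, 0.5, 0.75, 1.0], index=["a", "b", "c", "d"])
print(data["b"])     # 0.5
print(data.values)   # the underlying NumPy array
print(data.index)    # the Index object holding the labels

# A DataFrame is a two-dimensional table with labeled rows and columns.
frame = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]},
                     index=["a", "b", "c"])
print(frame.loc["b", "y"])   # 20
print(list(frame.columns))   # ['x', 'y']
```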
- 47. Combining Datasets: Merge and Join One essential feature offered by Pandas is its high-performance, in-memory join and merge operations. If you have ever worked with databases, you should be familiar with this type of data interaction. The main interface for this is the pd.merge function, and we’ll see a few examples of how this can work in practice. Relational Algebra The behavior implemented in pd.merge() is a subset of what is known as relational algebra, which is a formal set of rules for manipulating relational data, and forms the conceptual foundation of operations available in most databases. The strength of the relational algebra approach is that it proposes several primitive operations, which become the building blocks of more complicated operations on any dataset.
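As a small sketch of pd.merge, the two tables below (with made-up employee data) share an "employee" column. Called with no extra arguments, pd.merge joins on the column names the two frames have in common.

```python
import pandas as pd

# Illustrative data only.
employees = pd.DataFrame({
    "employee": ["Bob", "Jake", "Lisa", "Sue"],
    "group": ["Accounting", "Engineering", "Engineering", "HR"],
})
hire_dates = pd.DataFrame({
    "employee": ["Lisa", "Bob", "Jake", "Sue"],
    "hire_date": [2004, 2008, 2012, 2014],
})

# Rows are matched on the shared "employee" column, whatever their order.
merged = pd.merge(employees, hire_dates)
print(merged)
```

The join column can also be named explicitly with the `on` parameter, for example `pd.merge(employees, hire_dates, on="employee")`.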
- 49. Visualization with matplotlib Just as we use the np shorthand for NumPy and the pd shorthand for Pandas, we will use some standard shorthands for Matplotlib imports: In[1]: import matplotlib as mpl import matplotlib.pyplot as plt show() or No show()? How to Display Your Plots A visualization you can’t see won’t be of much use, but just how you view your Matplotlib plots depends on the context. The best use of Matplotlib differs depending on how you are using it; roughly, the three applicable contexts are using Matplotlib in a script, in an IPython terminal, or in an IPython notebook.
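For the script context, a common pattern is to build the figure and call plt.show() once at the end. This is a sketch with arbitrary sine and cosine curves, not an example taken from the slides.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 100)

fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), label="cos(x)")
ax.legend()

# In a script, show() opens the figure window(s); it is usually called once, at the end.
plt.show()
```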
- 50. matplotlib Importing matplotlib The plt interface (matplotlib.pyplot) is what we will use most often, as we’ll see throughout this chapter.