Presenter:
Date:
TOPIC: AI and DS
Private and Confidential www.futureconnect.net 1
AGENDA
Unit name: 1. DATA SCIENCE
Topics:
1. DATA SCIENCE LIBRARIES
2. NUMPY
3. PANDAS
4. MATPLOTLIB
5. DATA EXPLORATION
Hours count: 2
Session: 2
OBJECTIVES
• Gain knowledge of Data Science Libraries
• To understand Data Science Manipulation Packages
• Demo for Data Exploration using Package
Data Mining
Scrapy
• One of the most popular Python data science libraries, Scrapy helps to build crawling programs
(spider bots) that can retrieve structured data from the web – for example, URLs or contact info.
• It's a great tool for scraping data used in, for example, Python machine learning models.
• Developers use it for gathering data from APIs.
BeautifulSoup
• BeautifulSoup is another really popular library for web crawling and data scraping.
• If you want to collect data that’s available on some website but not via a proper CSV or API,
BeautifulSoup can help you scrape it and arrange it into the format you need.
Data Processing and Modeling
NumPy
• NumPy (Numerical Python) is a perfect tool for scientific computing and performing basic and
advanced array operations.
• The library offers many handy features for performing operations on n-dimensional arrays and
matrices in Python.
SciPy
• This useful library includes modules for linear algebra, integration, optimization, and statistics.
• Its main functionality was built upon NumPy, so its arrays make use of this library.
• SciPy works great for all kinds of scientific programming projects (science, mathematics, and
engineering).
Data Processing and Modeling
Pandas
• Pandas is a library created to help developers work with "labeled" and "relational" data intuitively.
• It's based on two main data structures: "Series" (one-dimensional, like a list of items) and
"DataFrame" (two-dimensional, like a table with multiple columns).
Keras
• Keras is a great library for building neural networks and modeling.
• It's very straightforward to use and provides developers with a good degree of extensibility. The
library takes advantage of other packages (Theano or TensorFlow) as its backends.
Data Processing and Modeling
SciKit-Learn
• This is an industry standard for Python-based data science projects.
• SciKits are a group of packages in the SciPy Stack that were created for specific functionalities –
for example, image processing. Scikit-learn uses the math operations of SciPy to expose a
concise interface to the most common machine learning algorithms.
PyTorch
• PyTorch is a framework that is perfect for data scientists who want to perform deep learning tasks
easily.
• The tool allows performing tensor computations with GPU acceleration. It's also used for other
tasks – for example, for creating dynamic computational graphs and calculating gradients
automatically.
Data Processing and Modeling
TensorFlow
• TensorFlow is a popular Python framework for machine learning and deep learning, which was
developed at Google Brain.
• It's widely used for tasks like object identification, speech recognition, and many others.
• It helps in working with artificial neural networks that need to handle multiple data sets.
XGBoost
• This library is used to implement machine learning algorithms under the Gradient Boosting
framework.
• XGBoost is portable, flexible, and efficient.
• It offers parallel tree boosting that helps teams to resolve many data science problems. Another
advantage is that developers can run the same code on major distributed environments such as
Hadoop, SGE, and MPI.
Data Visualization
Matplotlib
• This is a standard data science library that helps to generate data visualizations such as
two-dimensional diagrams and graphs (histograms, scatterplots, non-Cartesian coordinate graphs).
• Matplotlib is one of those plotting libraries that are really useful in data science projects —
it provides an object-oriented API for embedding plots into applications.
• Developers need to write more code than usual while using this library for generating advanced
visualizations.
Seaborn
• Seaborn is based on Matplotlib and serves as a useful Python machine learning tool for
visualizing statistical models – heatmaps and other types of visualizations that summarize data
and depict the overall distributions.
• When using this library, you get to benefit from an extensive gallery of visualizations (including
complex ones like time series, joint plots, and violin diagrams).
Data Visualization
Bokeh
• This library is a great tool for creating interactive and scalable visualizations inside browsers using
JavaScript widgets. Bokeh is fully independent of Matplotlib.
• It focuses on interactivity and presents visualizations through modern browsers – similarly to
Data-Driven Documents (d3.js). It offers a set of graphs, interaction abilities (like linking plots or adding
JavaScript widgets), and styling.
Plotly
• This is a web-based tool for data visualization that offers many useful out-of-the-box graphics – you can
find them on the Plot.ly website.
• The library works very well in interactive web applications.
pydot
• This library helps to generate directed and undirected graphs.
• It is written in pure Python and serves as an interface to Graphviz. The graphs created come in handy
when you're developing algorithms based on neural networks and decision trees.
Python Libraries for Data Science
• Pandas: Used for structured data operations
• NumPy: Creating Arrays
• Matplotlib: Data Visualization
• Scikit-learn: Machine Learning Operations
• SciPy: Perform Scientific operations
• TensorFlow: Symbolic math library
• BeautifulSoup: Parsing HTML and XML pages
These 3 Python libraries (NumPy, Pandas, and Matplotlib) will be
covered in the following slides
NumPy
• NumPy = Numerical Python
• Created in 2005 by Travis Oliphant.
• Consists of array objects and routines for processing them.
• NumPy arrays are faster than traditional Python lists because their elements are stored in one
contiguous block of memory.
• The array object in NumPy is called ndarray.
Top four benefits that NumPy can bring to your code:
1. More speed: NumPy uses algorithms written in C, which typically run far faster than equivalent
pure-Python loops.
2. Fewer loops: NumPy helps you to reduce loops and keep from getting tangled up in iteration
indices.
3. Clearer code: Without loops, your code will look more like the equations you’re trying to
calculate.
4. Better quality: There are thousands of contributors working to keep NumPy fast, friendly, and
bug free.
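The "fewer loops" and "clearer code" points can be illustrated with a minimal sketch. The temperature values and variable names below are hypothetical sample data, not part of the original slides.

```python
import numpy as np

# Hypothetical sample data: temperatures in degrees Celsius.
celsius = [12.0, 18.5, 25.0, 31.5]

# Plain Python: an explicit loop over the list.
fahrenheit_loop = []
for c in celsius:
    fahrenheit_loop.append(c * 9 / 5 + 32)

# NumPy: the same formula applied to the whole array at once, with no loop.
celsius_arr = np.array(celsius)
fahrenheit_vec = celsius_arr * 9 / 5 + 32

print(fahrenheit_loop)
print(fahrenheit_vec)
```

The NumPy line reads like the conversion formula itself, which is the readability benefit the list above describes.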
NumPy Installation and Importing
Prerequisites: Python and the Python package installer (pip)
Installation: pip install numpy
Import: After installation, import the package with the “import” keyword:
import numpy
If the import runs without an error, the NumPy package is properly installed and ready to use.
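NumPy - ndarray Object
• It defines a collection of items that all belong to the same type.
• Every element in an ndarray has the same data type, described by a data-type object (dtype).
• Basic ndarray creation: numpy.array(object)
OR, with all parameters:
numpy.array(object, dtype = None, copy = True, order = None, subok = False, ndmin = 0)
Parameters:
• object: any object exposing the array interface (for example, a list or nested list)
• dtype: desired data type
• copy: whether the object is copied
• order: row-major (C) or column-major (F) memory layout
• subok: whether subclasses are passed through instead of being forced to a base-class array
• ndmin: minimum number of dimensions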
Sample Input-Output
Code:
import numpy as np
a = np.array([1, 2, 3])           # 1D array
b = np.array([[1, 2], [3, 4]])    # 2D array
print(a)
print(b)
Output:
[1 2 3]
[[1 2]
 [3 4]]
NumPy arrays can be multi-dimensional too.
np.array([[1,2,3,4],[5,6,7,8]])
array([[1, 2, 3, 4],
[5, 6, 7, 8]])
• Here, we created a 2-dimensional array of values.
• Note: A matrix is just a rectangular array of numbers with shape N x M where N is
the number of rows and M is the number of columns in the matrix. The one you
just saw above is a 2 x 4 matrix.
Types of NumPy arrays
• Array of zeros
• Array of ones
• Random numbers in ndarrays
• Identity matrix in NumPy
• Evenly spaced ndarray
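Each array type in the list above has a standard NumPy constructor. The following is a minimal sketch; the shapes, the seed, and the variable names are arbitrary choices for illustration.

```python
import numpy as np

zeros = np.zeros((2, 3))               # 2 x 3 array filled with 0.0
ones = np.ones((2, 3))                 # 2 x 3 array filled with 1.0

rng = np.random.default_rng(seed=0)    # seeded generator, so runs are repeatable
random_vals = rng.random((2, 3))       # uniform random numbers in [0, 1)

identity = np.eye(3)                   # 3 x 3 identity matrix

evenly_step = np.arange(0, 10, 2)      # fixed step: 0, 2, 4, 6, 8
evenly_count = np.linspace(0, 1, 5)    # fixed count: 5 points from 0 to 1 inclusive

for name, arr in [("zeros", zeros), ("ones", ones), ("random", random_vals),
                  ("identity", identity), ("arange", evenly_step),
                  ("linspace", evenly_count)]:
    print(name, arr.shape)
```

np.arange() is convenient when the step size is known, while np.linspace() is convenient when the number of points is known.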
Numpy - Array Indexing and Slicing
• Indexing is used to access array elements by their index.
• The indexes in NumPy arrays start with 0.
arr = np.array([1, 2, 3, 4])
arr[0]  # accesses the first element of the array, so the value is 1
arr = np.array([[1,2,3,4,5], [6,7,8,9,10]])
arr[0,1]  # accesses the element in the first row, second column of the 2D array, so the value is 2
Slicing: Taking elements of an array from a start index up to (but not including) an end index: [start:end] or [start:end:step]
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[1:5])  # Ans: [2 3 4 5]
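A few more indexing and slicing forms, as a minimal sketch that reuses the arrays from this slide. The name grid is introduced here only to keep the 1D and 2D arrays apart.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[1:5])      # indexes 1 to 4 -> [2 3 4 5]
print(arr[1:6:2])    # every second element from index 1 up to 5 -> [2 4 6]
print(arr[-1])       # negative indexes count from the end -> 7

grid = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
print(grid[1, 1:4])  # second row, columns 1 to 3 -> [7 8 9]
print(grid[:, 0])    # first column of every row -> [1 6]
```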
Dimensions of NumPy arrays
You can easily determine the number of dimensions or axes of a NumPy array using the ndim attribute:
# number of axes
a = np.array([[5,10,15],[20,25,20]])
print('Array :','\n',a)
print('Dimensions :','\n',a.ndim)
Array :
[[ 5 10 15]
[20 25 20]]
Dimensions :
2
This array has two dimensions: 2 rows and 3 columns.
NumPy - Array Shape and Reshape
• The shape of an array is the number of elements along each of its dimensions.
• The shape attribute returns it as a tuple.
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(arr.shape)
Output: (2, 4)
• Reshaping changes the shape of an array without changing its data.
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
newarr = arr.reshape(4, 3)
print(newarr)
Output:
[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
Flattening a NumPy array
Sometimes when you have a multidimensional array and want to collapse it to a single-dimensional
array, you can either use the flatten() method or the ravel() method:
Syntax:
• flatten()
• ravel()
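The practical difference between the two methods is that flatten() always returns a copy, while ravel() returns a view of the original data when it can. A minimal sketch with an arbitrary 2 x 3 array:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

flat = a.flatten()   # always a new copy of the data
rav = a.ravel()      # a view of the same data when possible (it is here)

flat[0] = 100        # changes only the copy
rav[1] = 200         # also changes a, because rav shares memory with a

print(flat)
print(rav)
print(a)
```

Prefer flatten() when the original array must stay untouched, and ravel() when avoiding a copy matters.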
Transpose of a NumPy array
Another very useful reshaping method of NumPy is the transpose() method. It takes the input
array and swaps its rows and columns, so the rows become columns and the columns become rows:
Syntax: numpy.transpose()
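A minimal sketch of transposing a 2 x 3 array. The array values are arbitrary; the .T attribute is shown as an equivalent shorthand.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)

t = np.transpose(a)                    # shape (3, 2)
print(t)
print(a.shape, "->", t.shape)

# The .T attribute gives the same result for a 2D array.
same = a.T
```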
Expanding and Squeezing a NumPy array
Expanding a NumPy array
• You can add a new axis to an array using the expand_dims() method by providing the array and the
axis along which to expand
Squeezing a NumPy array
• On the other hand, if you instead want to remove an axis from the array, use the squeeze() method.
• It removes axes that have a single entry. This means if you have created a 2 x 2 x 1 matrix,
squeeze() will remove the third dimension from the matrix.
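Both operations change only the shape, not the data. A minimal sketch, using an arbitrary 1D array and the 2 x 2 x 1 case mentioned above:

```python
import numpy as np

a = np.array([1, 2, 3])              # shape (3,)

row = np.expand_dims(a, axis=0)      # new axis in front: shape (1, 3)
col = np.expand_dims(a, axis=1)      # new axis at the end: shape (3, 1)
print(row.shape, col.shape)

m = np.zeros((2, 2, 1))              # 2 x 2 x 1 array
squeezed = np.squeeze(m)             # the length-1 axis is removed: shape (2, 2)
print(m.shape, "->", squeezed.shape)
```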
NumPy - Arrays Join and Split
• Joining means merging two or more arrays.
• We use the concatenate() function to join arrays.
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr = np.concatenate((arr1, arr2))
print(arr)
Output: [1 2 3 4 5 6]
• Splitting means breaking one array into many.
arr = np.array([1, 2, 3, 4, 5, 6])
newarr = np.array_split(arr, 3)
print(newarr)
Output: [array([1, 2]), array([3, 4]), array([5, 6])]
Pandas
• Data Analysis Tool
• Used for exploring, manipulating, analyzing data.
• The source code for Pandas is found at this github repository
https://github.com/pandas-dev/pandas
• Pandas converts messy data into a readable format suitable for analysis.
Pandas Installation and Importing
Prerequisites: Python and the Python package installer (pip)
Installation: pip install pandas
Import: After installation, import the package with the “import” keyword:
import pandas
If the import runs without an error, the Pandas package is properly installed and ready to use.
Pandas - Series and DataFrames
• A Series is a 1D array containing one type of data
import pandas as pd
a = [1, 7, 2]
myvar = pd.Series(a)
print(myvar)
Output:
0    1
1    7
2    2
dtype: int64
• A DataFrame is a 2D table containing rows and columns
Loading data into a DataFrame:
import pandas as pd
data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45]
}
df = pd.DataFrame(data)
print(df)
Output:
   calories  duration
0       420        50
1       380        40
2       390        45
Pandas: Read CSV
• It is used to read CSV (comma-separated values) files.
• The pd.read_csv() function is used.
import pandas as pd
df = pd.read_csv('data.csv')
Input file: data.csv. The file is read and stored as a DataFrame in the df variable.
When we print a large df, pandas shows the first 5 rows and the last 5 rows by default.
df.head(10): Print first 10 rows
df.tail(10): Print last 10 rows
df.info(): Information about the data
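The calls above can be tried without a file on disk. In this minimal sketch the CSV text and its column names are hypothetical stand-ins for data.csv, and io.StringIO is used only so the example is self-contained; passing a path such as 'data.csv' works the same way.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for data.csv.
csv_text = """Duration,Pulse,Calories
60,110,409.1
60,117,479.0
45,109,282.4
30,103,195.1
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.head(2))   # first 2 rows
print(df.tail(2))   # last 2 rows
df.info()           # column names, non-null counts, and dtypes
```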
Python Matplotlib
• Graph Plotting Library
• Created by John D. Hunter
• The source code for Matplotlib is located at this github repository
https://github.com/matplotlib/matplotlib
• It makes use of NumPy, the numerical mathematics extension of Python
• When this material was prepared, the current stable version was 2.2.0 (released in 2018); newer versions have been released since.
Matplotlib Installation and Importing
Prerequisites: Python and the Python package installer (pip)
Installation: pip install matplotlib
Import: After installation, import the package with the “import” keyword:
import matplotlib
If the import runs without an error, the Matplotlib package is properly installed and ready to use.
Matplotlib Pyplot
• Most Matplotlib utilities live in the pyplot submodule, which is usually imported under the alias plt, as shown below:
import matplotlib.pyplot as plt
Now, pyplot can be referred to as plt
• The plot() function is used to draw lines between points
• The show() function is used to display the graph
import matplotlib.pyplot as plt
import numpy as np
xpoints = np.array([0, 6])
ypoints = np.array([0, 250])
plt.plot(xpoints, ypoints)
plt.show()
Matplotlib Functions
• The xlabel() and ylabel() functions are used to add axis labels
• The subplots() function is used to draw multiple plots in one figure
• The scatter() function is used to construct scatter plots
• The bar() function is used to draw bar graphs
[Figures: Scatter Plot, Bar Plot]
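The functions listed on this slide can be combined in one figure. This is a minimal sketch with made-up data. It uses the Axes methods set_xlabel() and set_ylabel(), which are the per-subplot counterparts of xlabel() and ylabel(). The Agg backend and the in-memory buffer are assumptions so the script runs without a display; in an interactive session, plt.show() would be used instead.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical sample data.
x = np.array([5, 7, 8, 7, 2, 17])
y = np.array([99, 86, 87, 88, 111, 86])
categories = ["A", "B", "C", "D"]
counts = [3, 8, 1, 10]

# subplots() creates one figure holding two side-by-side plots.
fig, (ax_scatter, ax_bar) = plt.subplots(1, 2, figsize=(8, 3))

ax_scatter.scatter(x, y)
ax_scatter.set_xlabel("x")
ax_scatter.set_ylabel("y")
ax_scatter.set_title("Scatter Plot")

ax_bar.bar(categories, counts)
ax_bar.set_xlabel("category")
ax_bar.set_ylabel("count")
ax_bar.set_title("Bar Plot")

# Render to an in-memory PNG instead of opening a window.
buf = io.BytesIO()
fig.savefig(buf, format="png")
```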
DATA EXPLORATION: load data file(s)
DATA EXPLORATION: convert a variable to a different data type
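The slide content for this step is not available in the text, so the following is only a minimal sketch of one common approach, assuming a pandas DataFrame and its astype() method. The column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical data: "age" was read in as text, "score" as integers.
df = pd.DataFrame({"age": ["25", "32", "47"], "score": [88, 92, 79]})
print(df.dtypes)

df["age"] = df["age"].astype(int)        # text -> integer
df["score"] = df["score"].astype(float)  # integer -> floating point
print(df.dtypes)
```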
DATA EXPLORATION: Transpose a dataset or DataFrame
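A minimal sketch, assuming the data is held in a pandas DataFrame. The .T attribute (equivalent to the transpose() method) swaps rows and columns. The sample values are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"calories": [420, 380, 390], "duration": [50, 40, 45]})

transposed = df.T   # columns become rows, row labels become columns
print(transposed)
```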
DATA EXPLORATION: Sort a Pandas DataFrame
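A minimal sketch of sorting with the DataFrame sort_values() method. The names and numbers are hypothetical sample data.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Asha", "Ben", "Chen", "Dev"],
    "calories": [420, 380, 390, 380],
    "duration": [50, 40, 45, 60],
})

# Sort by one column, ascending by default.
by_calories = df.sort_values("calories")

# Sort by calories ascending, breaking ties by duration descending.
by_two = df.sort_values(["calories", "duration"], ascending=[True, False])

print(by_calories)
print(by_two)
```

sort_values() returns a new DataFrame and leaves df unchanged.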
DATA EXPLORATION: Histogram Plot
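A minimal sketch of a histogram with Matplotlib's hist(). The data is randomly generated for illustration. The Agg backend is an assumption so the script runs without a display; in an interactive session, plt.show() would display the figure.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: 250 values drawn from a normal distribution.
rng = np.random.default_rng(seed=1)
heights = rng.normal(loc=170, scale=10, size=250)

fig, ax = plt.subplots()
# hist() returns the count per bin, the bin edges, and the drawn bars.
counts, bin_edges, _ = ax.hist(heights, bins=10)
ax.set_xlabel("height")
ax.set_ylabel("frequency")
ax.set_title("Histogram Plot")

print(counts)
```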
DATA EXPLORATION: Scatter Plot
DATA EXPLORATION: Box Plot
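A minimal sketch of a box plot with Matplotlib's boxplot(), drawing one box per column of a hypothetical DataFrame. The Agg backend is an assumption so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "calories": [420, 380, 390, 410, 300, 450],
    "duration": [50, 40, 45, 60, 30, 55],
})

fig, ax = plt.subplots()
# One box per column; each box shows the median, quartiles, and outliers.
result = ax.boxplot([df["calories"].to_numpy(), df["duration"].to_numpy()])
ax.set_xticks([1, 2])
ax.set_xticklabels(["calories", "duration"])
ax.set_title("Box Plot")
```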
DATA EXPLORATION: Generate frequency tables
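A minimal sketch of two common ways to build frequency tables in pandas: value_counts() for one variable and crosstab() for two. The columns and values are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "gender": ["F", "M", "F", "F", "M"],
    "passed": ["yes", "yes", "no", "yes", "no"],
})

# One-way frequency table: how often each value occurs.
freq = df["gender"].value_counts()
print(freq)

# Two-way frequency table: counts for every gender/passed combination.
cross = pd.crosstab(df["gender"], df["passed"])
print(cross)
```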
DATA EXPLORATION: Sample Dataset
DATA EXPLORATION: Remove duplicate values
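A minimal sketch using the pandas drop_duplicates() method. The names and cities are hypothetical sample data.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Asha", "Ben", "Asha", "Chen"],
    "city": ["Pune", "Delhi", "Pune", "Delhi"],
})

# Drop rows that are exact duplicates of an earlier row.
deduped = df.drop_duplicates()

# Keep only the first row seen for each city.
by_city = df.drop_duplicates(subset="city")

print(deduped)
print(by_city)
```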
DATA EXPLORATION: Group variables
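A minimal sketch of grouping with the pandas groupby() method, computing summary statistics per group. The data is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["Pune", "Delhi", "Pune", "Delhi"],
    "calories": [420, 380, 390, 400],
})

# Mean calories for each city.
mean_by_city = df.groupby("city")["calories"].mean()
print(mean_by_city)

# Several statistics per city at once.
summary = df.groupby("city")["calories"].agg(["count", "mean", "max"])
print(summary)
```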
DATA EXPLORATION: Treat missing values
TREATMENT:
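The treatment details from the slide are not available in the text. The following is a minimal sketch of two common treatments in pandas, dropping incomplete rows and filling gaps with the column mean, shown on hypothetical data. Which treatment is appropriate depends on the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in each column.
df = pd.DataFrame({
    "calories": [420, np.nan, 390, 400],
    "duration": [50, 40, np.nan, 45],
})

# Count the missing values in each column.
missing_counts = df.isna().sum()
print(missing_counts)

# Treatment 1: drop every row that contains a missing value.
dropped = df.dropna()

# Treatment 2: replace missing values with the mean of their column.
filled = df.fillna(df.mean())

print(dropped)
print(filled)
```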
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
 
Class 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdfClass 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdf
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
Concept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfConcept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.Compdf
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website App
 

Session 2

  • 1. Presenter: Date: TOPIC: AI and DS Private and Confidential www.futureconnect.net 1
  • 2. AGENDA. Unit name: 1. DATA SCIENCE. Topics: 1. DATA SCIENCE LIBRARIES 2. NUMPY 3. PANDAS 4. MATPLOTLIB 5. DATA EXPLORATION. Hours count: 2. Session: 2.
  • 3. OBJECTIVES • Gain knowledge of data science libraries • Understand data science manipulation packages • Demo of data exploration using these packages
  • 4. Data Mining. Scrapy: • One of the most popular Python data science libraries, Scrapy helps to build crawling programs (spider bots) that can retrieve structured data from the web, for example URLs or contact info. • It's a great tool for scraping data used in, for example, Python machine learning models. • Developers also use it for gathering data from APIs. BeautifulSoup: • BeautifulSoup is another popular library for parsing web pages and scraping data. • If you want to collect data that's available on some website but not via a proper CSV or API, BeautifulSoup can help you scrape it and arrange it into the format you need.
  • 5. Data Processing and Modeling. NumPy: • NumPy (Numerical Python) is a core tool for scientific computing and for performing basic and advanced array operations. • The library offers many handy features for performing operations on n-dimensional arrays and matrices in Python. SciPy: • This useful library includes modules for linear algebra, integration, optimization, and statistics. • Its main functionality is built upon NumPy, so it works on NumPy arrays. • SciPy works well for all kinds of scientific programming projects (science, mathematics, and engineering).
  • 6. Data Processing and Modeling. Pandas: • Pandas is a library created to help developers work with "labeled" and "relational" data intuitively. • It's based on two main data structures: "Series" (one-dimensional, like a list of items) and "DataFrame" (two-dimensional, like a table with multiple columns). Keras: • Keras is a library for building and modeling neural networks. • It's straightforward to use and provides developers with a good degree of extensibility. The library uses other packages (Theano or TensorFlow) as its backends.
  • 7. Data Processing and Modeling. Scikit-learn: • This is an industry standard for data science projects based in Python. • SciKits are a group of packages in the SciPy Stack that were created for specific functionalities, for example image processing. Scikit-learn uses the math operations of SciPy to expose a concise interface to the most common machine learning algorithms. PyTorch: • PyTorch is a framework well suited to data scientists who want to perform deep learning tasks easily. • The tool allows performing tensor computations with GPU acceleration. It's also used for other tasks, for example creating dynamic computational graphs and calculating gradients automatically.
  • 8. Data Processing and Modeling. TensorFlow: • TensorFlow is a popular Python framework for machine learning and deep learning, which was developed at Google Brain. • It is widely used for tasks like object identification, speech recognition, and many others. • It helps in working with artificial neural networks that need to handle multiple data sets. XGBoost: • This library is used to implement machine learning algorithms under the Gradient Boosting framework. • XGBoost is portable, flexible, and efficient. • It offers parallel tree boosting that helps teams to resolve many data science problems. Another advantage is that developers can run the same code on major distributed environments such as Hadoop, SGE, and MPI.
  • 9. Data Visualization. Matplotlib: • This is a standard data science library that helps to generate data visualizations such as two-dimensional diagrams and graphs (histograms, scatterplots, non-Cartesian coordinate graphs). • Matplotlib is one of the plotting libraries that are really useful in data science projects; it provides an object-oriented API for embedding plots into applications. • Developers need to write more code than usual when using this library to generate advanced visualizations. Seaborn: • Seaborn is based on Matplotlib and serves as a useful Python tool for visualizing statistical models, with heatmaps and other types of visualizations that summarize data and depict the overall distributions. • When using this library, you benefit from an extensive gallery of visualizations (including complex ones like time series, joint plots, and violin diagrams).
  • 10. Data Visualization. Bokeh: • This library is a great tool for creating interactive and scalable visualizations inside browsers using JavaScript widgets. Bokeh is fully independent of Matplotlib. • It focuses on interactivity and presents visualizations through modern browsers, similarly to Data-Driven Documents (d3.js). It offers a set of graphs, interaction abilities (like linking plots or adding JavaScript widgets), and styling. Plotly: • This is a web-based tool for data visualization that offers many useful out-of-the-box graphics; you can find them on the Plot.ly website. • The library works very well in interactive web applications. pydot: • This library helps to generate directed and undirected graphs. • It is written in pure Python and serves as an interface to Graphviz. The graphs created come in handy when you're developing algorithms based on neural networks and decision trees.
  • 11. Python Libraries for Data Science • Pandas: Used for structured data operations • NumPy: Creating arrays • Matplotlib: Data visualization • Scikit-learn: Machine learning operations • SciPy: Performing scientific operations • TensorFlow: Symbolic math library • BeautifulSoup: Parsing HTML and XML pages. Three of these Python libraries (NumPy, Pandas, and Matplotlib) are covered in the following slides.
  • 12. NumPy • NumPy = Numerical Python. • Created in 2005 by Travis Oliphant. • Consists of array objects and routines for array processing. • NumPy arrays are faster than traditional Python lists because their elements are stored in one contiguous block of memory. • The array object in NumPy is called ndarray.
  • 13. Top four benefits that NumPy can bring to your code: 1. More speed: NumPy uses algorithms written in C, which typically run far faster than equivalent pure-Python loops. 2. Fewer loops: NumPy helps you to reduce loops and keep from getting tangled up in iteration indices. 3. Clearer code: Without loops, your code will look more like the equations you're trying to calculate. 4. Better quality: There are thousands of contributors working to keep NumPy fast, friendly, and bug free.
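The "fewer loops" and "clearer code" points can be illustrated with a minimal sketch. The price and quantity values below are made up for illustration; the only claim is that the loop and the vectorized expression compute the same result.

```python
import numpy as np

# Hypothetical data: 1,000 prices and a constant quantity (illustrative values).
prices = np.arange(1, 1001, dtype=np.float64)
quantities = np.full(1000, 2.0)

# Loop version: one Python-level iteration per element.
totals_loop = []
for p, q in zip(prices, quantities):
    totals_loop.append(p * q)

# Vectorized version: one expression, evaluated element-wise in compiled code.
totals_vec = prices * quantities

print(np.allclose(totals_loop, totals_vec))  # True
print(totals_vec.sum())                      # 1001000.0
```

The vectorized line reads like the formula it implements, which is the sense in which the slide says code becomes clearer.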
  • 14. NumPy Installation and Importing. Pre-requirements: Python and the Python package installer (pip). Installation: pip install numpy. Import: After installation, import the package with the "import" keyword: import numpy. If the import succeeds, the NumPy package is properly installed and ready to use.
  • 15. NumPy ndarray Object • It defines a collection of items that all belong to the same type. • Every element in an ndarray has the same data type, described by a data-type object (dtype). • Basic ndarray creation: numpy.array(object) or numpy.array(object, dtype=None, copy=True, order='K', subok=False, ndmin=0). Parameters: object is any array-like input (anything exposing the array interface); dtype is the data type; copy controls whether the object is copied; order selects row-major or column-major memory layout; subok controls whether subclasses are passed through or converted to the base-class array; ndmin is the minimum number of dimensions of the result.
  • 16. Sample Input-Output
      Code:
          import numpy as np
          a = np.array([1, 2, 3])          # 1D array
          b = np.array([[1, 2], [3, 4]])   # 2D array
          print(a)
          print(b)
      Output:
          [1 2 3]
          [[1 2]
           [3 4]]
  • 17. NumPy arrays can be multi-dimensional too.
          np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
      Result:
          array([[1, 2, 3, 4],
                 [5, 6, 7, 8]])
      • Here, we created a 2-dimensional array of values.
      • Note: A matrix is just a rectangular array of numbers with shape N x M, where N is the number of rows and M is the number of columns in the matrix. The one above is a 2 x 4 matrix.
  • 18. Types of NumPy arrays • Array of zeros • Array of ones • Random numbers in ndarrays • Identity matrix in NumPy • Evenly spaced ndarray
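The slide lists these array types without code. A short sketch of one common way to create each of them follows; the shapes and the random seed are arbitrary choices for illustration.

```python
import numpy as np

zeros = np.zeros((2, 3))               # 2 x 3 array filled with 0.0
ones = np.ones((2, 3), dtype=int)      # 2 x 3 array filled with 1

rng = np.random.default_rng(seed=0)    # seed chosen arbitrarily for repeatability
rand = rng.random((2, 2))              # uniform random values in [0, 1)

identity = np.eye(3)                   # 3 x 3 identity matrix

spaced = np.linspace(0, 1, 5)          # 5 evenly spaced values from 0 to 1 inclusive
stepped = np.arange(0, 10, 2)          # values from 0 up to (not including) 10, step 2

print(spaced)    # [0.   0.25 0.5  0.75 1.  ]
print(stepped)   # [0 2 4 6 8]
```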
  • 19. NumPy - Array Indexing and Slicing
      • Indexing is used to access array elements by their index.
      • Indexes in NumPy arrays start at 0.
          arr = np.array([1, 2, 3, 4])
          arr[0]       # first element of the array: 1
          arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
          arr[0, 1]    # element in the first row, second column: 2
      • Slicing: taking elements of an array from a start index up to (but not including) an end index, written [start:end] or [start:end:step].
          arr = np.array([1, 2, 3, 4, 5, 6, 7])
          print(arr[1:5])    # [2 3 4 5]
  • 20. Dimensions of NumPy arrays
      You can easily determine the number of dimensions (axes) of a NumPy array using the ndim attribute:
          # number of axes
          a = np.array([[5, 10, 15], [20, 25, 20]])
          print('Array :', '\n', a)
          print('Dimensions :', '\n', a.ndim)
      Output:
          Array :
           [[ 5 10 15]
           [20 25 20]]
          Dimensions :
           2
      This array has two dimensions (axes), with 2 rows and 3 columns.
  • 21. NumPy - Array Shape and Reshape
      • The shape of an array is the number of elements along each dimension.
      • Every array has an attribute called shape that returns it as a tuple.
          arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
          print(arr.shape)
      Output:
          (2, 4)
      • Reshaping changes the shape of an array; the total number of elements must stay the same.
          arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
          newarr = arr.reshape(4, 3)
          print(newarr)
      Output:
          [[ 1  2  3]
           [ 4  5  6]
           [ 7  8  9]
           [10 11 12]]
  • 22. Flattening a NumPy array. When you have a multidimensional array and want to collapse it to a single-dimensional array, you can use either the flatten() method or the ravel() method. Syntax: • flatten() • ravel()
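A small sketch of the difference between the two methods, using a made-up 2 x 3 array: flatten() always returns a copy, while ravel() returns a view of the original data when the memory layout allows it (as it does for the contiguous array here).

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

flat = a.flatten()   # always a new copy of the data
rav = a.ravel()      # a view here, because `a` is contiguous in memory

flat[0] = 100        # does not touch `a`
rav[1] = 200         # writes through to `a`

print(flat.shape)    # (6,)
print(a)
# [[  1 200   3]
#  [  4   5   6]]
```

If the result will be modified and the original must stay untouched, flatten() is the safer choice of the two.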
  • 23. Transpose of a NumPy array. Another very useful reshaping method in NumPy is transpose(). It takes the input array and swaps its rows and columns, so that rows become columns and columns become rows. Syntax: numpy.transpose(array)
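As a quick illustration with a made-up 2 x 3 array, the transpose has shape 3 x 2, and the `.T` attribute gives the same result as `np.transpose`.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])       # shape (2, 3)

t = np.transpose(a)             # shape (3, 2); equivalent to a.T

print(t)
# [[1 4]
#  [2 5]
#  [3 6]]
print(np.array_equal(t, a.T))   # True
```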
  • 24. Expanding and Squeezing a NumPy array. Expanding a NumPy array: • You can add a new axis to an array using the expand_dims() function by providing the array and the axis along which to expand. Squeezing a NumPy array: • If you instead want to reduce the number of axes of the array, use the squeeze() function. • It removes every axis that has a single entry. This means that if you have created a 2 x 2 x 1 array, squeeze() will remove the third dimension and return a 2 x 2 array.
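A minimal sketch of both operations, using small made-up arrays; the shapes in the comments follow directly from the axis arguments.

```python
import numpy as np

a = np.array([1, 2, 3])              # shape (3,)

row = np.expand_dims(a, axis=0)      # new leading axis  -> shape (1, 3)
col = np.expand_dims(a, axis=1)      # new trailing axis -> shape (3, 1)

b = np.zeros((2, 2, 1))              # the 2 x 2 x 1 case described above
squeezed = np.squeeze(b)             # length-1 axis removed -> shape (2, 2)

print(row.shape, col.shape, squeezed.shape)   # (1, 3) (3, 1) (2, 2)
```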
  • 25. NumPy - Array Join and Split
      • Joining means merging two or more arrays.
      • We use the concatenate() function to join arrays.
          arr1 = np.array([1, 2, 3])
          arr2 = np.array([4, 5, 6])
          arr = np.concatenate((arr1, arr2))
          print(arr)
      Output:
          [1 2 3 4 5 6]
      • Splitting means breaking one array into many.
          arr = np.array([1, 2, 3, 4, 5, 6])
          newarr = np.array_split(arr, 3)
          print(newarr)
      Output:
          [array([1, 2]), array([3, 4]), array([5, 6])]
  • 26. Pandas • Data analysis tool. • Used for exploring, manipulating, and analyzing data. • The source code for Pandas is found at this GitHub repository: https://github.com/pandas-dev/pandas • Pandas helps convert messy data into a readable format suitable for analysis.
  • 27. Pandas Installation and Importing. Pre-requirements: Python and the Python package installer (pip). Installation: pip install pandas. Import: After installation, import the package with the "import" keyword: import pandas. If the import succeeds, the Pandas package is properly installed and ready to use.
  • 28. Pandas - Series and DataFrames
      • A Series is a one-dimensional labeled array, typically holding one type of data.
          import pandas as pd
          a = [1, 7, 2]
          myvar = pd.Series(a)
          print(myvar)
      Output:
          0    1
          1    7
          2    2
          dtype: int64
      • A DataFrame is a two-dimensional structure with rows and columns. Loading data into a DataFrame:
          import pandas as pd
          data = {
              "calories": [420, 380, 390],
              "duration": [50, 40, 45]
          }
          df = pd.DataFrame(data)
          print(df)
      Output:
             calories  duration
          0       420        50
          1       380        40
          2       390        45
  • 29. Pandas: Read CSV
      • Used to read CSV (comma-separated values) files.
      • The pd.read_csv() function is used. Input file: data.csv. The file is read and stored as a DataFrame in the df variable.
          import pandas as pd
          df = pd.read_csv('data.csv')
      • When we print df, a small table is shown in full; for a large table, pandas by default shows only the first 5 and last 5 rows.
      • df.head(10): returns the first 10 rows.
      • df.tail(10): returns the last 10 rows.
      • df.info(): prints summary information about the data.
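The sketch below shows the same calls end to end. To stay self-contained it reads from an in-memory string instead of data.csv; the column names and values are invented for illustration and are not the contents of the original file.

```python
import io
import pandas as pd

# Stand-in for data.csv (hypothetical contents).
csv_text = """activity,calories,duration
run,420,50
walk,380,40
swim,390,45
cycle,410,60
"""

df = pd.read_csv(io.StringIO(csv_text))   # accepts a path or any file-like object

print(df.head(2))   # first 2 rows
print(df.tail(2))   # last 2 rows
df.info()           # column names, non-null counts, dtypes, memory usage
```

With a real file, `pd.read_csv('data.csv')` would replace the `io.StringIO` argument.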
  • 30. Python Matplotlib • Graph plotting library. • Created by John D. Hunter. • The source code for Matplotlib is located at this GitHub repository: https://github.com/matplotlib/matplotlib • It makes use of NumPy, the numerical mathematics extension of Python. • Version 2.2.0 was released in 2018; newer stable versions have followed.
  • 31. Matplotlib Installation and Importing. Pre-requirements: Python and the Python package installer (pip). Installation: pip install matplotlib. Import: After installation, import the package with the "import" keyword: import matplotlib. If the import succeeds, the Matplotlib package is properly installed and ready to use.
  • 32. Matplotlib Pyplot
      • Most Matplotlib utilities live in the pyplot submodule, usually imported under the alias plt:
          import matplotlib.pyplot as plt
        Now pyplot can be referred to as plt.
      • The plot() function is used to draw lines between points.
      • The show() function is used to display the graph.
          import matplotlib.pyplot as plt
          import numpy as np
          xpoints = np.array([0, 6])
          ypoints = np.array([0, 250])
          plt.plot(xpoints, ypoints)
          plt.show()
  • 33. Matplotlib Functions • The xlabel() and ylabel() functions are used to add axis labels. • The subplots() function is used to draw multiple plots in one figure. • The scatter() function is used to construct scatter plots. • The bar() function is used to draw bar graphs. Figures: Scatter Plot, Bar Plot.
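A short sketch combining these functions in one figure. The data points, category names, and output filename are all invented for illustration. Because the example uses the object-oriented interface returned by subplots(), labels are set with set_xlabel() and set_ylabel(), the Axes counterparts of plt.xlabel() and plt.ylabel().

```python
import matplotlib
matplotlib.use("Agg")   # non-interactive backend, so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 6])

# One figure with two side-by-side plots.
fig, (ax_scatter, ax_bar) = plt.subplots(1, 2, figsize=(8, 3))

ax_scatter.scatter(x, y)
ax_scatter.set_xlabel("x value")
ax_scatter.set_ylabel("y value")
ax_scatter.set_title("Scatter Plot")

ax_bar.bar(["A", "B", "C"], [3, 7, 5])
ax_bar.set_xlabel("category")
ax_bar.set_ylabel("count")
ax_bar.set_title("Bar Plot")

fig.tight_layout()
fig.savefig("matplotlib_functions.png")   # hypothetical output filename
```

In an interactive session, the `matplotlib.use("Agg")` line can be dropped and `plt.show()` used instead of `savefig`.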
  • 34. DATA EXPLORATION: Load data file(s)
  • 35. DATA EXPLORATION: Load data file(s)
  • 36. DATA EXPLORATION: Load data file(s)
  • 37. DATA EXPLORATION: Convert a variable to a different data type
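The slide content for this step is not preserved in the transcript. As a hedged sketch of one common way to do it in pandas, the example below uses astype() for clean conversions and pd.to_numeric() for columns that may contain invalid entries; the column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical data: ages stored as text, scores stored as floats.
df = pd.DataFrame({"age": ["25", "32", "47"], "score": [88.5, 92.0, 79.5]})

df["age"] = df["age"].astype(int)           # text -> integer
df["score_text"] = df["score"].astype(str)  # float -> text

# When some entries cannot be parsed, errors="coerce" turns them into NaN.
mixed = pd.Series(["1", "2", "oops"])
numeric = pd.to_numeric(mixed, errors="coerce")

print(df["age"].sum())         # 104
print(numeric.isna().sum())    # 1
```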
  • 38. DATA EXPLORATION: Transpose a data set or DataFrame
  • 39. DATA EXPLORATION: Sort a Pandas DataFrame
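The original slide shows this step only as an image. A minimal sketch using sort_values() and sort_index() follows; the table is made up for illustration.

```python
import pandas as pd

# Hypothetical employee table.
df = pd.DataFrame({
    "name": ["Asha", "Ben", "Chen", "Dee"],
    "dept": ["B", "A", "B", "A"],
    "salary": [50, 70, 65, 60],
})

# Sort by one column, highest salary first.
by_salary = df.sort_values("salary", ascending=False)

# Sort by department (A to Z), then by salary within each department (high to low).
by_dept_then_salary = df.sort_values(["dept", "salary"], ascending=[True, False])

# Restore the original row order by sorting on the index.
by_index = by_salary.sort_index()

print(by_salary["name"].tolist())             # ['Ben', 'Chen', 'Dee', 'Asha']
print(by_dept_then_salary["name"].tolist())   # ['Ben', 'Dee', 'Chen', 'Asha']
```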
  • 40. DATA EXPLORATION: Histogram Plot
  • 41. DATA EXPLORATION: Histogram Plot
  • 42. DATA EXPLORATION: Scatter Plot
  • 43. DATA EXPLORATION: Box Plot
  • 44. DATA EXPLORATION: Generate frequency tables
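The transcript does not preserve how the slides build frequency tables. One common approach in pandas, sketched here with invented data, is value_counts() for a single variable and pd.crosstab() for a two-way table.

```python
import pandas as pd

# Hypothetical survey data.
df = pd.DataFrame({
    "gender": ["F", "M", "F", "F", "M"],
    "grade": ["A", "B", "A", "C", "B"],
})

counts = df["grade"].value_counts()                 # absolute frequencies
shares = df["grade"].value_counts(normalize=True)   # relative frequencies

table = pd.crosstab(df["gender"], df["grade"])      # two-way frequency table

print(counts["A"], counts["B"], counts["C"])   # 2 2 1
print(table)
```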
  • 45. DATA EXPLORATION: Sample Dataset
  • 46. DATA EXPLORATION: Remove duplicate values
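As a hedged sketch of this step (the slide itself is an image), pandas offers duplicated() to flag repeated rows and drop_duplicates() to remove them. The data below is made up.

```python
import pandas as pd

# Hypothetical data: row 2 repeats row 1 exactly; id 3 appears with two cities.
df = pd.DataFrame({
    "id": [1, 2, 2, 3, 3],
    "city": ["Pune", "Delhi", "Delhi", "Agra", "Goa"],
})

dup_mask = df.duplicated()                                   # True for fully repeated rows
unique_rows = df.drop_duplicates()                           # drop fully identical rows
unique_ids = df.drop_duplicates(subset="id", keep="first")   # keep first row per id

print(dup_mask.tolist())                   # [False, False, True, False, False]
print(len(unique_rows), len(unique_ids))   # 4 3
```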
  • 47. DATA EXPLORATION: Group variables
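Grouping in pandas is usually done with groupby(). The sketch below assumes a hypothetical table of departments and salaries and computes per-group statistics.

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({
    "dept": ["A", "B", "A", "B", "A"],
    "salary": [50, 60, 70, 100, 90],
})

# One statistic per group.
mean_by_dept = df.groupby("dept")["salary"].mean()

# Several statistics per group at once.
summary = df.groupby("dept")["salary"].agg(["count", "min", "max"])

print(mean_by_dept["A"], mean_by_dept["B"])   # 70.0 80.0
print(summary)
```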
  • 48. DATA EXPLORATION: Treat missing values. TREATMENT:
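The treatment shown on the slide is not preserved in the transcript. Two common options, sketched here under that caveat with invented data, are dropping incomplete rows with dropna() and filling gaps with fillna() (mean for a numeric column, most frequent value for a categorical one).

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing age and one missing city.
df = pd.DataFrame({
    "age": [24, np.nan, 40, 35],
    "city": ["Pune", None, "Delhi", "Pune"],
})

missing_per_column = df.isna().sum()   # count of missing values per column

dropped = df.dropna()                  # option 1: drop rows with any missing value

filled = df.copy()                     # option 2: fill the gaps
filled["age"] = filled["age"].fillna(filled["age"].mean())          # mean of 24, 40, 35
filled["city"] = filled["city"].fillna(filled["city"].mode()[0])    # most frequent city

print(missing_per_column.tolist())   # [1, 1]
print(len(dropped))                  # 3
print(filled.loc[1].tolist())        # [33.0, 'Pune']
```

Which option fits depends on how much data is missing and why; the example only shows the mechanics.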