2. Libraries And ML Scope
ML
Data Gathering
Data Cleaning
Exploring Data
Building Model
Visualization
3. Data Gathering
Beautiful Soup
• Is a Python library for pulling data
out of HTML and XML files. It works
with your favorite parser to provide
idiomatic ways of navigating,
searching, and modifying the parse
tree. It commonly saves
programmers hours or days of work.
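As a minimal sketch of navigating, searching, and modifying a parse tree, the snippet below parses a small made-up HTML string (not taken from any real site) with the parser bundled in Python's standard library.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet, used only for illustration.
html = """
<html><body>
  <h1>Products</h1>
  <ul>
    <li class="item"><a href="/a">Apple</a></li>
    <li class="item"><a href="/b">Banana</a></li>
  </ul>
</body></html>
"""

# "html.parser" is the parser that ships with Python itself.
soup = BeautifulSoup(html, "html.parser")

# Navigate: reach the first <h1> tag and read its text.
title = soup.h1.get_text()

# Search: collect the link text and href of every <li class="item">.
items = [(li.a.get_text(), li.a["href"])
         for li in soup.find_all("li", class_="item")]

# Modify: replace the heading text in place.
soup.h1.string = "Fruit"

print(title, items, soup.h1.get_text())
```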
Requests
• Is the de facto standard for making
HTTP requests in Python. It abstracts
the complexities of making requests
behind a beautiful, simple API so that
you can focus on interacting with
services and consuming data in your
application.
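To illustrate what the API takes care of, the sketch below builds a GET request without sending it, so it runs offline. The URL and query parameters are placeholders assumed for this example.

```python
import requests

# Placeholder endpoint and parameters, assumed for illustration only.
url = "https://example.com/api/items"
params = {"q": "python", "page": 2}

# Build the request without sending it, to see what Requests assembles.
prepared = requests.Request("GET", url, params=params).prepare()
print(prepared.method, prepared.url)

# Sending it for real would typically look like this
# (commented out so the sketch needs no network access):
# response = requests.get(url, params=params, timeout=10)
# response.raise_for_status()   # raise an exception on 4xx/5xx responses
# data = response.json()        # decode a JSON body
```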
Pandas
• Is an open source, BSD-licensed
library providing high-performance,
easy-to-use data structures and data
analysis tools for
the Python programming language.
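In the gathering step, Pandas is often used to load collected data into a DataFrame. A small sketch, with an in-memory CSV string standing in for a downloaded file (the column names and values are invented):

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a downloaded file.
csv_text = "city,temp_c\nOslo,4.5\nCairo,29.0\nLima,18.2\n"

# read_csv accepts a path, a URL, or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())
```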
4. Data Cleaning
NumPy
• Is the fundamental package for scientific computing with
Python. It contains among other things:
• a powerful N-dimensional array object
• sophisticated (broadcasting) functions
• tools for integrating C/C++ and Fortran code
• useful linear algebra, Fourier transform, and random
number capabilities
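The capabilities listed above can be sketched briefly. The arrays below are arbitrary example values chosen for this illustration.

```python
import numpy as np

# N-dimensional array: a 2x3 array holding the values 0..5.
a = np.arange(6).reshape(2, 3)

# Broadcasting: the 1-D row is added to every row of the 2-D array.
row = np.array([10, 20, 30])
shifted = a + row

# Linear algebra: solve the system A @ x = b.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(A, b)

# Fourier transform: a 4-cycle sine sampled at 64 points peaks in bin 4.
t = np.arange(64) / 64
spectrum = np.fft.rfft(np.sin(2 * np.pi * 4 * t))
peak = int(np.argmax(np.abs(spectrum)))

# Random numbers: a seeded generator gives reproducible draws.
rng = np.random.default_rng(0)
samples = rng.normal(size=1000)

print(shifted, x, peak, samples.mean())
```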
Pandas
• Is an open source, BSD-licensed library providing high-
performance, easy-to-use data structures and data analysis
tools for the Python programming language.
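A minimal cleaning sketch with Pandas, assuming a small invented table with a missing name, a duplicated row, and a missing age. The choice to fill the gap with the median is only one possible policy.

```python
import numpy as np
import pandas as pd

# Made-up raw data with typical problems.
raw = pd.DataFrame({
    "name": ["Ann", "Bob", "Bob", "Cy", None],
    "age": [34, 52, 52, np.nan, 41],
})

clean = (
    raw.dropna(subset=["name"])   # drop rows that have no name
       .drop_duplicates()          # remove exact duplicate rows
       # fill missing ages with the median of the remaining ages
       .assign(age=lambda d: d["age"].fillna(d["age"].median()))
       .reset_index(drop=True)
)

print(clean)
```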
5. Exploring Data
Seaborn
• Is a Python data visualization library
based on matplotlib. It provides a
high-level interface for drawing
attractive and informative statistical
graphics.
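As a rough illustration of the high-level interface, one call can draw a statistical plot straight from a DataFrame. The data below is invented, and the non-interactive "Agg" backend is chosen only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no window is opened
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up scores for two groups.
df = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b"],
    "score": [3.1, 2.8, 3.6, 4.9, 5.2, 4.4],
})

fig, ax = plt.subplots()
# One call computes quartiles per group and draws the boxes.
sns.boxplot(data=df, x="group", y="score", ax=ax)
ax.set_title("Score by group (made-up data)")
```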
Matplotlib.pyplot
• Is a collection of command style
functions that make matplotlib work
like MATLAB. Each pyplot function
makes some change to a figure: e.g.,
creates a figure, creates a plotting
area in a figure, plots some lines in a
plotting area, decorates the plot with
labels, etc.
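The step-by-step style described above can be sketched as follows, where each pyplot call changes the current figure. The data is arbitrary, and the image is written to an in-memory buffer rather than a file.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; runs without a display
import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4]
y = [v ** 2 for v in x]

plt.figure(figsize=(4, 3))      # creates a figure
plt.plot(x, y, marker="o")      # plots a line in the current plotting area
plt.xlabel("x")                 # decorates the plot with labels
plt.ylabel("x squared")
plt.title("Example data")

# Render to a PNG in memory; pass a filename such as "squares.png"
# instead to write the image to disk.
buf = io.BytesIO()
plt.savefig(buf, format="png")

ax = plt.gca()  # the axes that the calls above were acting on
```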
Pandas
• Is an open source, BSD-licensed
library providing high-performance,
easy-to-use data structures and data
analysis tools for
the Python programming language.
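For exploration, a few Pandas calls usually give a first picture of a dataset. A short sketch on an invented table:

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({
    "species": ["cat", "dog", "cat", "dog", "cat"],
    "weight_kg": [4.0, 20.0, 5.0, 30.0, 3.0],
    "age_years": [2, 5, 7, 3, 1],
})

summary = df.describe()                                      # count, mean, std, quartiles
mean_by_species = df.groupby("species")["weight_kg"].mean()  # per-group average
counts = df["species"].value_counts()                        # category frequencies

print(summary)
print(mean_by_species)
print(counts)
```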
6. Building Model
SciKit-learn
• Is an open source machine learning library
that supports supervised and unsupervised
learning. It also provides various tools for
model fitting, data preprocessing, model
selection and evaluation, and many other
utilities.
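A minimal sketch of preprocessing, model fitting, and evaluation on synthetic data. The coefficients and noise level are assumptions made up for this example, and linear regression stands in for whichever estimator a project actually needs.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data: y depends linearly on two features plus small noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Hold out 25% of the rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Preprocessing and model chained into one estimator.
model = make_pipeline(StandardScaler(), LinearRegression())
model.fit(X_train, y_train)

score = r2_score(y_test, model.predict(X_test))
print(f"R^2 on held-out data: {score:.4f}")
```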
Statsmodels
• Is a Python module that provides classes and
functions for the estimation of many different
statistical models, as well as for conducting
statistical tests, and statistical data exploration.
An extensive list of result statistics is
available for each estimator.
7. Visualization
Seaborn
• Is a Python data
visualization library based
on matplotlib. It provides a
high-level interface for
drawing attractive and
informative statistical
graphics.
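For presentation-style charts, a sketch along these lines colors points by category and adds a legend automatically. The column names and values are invented, and the "Agg" backend is used only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no window is opened
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up measurements for two teams.
df = pd.DataFrame({
    "height_cm": [160, 172, 181, 168, 190, 175],
    "weight_kg": [55, 68, 80, 62, 92, 74],
    "team": ["red", "red", "red", "blue", "blue", "blue"],
})

fig, ax = plt.subplots()
# hue maps the "team" column to colors and builds the legend.
sns.scatterplot(data=df, x="height_cm", y="weight_kg", hue="team", ax=ax)
ax.set_title("Height vs. weight (made-up data)")
```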
Matplotlib.pyplot
• Is a collection of command
style functions that make
matplotlib work like
MATLAB.
Each pyplot function
makes some change to a
figure: e.g., creates a
figure, plots some lines, or
decorates the plot with
labels.
Plotly
• Is a web-based toolkit for
building interactive data
visualizations. Plotly can
also be used from a Python
notebook through its Python API.
Geoplotlib
• Is a toolbox for creating
maps and plotting
geographical data. You
can use it to create a
variety of map-types, like
choropleths, heatmaps,
and dot density maps.