Data engineering and analytics using python

Data
Engineering
and Analytics
using Python
PURNA CHANDER RAO. KATHULA

Talking Topics
 Jupyter notebook
 About me
 Python modules for Data Science
 Anaconda
 Pandas
 About pandas
 Data Munging / Data Preparation.
 Demo
 Seaborn
 About seaborn
 Machine Learning
 Linear Regression.

About me..
 Job Title = Architect QA
 Build Tools using Python for QA automation testing .
 Currently Learning

Python modules for Data Science
 Packages used for Data Analysis and Analytics
 Jupyter Notebook
 Pandas
 Numpy
 Scipy
 Matplotlib
 Seaborn
 Scikitlearn

What is Anaconda ?
 Essentially a Large ( ~ 400 MB ) Python Installation.
 But Contains Everything you need for Data Analysis
 Unless you have a special reason not to , you should just install and use this.

About Pandas
 What is Pandas ?
Pandas is a Python library for data analysis and data manipulation. A python version of the R
data.frame library.
 Key Features of Pandas
 It has API’s for loading data from different file formats into memory.
 ( exel, tsv, csv, db and etc).
 Data is structured in the form of Rows and Columns.
 Retrieval of data is similar as SQL, can perform all the operations such as Groupby, Joins, Views and etc..
 Merging of data from multiple datasets.
 Does support much of DataTime series functionality, Timezone, Business Days, Holidays and etc..
 Boolean Indexing
 Fancy Indexing

Core DataStructures of Pandas
 DataFrames
 Series
Core Operations
Create Select Insert Map
Join Sort Clean ApplyMap
View Update Filter Append
Group Summarize Confirm Rotate

Create ( Creating a DataFrame)
View ( Viewing the rows and columns)

View ( Viewing the rows and columns)

Insert ( Adding a new column to dataframe)

Filter ( Slicing and dicing the datframe)

Append (Joining the dataframes based on x-axis=0 )

Concat (Joining the dataframes on Axis = 0 or 1)

Join ( Inner , Left, Right , Outer)

Sort (by columns ascending True or False)

Clean ( Drop, Fillna, duplicates)

Clean ( Fillna ( method=‘ffill / bfill’)

Conform ( reindex() / resample, dropping / NAN as needed)

ReSample (Monthly, Weekly, Yearly)

What is Seaborn?
 Seaborn provides a high-level interface to matplotlib. It provides a high level
interface for drawing attractive statistical graphs.

Demo ( Restaurant Dataset visualization)

Machine Learning ( Linear Regression)
DEMO

Data engineering and analytics using python

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to Data engineering and analytics using python

Similar to Data engineering and analytics using python (20)

Recently uploaded

Recently uploaded (20)

Data engineering and analytics using python