2. Introduction
Aru Bharadwaj is a service-based digital venture established in Italy in 2022. It offers the following services:
• Data preparation
• Machine learning and deep learning model design and development
• Model evaluation and tuning
• Geocomputation and spatial data science
• Forecasting
• Infographics
• General analytics
The company's core aim is to help businesses make use of their data. It has completed five projects and has worked
with management consulting firms, marketing agencies, real estate firms and healthcare institutions.
It has also worked on political campaigns and with governmental and non-governmental organizations.
The company's motto is to provide solutions to real-life problems through data analysis.
3. Project Introduction
This project was carried out in collaboration with the client's FinTech company.
The purpose of the project is to provide reliable data on the credit card purchasing behaviour of the company's
customers over the past year, to support deep learning forecasting.
This preparation is carried out through data cleansing with Python and its libraries.
The Python libraries used were NumPy and pandas.
5. Process Description
Here are the basic data cleaning tasks we’ll tackle:
1. Importing Libraries
2. Input Customer Feedback Dataset
3. Locate Missing Data
4. Check for Duplicates
5. Normalise Casing
• Data cleaning is the process of correcting or removing corrupt, incorrect, or unnecessary data from a
data set before data analysis.
• Expanding on this basic definition, data cleaning (often grouped with data cleansing, data scrubbing
and data preparation) turns messy, potentially problematic data into clean data.
Importantly, "clean data" here means data that the powerful data analysis engines you spent money
on can actually use.
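As a minimal sketch of the definition above, the snippet below corrects one kind of incorrect data (a rating stored as text) and removes unnecessary data (a column not needed for analysis). The column names and values are hypothetical and invented for illustration only; they are not taken from the client's dataset.

```python
import pandas as pd

# Hypothetical raw records: "Rating" holds one corrupt entry ("five"),
# and "Internal Note" is a column the analysis does not need.
raw = pd.DataFrame({
    "Rating": ["5", "3", "five", "4"],
    "Internal Note": ["a", "b", "c", "d"],
})

# Correct: convert Rating to numbers; values that cannot be parsed become NaN
raw["Rating"] = pd.to_numeric(raw["Rating"], errors="coerce")

# Remove: drop the column that is unnecessary for the analysis
clean = raw.drop(columns=["Internal Note"])

print(clean["Rating"].tolist())  # → [5.0, 3.0, nan, 4.0]
```

Coercing to NaN rather than raising an error means the corrupt entry becomes a missing value, which can then be handled in the missing-data step.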
6. Basic Commands Flow
1. Importing Libraries
INPUT:
import pandas as pd
import numpy as np
2. Input Customer Feedback Dataset
data = pd.read_csv('feedback.csv')
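Because feedback.csv itself is not included here, the sketch below loads a small in-memory stand-in with `io.StringIO`, which `pd.read_csv` accepts just like a file path, and then inspects it. The rows are hypothetical; the column names are assumed from those used later in this report ("Customer Name", "Review Title", "Rating").

```python
import io
import pandas as pd

# Hypothetical stand-in for feedback.csv (note the empty Review Title on the last row)
csv_text = """Customer Name,Review Title,Rating
anna rossi,Great Service,5
marco bianchi,SLOW DELIVERY,2
anna rossi,Great Service,5
luca verdi,,4
"""

# read_csv works on any file-like object, so StringIO behaves like a file on disk
data = pd.read_csv(io.StringIO(csv_text))

print(data.shape)   # → (4, 3)
print(data.dtypes)  # column types inferred by pandas
print(data.head())  # first rows for a quick visual check
```

Checking the shape, types and first rows right after loading makes problems such as empty fields or numbers read as text visible before any cleaning starts.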
3. Locate Missing Data (then either 1) drop the missing data or 2) impute it)
data.isnull()
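`data.isnull()` returns a True/False mask of the same shape as the data. The sketch below, on a hypothetical DataFrame, counts the gaps per column and then shows both options named in this step. The fill values (the column median for numbers, the label "unknown" for text) are illustrative assumptions, not the project's documented choices.

```python
import numpy as np
import pandas as pd

# Hypothetical feedback data; np.nan marks a missing value
data = pd.DataFrame({
    "Review Title": ["Great service", np.nan, "Slow delivery", "Fine"],
    "Rating": [5, 4, np.nan, 3],
})

# Summing the boolean mask counts missing values in each column
missing_per_column = data.isnull().sum()
print(missing_per_column)

# Option 1: drop every row that contains at least one missing value
dropped = data.dropna()

# Option 2: impute, filling numeric gaps with the median (4.0 here)
# and text gaps with a placeholder label
imputed = data.fillna({
    "Rating": data["Rating"].median(),
    "Review Title": "unknown",
})
```

Dropping is simplest but discards whole rows; imputing keeps every row at the cost of introducing estimated values.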
4. Check for Duplicates
INPUT:
data.duplicated()
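`data.duplicated()` flags each row that repeats an earlier one. A minimal sketch on hypothetical data, counting the duplicates and then removing them:

```python
import pandas as pd

# Hypothetical data in which the third row repeats the first
data = pd.DataFrame({
    "Customer Name": ["anna rossi", "marco bianchi", "anna rossi"],
    "Rating": [5, 2, 5],
})

# The first occurrence is marked False; later repeats are marked True
mask = data.duplicated()
print(mask.sum())  # → 1

# Keep the first occurrence of each row, drop the repeats, renumber the index
deduplicated = data.drop_duplicates(keep="first").reset_index(drop=True)
```

Both `duplicated` and `drop_duplicates` also accept a `subset` argument, which compares only the listed columns when a full-row match is too strict.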
5. Detect Outliers
data['Rating'].describe()
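`describe()` only summarises the column (count, mean, standard deviation, minimum, quartiles, maximum); it does not mark individual outliers. One common follow-up, shown here as an assumption rather than as the project's documented method, is the interquartile-range (IQR) rule. The ratings and the 1 to 5 scale are hypothetical.

```python
import pandas as pd

# Hypothetical ratings on an assumed 1-5 scale; 50 looks like an entry error
data = pd.DataFrame({"Rating": [4, 5, 3, 4, 5, 4, 50]})

stats = data["Rating"].describe()  # count, mean, std, min, 25%, 50%, 75%, max
q1, q3 = stats["25%"], stats["75%"]
iqr = q3 - q1

# IQR rule: values more than 1.5 * IQR outside the quartiles are flagged
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = data[(data["Rating"] < lower) | (data["Rating"] > upper)]
print(outliers)
```

With these values the quartiles are 4 and 5, so anything below 2.5 or above 6.5 is flagged, and only the rating of 50 is reported.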
6. Normalize Casing
data['Review Title'] = data['Review Title'].str.lower()
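Lower-casing makes titles that differ only in capitalisation compare as equal. The sketch below, on hypothetical titles, also chains `.str.strip()` to remove stray spaces; that extra step is an addition for illustration and is not part of the command above.

```python
import pandas as pd

# Hypothetical review titles that differ only in case and surrounding spaces
data = pd.DataFrame({"Review Title": ["Great Service", "GREAT SERVICE ", " great service"]})

# Lower-case, then trim whitespace so equivalent titles become identical
data["Review Title"] = data["Review Title"].str.lower().str.strip()

print(data["Review Title"].nunique())  # → 1
```

The `.str` methods skip missing entries, leaving them as NaN, so this step can safely run before or after the missing-data step.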
7. Sample Output from Each Data Cleansing Step
Step 5: Customer Name capitalization
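The commands listed earlier only lower-case "Review Title", so the exact code behind this customer name output is not shown. A minimal sketch, assuming a "Customer Name" column and using pandas `str.title()` on hypothetical names:

```python
import pandas as pd

# Hypothetical names with inconsistent capitalization
data = pd.DataFrame({"Customer Name": ["anna rossi", "MARCO BIANCHI", "Luca verdi"]})

# str.title() upper-cases the first letter of each word and lower-cases the rest
data["Customer Name"] = data["Customer Name"].str.title()

print(data["Customer Name"].tolist())
# → ['Anna Rossi', 'Marco Bianchi', 'Luca Verdi']
```

Title case is a simple rule and can get some names wrong; for example, "mcdonald" becomes "Mcdonald", so unusual names may need manual review.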