Credit Card Fraud Analysis Using Data Science
Sampriti Chatterjee (Great Learning)
Proprietary content. ©Great Learning. All Rights Reserved. Unauthorized use or distribution prohibited
Agenda
1 Why do we need data science?
2 What is data science?
3 Life cycle of data science
4 Why is Python so popular?
5 Install Python
6 Statistical visualization on Python users
7 Basic syntax for Python
8 List, Set, Tuple
9 Data manipulation using NumPy and Pandas
10 What is machine learning?
11 Supervised learning: logistic regression
12 Credit card fraud analysis using Python
Why do we need Data Science?
• In the past, most data arrived in a structured format. As the volume of data has grown, structured data has become a small share of the total, so we need data science techniques to handle the massive amount of data
• That data can be used to uncover business insights and hidden trends
• These insights help an organization predict the future
• Data science can make decision making faster and more effective
• It helps reduce production costs
• Models built from the data give machines the ability to make predictions on their own
How is data science affecting our everyday life?
What is Data Science?
Data science is the process of extracting meaningful information from massive amounts of data. In simple terms, it means reading and studying the data to gain useful insights. Data science combines various tools, algorithms, and machine learning and deep learning concepts to discover hidden patterns in raw and unstructured data.
Life cycle of Data Science
1 Understand the business problem
2 Data acquisition
3 Data cleaning
4 Exploratory data analysis
5 Machine learning algorithm
6 Predict your model accuracy
7 Deploy the model
Most Popular Programming Languages For Data Science?
Introduction to Python
High-level
Object-oriented
Interpreted
Python is a popular high-level, object-oriented, interpreted language
History of Python
But why is the language called Python?
Important Facts
• Python was created by Guido van Rossum, who began work on it in 1989
• Van Rossum enjoyed comedy from the 1970s
• He wanted a short, unique, and slightly mysterious name for his language
• At the time he was enjoying Monty Python’s Flying Circus, a BBC comedy series, and he named the language Python after it
• That is how Python got its name
Inventor of Python: Guido van Rossum
Why should you learn Python?
1 Graphical user interface
2 Web development using Python
3 Python is a simple and beginner-friendly language
4 Length of the program is short
5 Mathematical computation can be done easily
Why is Python so popular?
1 Largest community for learners and collaborators
2 Open source
3 Easy to learn, flexible to use
4 Huge number of Python libraries and frameworks
5 Supports big data, machine learning, and cloud computing
6 Supports automation
Installing Python
Download Python from https://www.python.org/downloads/
Popular IDE for Python: PyCharm
Download PyCharm from https://www.jetbrains.com/pycharm/download/#section=mac
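Popular environment for Python: Anaconda
Anaconda is a Python distribution that bundles Python, common data science libraries, and tools such as Jupyter Notebook
Download Anaconda from https://www.anaconda.com/products/individual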
Popular IDE for Python: Google Colab
Open Google Colaboratory at https://colab.research.google.com/notebooks/intro.ipynb
Statistics on Python users
In recent years Python has become one of the most popular languages, largely because of its simplicity
Getting started with Python
How to Store Data?
How do I store data?
Example values: 123, “John”, TRUE
Need for Variables
Data/values can be stored in temporary storage spaces called variables
Example: the variable Student holds the value “John”, which is stored at memory address 0x1098ab
Basic syntax for Python
Variable
1 Variables are used as containers to store the data values
2 Python has no command to declare a variable
3 A variable is created when you assign a value to it
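A minimal sketch of the three points above. The variable names (student, count, is_enrolled) are made up for illustration and do not come from the slides.

```python
# There is no declaration keyword: assigning a value creates the variable.
student = "John"
count = 123
is_enrolled = True

# A name can be rebound to a new value at any time.
count = count + 1

print(student, count, is_enrolled)
```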
Data Types in Python
Every variable is associated with a data type
int: 10, 500
float: 3.14, 15.97
Boolean: True, False
String: “Sam”, “Matt”
Basic syntax for Python
Data types
1 Data types are used to classify or categorize data items
2 The data type of a value determines which operations can be performed on it
3 In Python, the data type is set when you assign a value to the variable
4 We can check the data type of a variable by calling the type() function
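As a small illustration of checking types with type(), the sketch below uses the example values from the data types slide. The variable names are assumptions chosen for this example.

```python
values = [10, 3.14, True, "Sam"]

# type() returns the class of each value; __name__ gives its short name.
type_names = [type(v).__name__ for v in values]
print(type_names)  # ['int', 'float', 'bool', 'str']

# The data type decides which operations are allowed.
total = 10 + 3.14                # int + float gives a float
greeting = "Sam" + " and Matt"   # str + str concatenates

# Mixing incompatible types raises a TypeError.
try:
    "Sam" + 10
    mixed_types_failed = False
except TypeError:
    mixed_types_failed = True
```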
Basic syntax for Python
Python string
1 Single line string:
2 Multiline string:
3 String as arrays:
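The code for these three items is not preserved in the slide text, so the following is only a sketch of what each item usually looks like. The string contents are invented for illustration.

```python
# 1 Single-line string: single or double quotes both work.
single = "Hello, Python"

# 2 Multiline string: triple quotes keep the line breaks.
multi = """Line one
Line two"""

# 3 String as arrays: a string is a sequence of characters,
#   indexed from 0.
first_char = single[0]
fifth_char = single[4]

print(single)
print(multi)
print(first_char, fifth_char)
```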
Basic syntax for Python
Python string
4 String slicing:
5 String negative indexing:
6 Length of the String:
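The slide's own examples are not preserved, so here is a short sketch of slicing, negative indexing, and len() on an invented string.

```python
text = "Hello, Python"

# 4 String slicing: text[start:stop] includes start and excludes stop.
first_word = text[0:5]
second_word = text[7:]

# 5 Negative indexing: -1 is the last character, -2 the one before it.
last_char = text[-1]
last_six = text[-6:]

# 6 Length of the string: len() counts every character, spaces included.
length = len(text)

print(first_word, second_word, last_char, last_six, length)
```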
Python Libraries
Data manipulation using Python
Data manipulation is a set of techniques for transforming, extracting, and filtering your data
efficiently.
Two main Python libraries are used to manipulate data: NumPy and Pandas
Data manipulation using Numpy
Numpy stands for Numerical Python and it is used to perform mathematical and
logical operations on arrays
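1 NumPy is a Python library
2 Install NumPy: !pip install numpy (the leading ! runs the command from a notebook cell such as Colab or Jupyter; in a terminal, use pip install numpy)
3 Import the library: import numpy as np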
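To make "mathematical and logical operations on arrays" concrete, here is a minimal sketch. The transaction amounts are invented for illustration and are not from any dataset used in the session.

```python
import numpy as np

# Hypothetical transaction amounts.
amounts = np.array([120.0, 35.5, 980.0, 12.25])

# Mathematical operations apply to every element at once.
doubled = amounts * 2
total = amounts.sum()

# Logical operations produce a Boolean array...
is_large = amounts > 100

# ...which can be used to filter the original array.
large_amounts = amounts[is_large]

print(doubled, total, is_large, large_amounts)
```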
Data manipulation using Pandas
Pandas is a popular data manipulation and analysis library in Python that is based
on NumPy
1 Pandas is a Python library built on top of NumPy
2 Install Pandas: !pip install pandas
3 Import the library: import pandas as pd
Demo on Numpy and pandas
Creating a Dataframe
What is a DataFrame?
A Pandas DataFrame is a two-dimensional data structure arranged in a tabular
fashion with rows and columns
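One common way to create a DataFrame is from a dictionary that maps column names to lists of values. The column names and values below are hypothetical and chosen only to fit the fraud-analysis theme.

```python
import pandas as pd

# Each dictionary key becomes a column; each list holds that column's rows.
transactions = pd.DataFrame({
    "amount": [120.0, 35.5, 980.0, 12.25],
    "merchant": ["grocery", "cafe", "electronics", "cafe"],
    "is_fraud": [0, 0, 1, 0],
})

print(transactions)
```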
DataFrame Built-In Functions
head()
tail()
shape (an attribute, used without parentheses)
describe()
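A brief sketch of these four members on a small, made-up DataFrame. Note that shape is an attribute rather than a method, so it is written without parentheses.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "amount": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
    "is_fraud": [0, 0, 0, 1, 0, 1],
})

first_rows = df.head(2)    # first 2 rows (default is 5)
last_rows = df.tail(2)     # last 2 rows (default is 5)
n_rows, n_cols = df.shape  # (rows, columns) tuple
summary = df.describe()    # count, mean, std, min, quartiles, max

print(first_rows)
print(last_rows)
print(n_rows, n_cols)
print(summary)
```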
Dropping Columns
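The slide's example is not preserved, so this is a sketch of one usual approach: DataFrame.drop with the columns argument. The column names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "transaction_id": [101, 102, 103],
    "amount": [120.0, 35.5, 980.0],
    "is_fraud": [0, 0, 1],
})

# drop() returns a new DataFrame; the original is left unchanged.
features = df.drop(columns=["transaction_id"])

# Several columns can be dropped in one call.
amounts_only = df.drop(columns=["transaction_id", "is_fraud"])

print(features)
print(amounts_only)
```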
Dropping Rows
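The slide's example is not preserved. Two common ways to drop rows are sketched here: by index label with drop(), and by missing values with dropna(). The data is invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "amount": [120.0, None, 980.0, 12.25],
    "is_fraud": [0, 0, 1, 0],
})

# Drop a row by its index label.
without_first = df.drop(index=0)

# Drop every row that contains a missing value.
without_missing = df.dropna()

print(without_first)
print(without_missing)
```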
Machine Learning to build the Model
What is Machine Learning?
Machine learning is a subset of artificial intelligence (AI) that allows a system to
automatically learn and improve from experience without being explicitly programmed
Training data → Model building → Testing data
Traditional Vs Machine Learning
Traditional programming: Data + Program → Output
Machine learning: Data + Output → Model
Types Of Machine Learning
• Supervised Learning
• Unsupervised Learning
• Reinforcement Learning
What is Supervised Learning?
In supervised learning, the labeled data plays the role of a supervisor or teacher. We train the
machine with labeled data (data that is already tagged with a predefined class). Then we test the
model on a new, unseen set of data and predict the labels for it
▪ Learning from labeled data and applying that knowledge to predict the
labels of new data (test data) is known as supervised learning
▪ Common supervised learning algorithms:
• Linear Regression
• Logistic Regression
• Decision Tree
• Random Forest
• Naïve Bayes Classifier
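The train-then-test workflow described above can be sketched as follows. The slides do not name a library, so the use of scikit-learn is an assumption, and the two-feature dataset is synthetic and generated only for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic labeled data (assumption): two well-separated clusters.
rng = np.random.default_rng(0)
class_0 = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
class_1 = rng.normal(loc=4.0, scale=1.0, size=(200, 2))
X = np.vstack([class_0, class_1])
y = np.array([0] * 200 + [1] * 200)

# Hold out 25% of the labeled data to play the role of unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Train on the labeled training data.
model = LogisticRegression()
model.fit(X_train, y_train)

# Predict labels for the test data and compare with the true labels.
predictions = model.predict(X_test)
accuracy = model.score(X_test, y_test)

print(f"Test accuracy: {accuracy:.2f}")
```

Any of the algorithms listed above could be swapped in for LogisticRegression here; the split, fit, and predict steps stay the same.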
What is Logistic Regression?
Logistic regression is a supervised learning classification algorithm. It is used to predict the
probability of a target variable. The target (dependent) variable is discrete, so in the basic case
the output has only two classes
▪ The dependent variable is binary, so it can be either 1 (stands for
success/yes) or 0 (stands for failure/no).
▪ Logistic regression uses the sigmoid function (also called the logistic function) to turn a value into a probability between 0 and 1
▪ Sigmoid function = 1 / (1 + e^(-value))
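The sigmoid formula above translates directly into code. This is a minimal sketch: the helper predict_class and its 0.5 threshold are assumptions added for illustration, and in a fitted model "value" would be a weighted sum of the input features.

```python
import math


def sigmoid(value: float) -> float:
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-value))


def predict_class(value: float, threshold: float = 0.5) -> int:
    """Hypothetical helper: return 1 if the probability reaches the threshold, else 0."""
    return 1 if sigmoid(value) >= threshold else 0


for v in (-4.0, 0.0, 4.0):
    print(v, round(sigmoid(v), 3), predict_class(v))
```

Large positive values give probabilities close to 1, large negative values give probabilities close to 0, and a value of 0 gives exactly 0.5.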
Credit Card Fraud Analysis Using Python
Thank You
