DISTRIBUTED DATA MINING IN CREDIT CARD FRAUD DETECTION
INTRODUCTION
Credit card transactions continue to grow in number, taking a larger share of
any country’s payment system, and this in turn has led to a higher rate of
stolen account numbers and subsequent losses by banks. Hence, improved fraud
detection has become essential to maintain the viability of the country’s
payment system.
Banks have used early fraud warning systems for some years. Large-scale
data-mining techniques can improve on the state of the art in commercial
practice.
Developing scalable techniques that analyze massive amounts of transaction
data and efficiently compute fraud detectors in a timely manner is an important
problem, especially for e-commerce.
Besides scalability and efficiency, the fraud-detection task exhibits technical
problems that include skewed distribution of training data and non-uniform cost
per error, neither of which has been widely studied in the knowledge-discovery
and data mining community.
In this project, we survey and evaluate a number of techniques that address
these three main issues concurrently.
Our proposed methods of combining multiple learned fraud detectors under a
“cost model” are general and demonstrably useful; our empirical results
demonstrate that we can significantly reduce loss due to fraud through
distributed data mining of fraud models.


DATA MINING AND MACHINE LEARNING
The aim of data mining is to extract knowledge from large amounts of data. This
knowledge is nontrivial and hidden in the data. Machine learning is often used in
data mining.

DATA MINING - A DEFINITION
It is the art and science of uncovering non-trivial, valuable information from
a large database.
Its emphasis is on:
•   Non-obvious (difficult)
•   Useful (cost vs benefit)
•   Large (automatic)
There are no fixed rules about method, provided that the process is efficient
in time, space and human resources.
• Data mining is the process of finding interesting trends or patterns in large
   datasets in order to guide future decisions.
•   Related to exploratory data analysis (area of statistics) and knowledge
    discovery (area in artificial intelligence, machine learning).
•   Data mining is characterized by having VERY LARGE datasets.

DATA MINING VS MACHINE LEARNING:
•   Size: Databases are usually very large so algorithms must scale well
•   Design Purpose: Databases are not usually designed for data mining (but
    for other purposes), and thus, may not have convenient attributes
•   Errors and Noise: Databases almost always contain errors

The aim of machine learning is to adapt to new circumstances and to detect and
extrapolate patterns. A distinction can be made between unsupervised and
supervised machine learning algorithms.

EXISTING SYSTEM
1) In the existing system, there are many fraudulent credit card transactions.
2) Moreover, the attrition rate in banking is very high, with payments not
   being made.
3) There is no proper strategy for evaluating the system in normal database
   systems.
Hence we need to propose a solution for entrancing and determining a large
customer base, which is possible only through data mining.

PROPOSED SYSTEM
In today’s increasingly electronic society, and with the rapid advances of
electronic commerce on the Internet, the use of credit cards for purchases has
become convenient and necessary.
Credit card transactions have become the de facto standard for Internet and
Web-based e-commerce. The US government estimates that credit cards
accounted for approximately US $13 billion in Internet sales during 1998. This
figure is expected to grow rapidly each year.
However, the growing number of credit card transactions provides more
opportunity for thieves to steal credit card numbers and subsequently commit
fraud.
When banks lose money because of credit card fraud, cardholders pay for all of
that loss through higher interest rates, higher fees, and reduced benefits.
Hence, it is in both the bank’s and cardholders’ interest to reduce illegitimate
use of credit cards by early fraud detection.
For many years, the credit card industry has studied computing models for
automated detection systems; recently, these models have been the subject of
academic research, especially with respect to e-commerce.
The credit card fraud-detection domain presents a number of challenging
issues for data mining:
•   There are millions of credit card transactions processed each day. Mining
    such massive amounts of data requires highly efficient techniques that scale.
•   The data are highly skewed: many more transactions are legitimate than
    fraudulent.
•   Typical accuracy-based mining techniques can generate highly accurate
    fraud detectors by simply predicting that all transactions are legitimate,
    although this is equivalent to not detecting fraud at all.
•   Each transaction record has a different dollar amount and thus has a
    variable potential loss, rather than a fixed misclassification cost per
    error type, as is commonly assumed in cost-based mining techniques.
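
The skew and variable-cost issues above can be illustrated with a minimal Python sketch. The toy transactions, the `fraud_loss` function, the fixed `overhead` per flagged transaction and the 500-dollar rule are all assumptions made for illustration; they are not the cost model used in the project.

```python
# Toy data: (amount_in_dollars, is_fraud). All values are invented for illustration.
transactions = (
    [(25.0, False)] * 95
    + [(40.0, False)] * 3
    + [(900.0, True), (15.0, True)]
)


def accuracy(predictions, data):
    """Fraction of transactions whose predicted label matches the true label."""
    correct = sum(pred == label for pred, (_, label) in zip(predictions, data))
    return correct / len(data)


def fraud_loss(predictions, data, overhead=20.0):
    """Dollar loss under a hypothetical cost model.

    A missed fraud costs the transaction amount; every flagged transaction
    costs a fixed investigation overhead, whether or not it is fraudulent.
    """
    loss = 0.0
    for flagged, (amount, is_fraud) in zip(predictions, data):
        if flagged:
            loss += overhead
        elif is_fraud:
            loss += amount
    return loss


# Detector A: predict that every transaction is legitimate.
all_legit = [False] * len(transactions)
# Detector B: flag only large transactions (an arbitrary rule for the example).
flag_large = [amount > 500.0 for amount, _ in transactions]

for name, preds in (("all legitimate", all_legit), ("flag large", flag_large)):
    print(name, accuracy(preds, transactions), fraud_loss(preds, transactions))
```

On this toy data the two detectors differ by a single percentage point of accuracy, yet the dollar loss differs by a large factor, which is why a per-transaction cost measure is more informative than accuracy here.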

Our approach addresses the efficiency and scalability issues in several
ways. We divide a large data set of labeled transactions (either fraudulent or
legitimate) into smaller subsets, apply mining techniques to generate
classifiers in parallel, and combine the resultant base models by metalearning
from the classifiers’ behavior to generate a metaclassifier.
Our approach treats the classifiers as black boxes so that we can employ a
variety of learning algorithms. Besides extensibility, combining multiple models
computed over all available data produces metaclassifiers that can offset the
loss of predictive performance that usually occurs when mining from data
subsets or sampling.
Furthermore, when we use the learned classifiers (for example, during
transaction authorization), the base classifiers can execute in parallel, with the
metaclassifier then combining their results. So our approach is highly efficient in
generating these models and also relatively efficient in applying them.
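
The partition, train-in-parallel and combine steps described above can be sketched as follows. This is a minimal illustration under stated assumptions: the single `amount` feature, the threshold "stump" base learner, the lookup-table metaclassifier and the toy data are all hypothetical stand-ins, and threads stand in for the separate machines of a distributed setting.

```python
from collections import Counter, defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition(data, k):
    """Split the labeled data into k disjoint subsets."""
    return [data[i::k] for i in range(k)]


def train_stump(subset):
    """Base learner (stand-in for any off-the-shelf algorithm).

    Picks the amount threshold that classifies the subset best and
    returns a black-box function: amount -> predicted fraud label.
    """
    best_threshold, best_correct = 0.0, -1
    for threshold in sorted({amount for amount, _ in subset}):
        correct = sum((amount >= threshold) == label for amount, label in subset)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return lambda amount, t=best_threshold: amount >= t


def train_metaclassifier(base_classifiers, validation):
    """Meta-learn from the base classifiers' behavior on held-out data.

    Each tuple of base predictions is mapped to the true label that most
    often accompanied it in the validation set.
    """
    votes = defaultdict(Counter)
    for amount, label in validation:
        key = tuple(clf(amount) for clf in base_classifiers)
        votes[key][label] += 1
    table = {key: counts.most_common(1)[0][0] for key, counts in votes.items()}

    def predict(amount):
        key = tuple(clf(amount) for clf in base_classifiers)
        # Unseen combinations of base predictions fall back to a majority vote.
        return table.get(key, sum(key) * 2 > len(key))

    return predict


# Toy data: a transaction is labeled fraudulent when its amount is >= 500.
training = [(float(a), a >= 500) for a in range(10, 1000, 10)]
validation = [(float(a), a >= 500) for a in range(5, 1000, 50)]

# Train one base classifier per subset, in parallel.
with ThreadPoolExecutor() as pool:
    base = list(pool.map(train_stump, partition(training, 3)))

meta = train_metaclassifier(base, validation)
print(meta(100.0), meta(505.0), meta(900.0))
```

Because the metaclassifier sees only the predictions of the base classifiers, any learning program can be substituted for `train_stump` without changing the combining step.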
Another parallel approach focuses on parallelizing a particular algorithm on a
particular parallel architecture. However, a new algorithm or architecture
requires a substantial amount of parallel-programming work.
Although our architecture- and algorithm-independent approach is not as
efficient as some fine-grained parallelization approaches, it lets users plug
different off-the-shelf learning programs into a parallel and distributed
environment with relative ease and eliminates the need for expensive parallel
hardware.
The proposed system uses data mining algorithms to detect credit card fraud.
THE FOLLOWING ALGORITHMS ARE USED TO IMPLEMENT CREDIT CARD FRAUD
DETECTION USING DATA MINING
1) Clustering - K-means clustering algorithm
2) Classification - decision trees
3) Cost estimation - AdaCost algorithm
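
As an illustration of the first algorithm in the list, here is a minimal Python sketch of K-means clustering (Lloyd's algorithm). The two features (amount, purchases that day), the sample points and the choice of initial centroids are assumptions made for the example, not details taken from the project.

```python
import math


def kmeans(points, centroids, iterations=20):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(p, centroids[i]),
            )
            clusters[nearest].append(p)
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster
            else centroids[i]  # keep an empty cluster's centroid in place
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return centroids, clusters


# Hypothetical transaction features: (amount in dollars, purchases that day).
points = [(20.0, 1.0), (25.0, 2.0), (30.0, 1.0), (900.0, 3.0), (950.0, 4.0)]
centroids, clusters = kmeans(points, [points[0], points[3]])
print(centroids)
print(clusters)
```

In practice the features would usually be scaled to comparable ranges first, since the raw dollar amount dominates the Euclidean distance in this example.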


PROPOSED SYSTEM HARDWARE AND SOFTWARE REQUIREMENTS
HARDWARE
Processor : Pentium III or higher
RAM       : 128 MB or higher
HDD       : 40 GB or higher
FDD       : 1.44 MB
Monitor   : LG/Samsung colour
Keyboard
Mouse
ATX cabinet


SOFTWARE
Operating system : Windows 98 / Windows 2000 / Windows XP
Software         : JDK 1.3 or higher
Database         : Oracle 8i


MODULES
1) KNOWLEDGE BASE CREATION
2) DATA ANALYSIS
3) DATA QUERY ANALYSER
4) CLUSTERING - K-MEANS CLUSTERING
5) CLASSIFICATION - DECISION TREES
6) COST ESTIMATION - ADACOST ALGORITHM
7) RESULT ANALYSER
8) GRAPHS
