The techniques of data mining in support of Fraud Management

How to design a Predictive Model, by Marco Scattareggia, HP EMEA Fraud Center of Excellence Manager
Information Security magazine, May/June 2011

>> Data mining is an overlap of rapidly evolving disciplines, statistics and artificial intelligence foremost among them. This article clarifies the main technical terms that can make it difficult to understand the methods of analysis, in particular those used to predict phenomena of interest and to build the corresponding predictive models. Fraud management, like other industrial applications, relies on data mining techniques for fast decision making based on the scoring of fraud risks. The material in this article comes from the author's work in preparing the workshop on data mining and fraud management held in Rome, in the Telecom Italia auditorium, on September 13, 2011, thanks to a worthy initiative of Stefano Maria de' Rossi, to whom the author gives his thanks.



About the author

Marco Scattareggia, a graduate in Electronic Engineering and Computer Science, works in Rome at Hewlett-Packard Italia, where he directs the HP EMEA Center of Excellence dedicated to the design and implementation of fraud management solutions for telecom operators.
Inductive reasoning, data mining and fraud management

Data mining, or digging for gold in large amounts of data, combines several disciplines, including statistical inference, the management of computer databases, and machine learning, the study of self-learning systems within artificial intelligence research.

Literally, data mining means extracting knowledge from a mass of data in order to acquire rules that provide decision support and determine what action should be taken. This concept is effectively captured by the term actionable insight, and the benefits to a business process such as fraud management come from forecasting techniques. In data mining, this predictive analytics activity rests on three elements, illustrated in the sketch after this list:

1. Large amounts of available data to be analyzed, providing representative samples for the training, verification, and validation of predictive models.
2. Analytical techniques for understanding the data, their structure, and their significance.
3. Forecasting models articulated, as in every computer process, in terms of input, process, and output; in other words, by predictors (the input), algorithms (the process), and the target of the forecast (the output).
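To make the three elements concrete, the minimal sketch below pairs each of them with a line of code. Everything in it, from the scikit-learn model choice to the synthetic data, is an illustrative assumption, not the article's implementation.

```python
# Minimal sketch of the three elements: data, an algorithm, and a target.
# Illustrative only -- the data, features, and model choice are assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# 1. Data: rows are cases, columns are candidate predictors (the input).
X = rng.normal(size=(1000, 3))
# Known outcomes (the target) used for training, e.g. confirmed fraud = 1.
y = (X[:, 0] + 2 * X[:, 2] + rng.normal(size=1000) > 1.5).astype(int)

# 2.-3. The algorithm (the process) is fitted to map predictors to target.
model = LogisticRegression().fit(X, y)

# For new cases, the model produces the forecast (the output).
new_cases = rng.normal(size=(5, 3))
print(model.predict_proba(new_cases)[:, 1])  # estimated fraud probability
```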
In addition to these analytical techniques, adequate tools and methods for data collection, normalization, and loading are also needed. These preliminary activities are highlighted in the early stages of the KDD (Knowledge Discovery in Databases) paradigm and are generally found in products known as ETL (Extract, Transform, Load) tools. A visit to www.kdd.org shows how data mining proper constitutes the analysis phase of the interactive knowledge-extraction process depicted in Figure 1.

Figure 1

Beyond the practical applications of data mining in an industrial context, it is also useful to examine Figure 2, which sets out the evolution of business analytics techniques. It starts with simple reporting, which provides a graphical summary of data grouped along their different dimensions and highlights the main differences and elements of interest. The second phase corresponds to analysis, aimed at understanding why a specific phenomenon occurred. Next comes monitoring, the use of tools that let you track what is happening, and finally predictive analytics, which lets you determine what could or should happen in the future. Obviously, the future can be predicted only in probabilistic terms; nobody can be one hundred percent sure of what will actually happen.

Figure 2

The result of this process is an ordering, a probabilistic ranking of the possible events based on previously accumulated experience. This activity, known as scoring, assigns a value in percentage terms, the score, which expresses the confidence we may place in the forecast itself. It allows us to act consistently according to the score values. For example, in fraud management a high score corresponds to a high risk of fraud, and the consequent action could be to stop the service (e.g., a bank loan, a telephone line, an insurance policy), while a more moderate score may only require further investigation by an analyst.

This article shows how a fraud management application, designed as a business process, can benefit from data mining techniques and the practical use of predictive models.

It is interesting to note that the techniques of business analytics derive from inferential statistics and, more specifically, from Bayesian probabilistic reasoning. Thomas Bayes' theorem on conditional probability answers the question "Knowing that there was the effect B, what is the probability that A is the cause?" In a nutshell, it gives the probability of a cause given knowledge of its effect.

The article "How to build a predictive model", published in May/June 2011 by the Italian Information Security magazine, explained how to calculate the probability of a purchase given a customer's gender (man or woman) and observed dressing style:

- During the construction of the model, the outcome or effect, which in the example is the positive or negative result of a purchase, is known, while the cause requires a probabilistic assessment and is the object of analysis. The roles are reversed: knowing the effect, we look for the cause.
- When forecasting, cause and effect return to their natural sequence: given the causes, the model predicts the resulting effect. The gender of a person and his or her dressing style are the predictors, while the purchase decision, whether positive or negative, becomes the target to predict.

The analysis phase, during which the roles of cause and effect (i.e., the predictors and the target) are reversed, is referred to in predictive analytics as the supervised training of the model.
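The inversion of cause and effect described above is simply Bayes' theorem at work. A tiny numerical sketch follows, with made-up probabilities rather than the values of the article's contingency table:

```python
# Bayes' theorem: P(purchase | man) = P(man | purchase) * P(purchase) / P(man)
# The three probabilities below are illustrative, not the article's figures.
p_purchase = 0.3             # prior probability of a purchase
p_man_given_purchase = 0.4   # estimated from the purchase history
p_man = 0.5                  # overall share of men among customers

p_purchase_given_man = p_man_given_purchase * p_purchase / p_man
print(f"P(purchase | man) = {p_purchase_given_man:.2f}")  # 0.24
```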
Figure 3 below shows the contingency table with exemplary values of the probabilities to be used in Bayes' theorem to calculate the probability of purchase for a man or a woman. In other words, having analyzed the purchase history and having been able to calculate or estimate the probability of the causes (predictors) conditioned on a specific effect (target), we can use a forecasting model based on Bayes' theorem to predict the likelihood of a future purchase once we know a person's gender and dressing style.

Figure 3

Bayes' theorem of the probability of causes is widely used to predict which causes are most likely to have produced an observed event. However, it was Pierre-Simon Laplace who consolidated, in his Essai philosophique sur les probabilités (1814), the logical system that is the foundation of inductive reasoning, now referred to as Bayesian reasoning.

The formula that follows is Laplace's rule of succession. Assuming that a phenomenon has only two possible outcomes, "success" and "failure", and that a priori we know little or nothing about how the outcome is determined, Laplace derived the probability that the next result is a success:

P = (s + 1) / (n + 2)

where s is the number of previously observed successes and n the total number of known trials. Laplace went on to use his rule of succession to calculate the probability of the sun rising each new day, based on the fact that, to date, this event had never failed; unsurprisingly, he was strongly criticized by his contemporaries for this irreverent extrapolation.

The goal of inferential statistics is to provide methods for learning from experience, that is, for building models that move from a set of particular cases to the general case. However, Laplace's rule of succession, like the whole system of Bayesian inductive reasoning, can lead to blatant errors.

The pitfalls inherent in reasoning about probabilities are highlighted by the so-called paradoxes, questions whose correct answers are highly counterintuitive. The philosopher Bertrand Russell, for example, pointed out that, falling from the roof of a twenty-floor building and passing the first floor, you might incorrectly infer from Laplace's rule of succession that, because nothing bad has happened during the fall for 19 of the 20 floors, there is no danger in the last twentieth of the fall either. Russell concluded, pragmatically, that an inductive inference can be accepted if it not only leads to a high-probability prediction, but is also reasonably credible.

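The rule of succession is simple enough to compute directly; this minimal sketch also reproduces the reasoning of Russell's falling observer:

```python
# Laplace's rule of succession, exactly as given in the text:
# P = (s + 1) / (n + 2).
def rule_of_succession(s: int, n: int) -> float:
    """Probability that the next outcome is a success,
    after observing s successes in n trials."""
    return (s + 1) / (n + 2)

# Russell's falling observer: 19 uneventful floors out of 19 observed
# suggest a ~95% chance the next one is safe too -- a high-probability
# prediction that is nonetheless not reasonably credible.
print(rule_of_succession(19, 19))  # ~0.95
```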
Figure 4

Another example often used to demonstrate the limits of inductive logic is the paradox of the black ravens, developed by Carl Gustav Hempel. Examining a million ravens, one by one, we note that they are all black. After each observation, the theory that all ravens are black becomes increasingly likely to be true, consistent with the inductive principle. But the statement "all ravens are black", taken in isolation, is logically equivalent to the statement "all things that are not black are not ravens." This second statement would become more likely true even after the observation of a red apple: we would, in fact, have observed something "not black" that "is not a raven." Obviously, taking the observation of a red apple as evidence that all ravens are black is neither consistent nor reasonably credible. Bertrand Russell would argue that if the world's raven population totals a million plus one specimens, then the inference "all ravens are black", after the examination of a million black ravens, could be considered reasonably correct. But if you were to estimate the existence of a hundred million ravens, then a sample of only one million black ravens would no longer be sufficient.

The forecasts provided by inductive models, and their practical use in business decisions, rest on this response of Russell's.

When selecting data samples for the training, testing, and validation of a predictive model, you need to raise two fundamental questions:

a) Are the rules that constitute the algorithm of the model consistent with the characteristics of the individual entities that make up the sample?

b) Are the sample data really representative of the whole population of entities about which inferences will be drawn?

The answers to these questions derive, respectively, from the concepts of internal validity and external validity of an inferential statistical analysis, as shown in Figure 5. Internal validity measures how correct the results of the analysis are for the sample of entities that has been studied; it can be undermined by an imperfectly random sampling procedure, which introduces noise and disturbance (bias). Good internal validity is necessary but not sufficient: we should also check external validity, the degree of generalization acquired by the predictive model. When the model has not learned sufficiently general rules, it is likely that we have merely memorized most of the data present in the training sample (we have overfitted the model) rather than effectively learned from the data (we did not extract the knowledge hidden behind the data). In such a situation, the model will not be able to successfully process new cases drawn from other samples.
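A standard way to probe external validity in practice is to hold out part of the sample: a model that has memorized its training data scores far better on it than on unseen cases. A minimal sketch, with illustrative data and an unconstrained decision tree chosen deliberately so that it overfits:

```python
# Checking generalization with a held-out sample (illustrative data).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 5))
y = (X[:, 0] - X[:, 1] + rng.normal(size=2000) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# An unconstrained tree memorizes the training sample...
tree = DecisionTreeClassifier(random_state=1).fit(X_train, y_train)
print("training accuracy:", tree.score(X_train, y_train))  # ~1.00
# ...but the gap to unseen data reveals the overfitting.
print("test accuracy:    ", tree.score(X_test, y_test))    # clearly lower
```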
Figure 5

The techniques of predictive analytics help you make decisions once the data have been classified and characterized with respect to a certain phenomenon. Other techniques, such as OLAP (On-Line Analytical Processing), also support decision making, because they let you see what has happened. A predictive model, however, directly provides the prediction of a phenomenon, estimates its size, and allows you to take the right actions.

A further possibility offered by predictive analytics is the separation and classification of the elements of a non-homogeneous set. The most common example of this type of application is selecting which customers to address in a marketing campaign, that is, who should receive a business proposal with a reasonable chance of a positive response; in these cases one can rightly speak of business intelligence. This technique, known as clustering, is also useful in fraud management because it allows you to better target the action of a predictive model: it improves the internal validity of the training sample by dividing the mass of available data into homogeneous subsets. It may also reveal new patterns of fraud and help you generate new detection rules. Moreover, the identification of values very distant from the average, called outliers, leads directly to cases with a high probability of fraud, which therefore warrant more thorough investigation.
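Both uses, splitting the data into homogeneous subsets and flagging outliers for investigation, can be sketched in a few lines. The traffic variables, thresholds, and the k-means choice below are illustrative assumptions:

```python
# Clustering cases into homogeneous subsets and flagging outliers.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
# Illustrative per-subscriber traffic features, e.g. minutes and cost.
usage = rng.normal(loc=100.0, scale=15.0, size=(500, 2))
usage[:5] *= 5  # a handful of extreme, possibly fraudulent, cases

# Homogeneous subsets, usable to train better-targeted models.
clusters = KMeans(n_clusters=3, n_init=10, random_state=2).fit_predict(usage)

# Values very distant from the average: a simple z-score rule.
z = np.abs(usage - usage.mean(axis=0)) / usage.std(axis=0)
outliers = np.where(z.max(axis=1) > 3)[0]

print("cluster sizes:", np.bincount(clusters))
print("cases to investigate first:", outliers)
```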
The dilemma of the fraud manager

Every organization that is aware of its revenue losses due to fraud obviously wishes to achieve zero losses. Unfortunately this is not possible, both because of the rapid reaction of the criminal organizations that profit from fraud, which quickly find new attack patterns and discover new weaknesses in the defense systems, and because fighting fraud has a cost that grows with the level of defense put in place. Figure 6 shows graphically that, without countermeasures, losses to fraud can reach very high levels, above 30% of total revenues, and may even threaten the very survival of the company. By putting in place an appropriate organization to manage fraud, supported by an appropriate technology infrastructure, losses can be brought down quite quickly to acceptable levels, on the order of a few percentage points.

The competence of the fraud manager lies in identifying the optimal compromise between the costs of managing fraud and the residual losses due to fraud. This tradeoff is indicated by the red point in Figure 6; going beyond it would significantly increase the cost of personnel and tools while achieving only tiny incremental loss reductions.

Figure 6

The main difficulty, however, lies not in demonstrating the value of residual fraud but in estimating the losses actually prevented by the regular activities of the fraud management team. In other words, it is not easy to estimate the size and consequences of the losses that, thanks to daily prevention work, were never incurred.

For more details, and to understand how to calculate the ROI of an FMS, see the article "Return on Investment of an FMS", published in March/April 2011 by the Italian Information Security magazine.

Technically, you must choose appropriate KPIs (Key Performance Indicators) and measure both the value of the fraud detected in a given period and the value of the fraud remaining in the same period. For example, Figure 7 shows the trends of two popular KPIs: precision (the share of analyzed cases that turn out to be actual fraud) and recall (the share of all existing fraud that is detected).

Figure 7

Wishing to reach the ideal point at which precision and recall are both 100%, one can make several attempts to improve one KPI or the other. For example, you could increase the number of suspected fraud cases examined daily (increasing recall), which of course also increases the number of working hours. Conversely, you could configure the FMS better and reduce the number of cases to be analyzed in a day by eliminating the false alarms that needlessly consume the analysts' time (increasing precision). However, if you do not actually increase the information given to the system, by adding new rules or better search keywords, improving precision will worsen recall, and vice versa.
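The two KPIs follow directly from their definitions; a small worked example with made-up daily figures:

```python
# Precision and recall from one day of casework (illustrative numbers).
confirmed_fraud = 80   # analyzed cases that turned out to be fraud
analyzed_cases = 200   # all cases the analysts examined
existing_fraud = 120   # estimated total fraud cases that day

precision = confirmed_fraud / analyzed_cases  # 0.40: many false alarms
recall = confirmed_fraud / existing_fraud     # 0.67: a third goes undetected
print(f"precision = {precision:.0%}, recall = {recall:.0%}")
```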
This tradeoff leads to the dilemma that afflicts every fraud manager: you cannot improve the results of the fight against fraud without simultaneously increasing the cost of the structure (i.e., its power) or increasing the information provided to the FMS. It is therefore necessary to act on at least one of the two levers, costs or information, and possibly to improve both.

Predictive models lend themselves to improving both the effectiveness and the efficiency of a fraud management department. For example, the inductive techniques of decision trees can be used to extract new rules from the data that better identify cases of fraud, while scoring makes it easier to organize human resources on a risk-priority basis and, eventually, to enable automatic mechanisms at night or in the absence of personnel. Figure 8 shows the gain chart for three different scoring models. The productivity gain consists of the analyst's time saved compared with a non-guided processing of the cases in a random sequence, indicated by the red diagonal. The solid blue line indicates the ideal path, practically unattainable but serving as the aim: along it, all cases of outright fraud, the true positives, are discovered immediately, without losing time on false alarms. It is interesting to note that this ideal situation occurs when both the precision and recall KPIs equal 100%, and therefore corresponds to a model that has reached the ideal point shown in Figure 7.

Figure 8

For a comprehensive evaluation of a predictive model, the reader may refer to the article "Evaluation of the predictive capabilities of an FMS", published in the February/March issue of the Italian Information Security magazine.

Construction of a model to score cases of fraud in telecommunications

Figure 9 shows the conceptual scheme of a predictive model that scores cases of fraud in a telecommunications company. In this representation the algorithm at the core of the model is a neural network, but the overall model would not change if you chose a different algorithm such as, for example, a decision tree or a Bayesian network.

Figure 9
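A minimal sketch of the scheme in Figure 9 follows, with a small neural network as the interchangeable core; the predictors, data, and network size are illustrative assumptions:

```python
# A small neural-network core that turns predictors into a 0-100 score.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(3)
# Illustrative predictors derived from traffic data (8 per case).
X = rng.normal(size=(3000, 8))
y = (X @ rng.normal(size=8) + rng.normal(size=3000) > 1).astype(int)

core = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000,
                     random_state=3).fit(X, y)

# Scoring new cases: the probability of fraud, scaled to 0-100.
cases = rng.normal(size=(4, 8))
scores = 100 * core.predict_proba(cases)[:, 1]
print(np.round(scores, 1))
```

Swapping the core for a decision tree or a naive Bayes classifier changes one line, while the surrounding inputs and the score output stay the same, which is exactly the point of Figure 9.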
The alarms and cases generated by the FMS derive from aggregations, or other processing, of the elementary data coming from telecommunications traffic. In fact, all input data to a predictive model can be elaborated and replaced with other, derived parameters. All input data and derived parameters compete, in a sort of analytic contest, to be elected as predictors, that is, the right inputs to the core forecasting algorithm highlighted in the blue box of Figure 9.

The output of the predictive model is simply the score value associated with each case. This value is a percentage, varying between zero and one hundred (or between zero and one), and expresses the probability that the case represents outright fraud (when the score is close to 100) or a false alarm (when the score is close to 0).

Introducing a predictive model into the operational context of a company has a significant impact on its existing information technology (IT) structure, and it can take many months to develop dedicated custom software and the associated operating procedures. Recently, however, the growth of data transfer capacity over the Internet, web service technology, and the emerging paradigms of cloud computing and SaaS (Software-as-a-Service) have paved the way for an easier transition of predictive models into production. The data mining community, represented by the Data Mining Group (DMG), has developed a language, PMML (Predictive Model Markup Language), that is destined to become the lingua franca, i.e., one spoken by many vendors and systems, for the standard definition and immediate use of predictive models. PMML, which is based on XML, provides all the methods and tools needed to define, verify, and then put into practice predictive models. By adopting PMML, the model no longer has to be developed and run with software from the same vendor. All the definitions and descriptions necessary for understanding PMML can be found on the DMG website, http://www.dmg.org/.

In conclusion, PMML, being an open standard, can dramatically lower the TCO (Total Cost of Ownership) when combined with a cloud computing offering, by breaking down the barriers of incompatibility between the different systems of a company's existing IT infrastructure. Furthermore, the operational model can be run directly by the same people who developed it, without involving a heavily technical IT department.

For more on the creation of predictive models, see the article "How to design a Predictive Model", published in May/June 2011 by the Italian Information Security magazine.
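As a purely hypothetical illustration of the portability described above, the sketch below scores one case against a PMML file using the third-party pypmml package; the package choice, the file name fraud.pmml, and the field names are all assumptions, not anything prescribed by the article or by the PMML standard itself.

```python
# Hypothetical sketch: scoring one case against an exported PMML model.
# Assumes the third-party "pypmml" package; the file and field names
# below are illustrative placeholders.
from pypmml import Model

model = Model.fromFile("fraud.pmml")  # model defined elsewhere, by any vendor
case = {"calls_per_day": 310, "intl_share": 0.72, "avg_duration": 45.0}
print(model.predict(case))            # e.g., the fraud score for this case
```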