The techniques of data mining in support of Fraud Management

How to design a Predictive Model, by Marco Scattareggia, HP EMEA Fraud Center of Excellence Manager. Published in the Information Security magazine, May/June 2011 issue.

Data mining is a superposition of rapidly evolving disciplines, statistics and artificial intelligence being the two most prominent among many others. This article clarifies the meaning of the main technical terms that can make the methods of analysis harder to understand, in particular those used for the prediction of phenomena of interest and for the construction of appropriate predictive models. Fraud management, like other industrial applications, relies on data mining techniques to perform fast decision making according to the scoring of fraud risks.

The concepts contained in this article come from the work done by the author during the preparation of the workshop on data mining and fraud management held in Rome, in the auditorium of Telecom Italia, on September 13, 2011, thanks to a worthy initiative of Stefano Maria de Rossi, to whom the author gives his thanks.

About the author
Marco Scattareggia, graduated in Electronic Engineering and Computer Science, works in Rome at Hewlett-Packard Italia, where he directs the HP EMEA Center of Excellence dedicated to the design and implementation of fraud management solutions for telecom operators.
Inductive reasoning, data mining and fraud management

Data mining, or digging for gold in large amounts of data, is the combination of several disciplines, including statistical inference, the management of computer databases, and machine learning, which is the study of self-learning in artificial intelligence research.

Literally, data mining refers to extracting knowledge from a mass of data in order to acquire the rules that provide decision support and determine what action should be taken. Such a concept can effectively be expressed with the term actionable insight, and the benefits to a business process, like fraud management, can be drawn from forecasting techniques. In data mining, this predictive analytics activity is based on three elements:

1. Large amounts of available data, to be analyzed and to provide representative samples for the training, verification and validation of predictive models.
2. Analytical techniques for understanding the data, their structures and their significance.
3. Forecasting models, articulated, as in every computer process, in terms of input, process and output; in other words, predictors (the input), algorithms (the process) and the target of the forecast (the output).

In addition to the techniques of analysis, adequate tools and methods for data collection, normalization and loading are also needed. These preliminary activities are highlighted in the early stages of the KDD (Knowledge Discovery in Databases) paradigm, and are generally found in products known as ETL (Extract, Transform, Load). By visiting the site www.kdd.org, you can see how data mining actually constitutes the analysis phase of the interactive process for the extraction of knowledge from data shown in Figure 1.

Figure 1

Besides being interested in the practical applications of data mining in an industrial context, it is also useful to examine Figure 2, which sets forth the evolution of the techniques of business analytics. It starts with the simple act of reporting, which provides a graphical summary of data grouped according to their different dimensions and highlights the main differences and elements of interest. The second phase corresponds to the activity of analysis, to understand why a specific phenomenon occurred. Subsequently, monitoring is the use of tools that let you control what is happening; finally, predictive analytics allows you to determine what could or should happen in the future.
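To make the articulation into predictors, algorithm and target concrete, here is a minimal sketch, not taken from the article: the field names and thresholds are invented for illustration only.

```python
# Toy illustration of a predictive model as input -> process -> output:
# predictors (input), a scoring algorithm (process), a fraud score (output).
# Field names and threshold values are invented for the example.

def score_case(predictors):
    """Maps a case's predictor values to a fraud score between 0 and 100."""
    score = 0
    if predictors["calls_per_hour"] > 50:        # unusually intense traffic
        score += 50
    if predictors["international_share"] > 0.8:  # almost only international calls
        score += 40
    return min(score, 100)

case = {"calls_per_hour": 120, "international_share": 0.9}
print(score_case(case))  # -> 90
```

A real model would learn such rules from data rather than hard-code them; the sketch only shows the shape of the input-process-output contract.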
Obviously, it should be pointed out that the future can be predicted only in probabilistic terms: nobody can be one hundred percent sure about what will really happen.

The result of this process is an ordering, a probabilistic ranking of the possible events based on previously accumulated experience. This activity, known as scoring, assigns a value in percentage terms, the score, which expresses the confidence we may have in the forecast itself, and allows us to act consistently with the score values. For example, in fraud management a high score corresponds to a high risk of fraud, and the consequent action could be to stop the service (e.g., the loan from a bank, the telephone line, the insurance protection, etc.), while a more moderate score may only require an additional investigation by the analyst.

This article will show how a fraud management application, designed as a business process, can benefit from data mining techniques and the practical use of predictive models.

It is interesting to note that the techniques of business analytics derive from inferential statistics and, more specifically, from Bayesian probabilistic reasoning. Thomas Bayes' theorem on conditional probability answers the question "Knowing that there was the effect B, what is the probability that A is the cause?" In a nutshell, it gives the probability of a cause once its effect is known.

The article "How to build a predictive model", published in the May/June 2011 issue of the Italian Information Security magazine, explained how to calculate the probability of buying given the gender (man or woman) and the dressing style of the customers:

- During the construction of the model, the outcome or effect, which in the example is the positive or negative result of a purchase, is known, while the cause requires a probabilistic assessment and is the object of the analysis. The roles are reversed: knowing the effect, we look for the cause.
- When forecasting, the roles of cause and effect return to their natural sequence: given the causes, the model predicts the resulting effect. The gender of a person and his or her dressing style are the predictors, while the purchase decision, whether positive or negative, becomes the target to predict.

The analysis phase, during which the roles of cause and effect (i.e., the predictors and the target) are reversed, is referred to in predictive analytics as supervised training of the model.

Figure 2
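The probability-of-causes reasoning described above can be sketched in a few lines. All the probability values below are invented for illustration; they are not the figures from the cited article.

```python
# Bayes' theorem for the probability of a cause given an observed effect:
# P(A | B) = P(B | A) * P(A) / P(B). All numbers below are invented.

def posterior(prior_a, likelihood_b_given_a, prob_b):
    """Probability of cause A given that effect B was observed."""
    return likelihood_b_given_a * prior_a / prob_b

p_woman = 0.5            # P(A): prior, half of the customers are women
p_buy_given_woman = 0.4  # P(B | A): purchase probability among women
p_buy = 0.25             # P(B): overall purchase probability
print(posterior(p_woman, p_buy_given_woman, p_buy))  # -> 0.8
```

Knowing the effect (a purchase happened), the formula returns the probability of the candidate cause (the buyer being a woman).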
Figure 3 below shows the contingency table with exemplary values of the probabilities to be used in Bayes' theorem to calculate the probability of purchase for a man or a woman. It is like saying that, having analyzed the history of purchases and having been able to calculate or estimate the probability of the causes (predictors) conditioned by a specific effect (target), we can use a forecasting model based on Bayes' theorem to predict the likelihood of a future purchase once we know the person's gender and his or her dressing style.

Figure 3

Bayes' theorem of the probability of causes is widely used to predict which causes are more likely to have produced an observed event. However, it was Pierre-Simon Laplace who consolidated, in his Essai philosophique sur les probabilités (1814), the logical system that is the foundation of inductive reasoning, now referred to as Bayesian reasoning.

The formula that follows is Laplace's rule of succession. Assuming that the results of a phenomenon have only two options, "success" and "failure", and that a priori we know little or nothing about how the outcome is determined, Laplace derived the way to calculate the probability that the next result is a success:

P = (s + 1) / (n + 2)

where "s" is the number of previously observed successes and "n" the total number of known instances. Laplace went on to use his rule of succession to calculate the probability of the sun rising each new day, based on the fact that, to date, this event had never failed; obviously, he was strongly criticized by his contemporaries for this irreverent extrapolation.

The goal of inferential statistics is to provide methods for learning from experience, that is, for building models that move from a set of particular cases to the general case. However, Laplace's rule of succession, like the whole system of Bayesian inductive reasoning, can lead to blatant errors. The pitfalls inherent in reasoning about probabilities are highlighted by the so-called paradoxes, which pose questions whose correct answers are highly illogical. The philosopher Bertrand Russell, for example, pointed out that, when falling from the roof of a twenty-floor building, on arriving at the first floor you might incorrectly infer from Laplace's rule of succession that, since nothing bad happened during the fall along 19 of the 20 floors, there is no danger in the last twentieth of the fall either. Russell concluded pragmatically that an inductive reasoning can be accepted if it not only leads to a high-probability prediction but is also reasonably credible.
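The rule of succession given above, P = (s + 1) / (n + 2), is simple enough to sketch directly; the observation counts below are illustrative.

```python
# Laplace's rule of succession: probability that the next outcome is a
# success, given s successes observed in n trials. Counts are illustrative.

from fractions import Fraction

def rule_of_succession(successes, trials):
    """P = (s + 1) / (n + 2), as an exact fraction."""
    return Fraction(successes + 1, trials + 2)

# 10 sunrises observed, all successes: the rule gives 11/12 for the next one.
print(rule_of_succession(10, 10))  # -> 11/12
# With no observations at all, it gives the neutral prior 1/2.
print(rule_of_succession(0, 0))    # -> 1/2
```

Note how the estimate never reaches 1 no matter how many successes are observed, which is precisely what the falling-from-the-roof critique exploits: a high probability alone does not make the inference credible.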
Figure 4

Another example often used to demonstrate the limits of the inductive procedure is the paradox of the black ravens, developed by Carl Gustav Hempel. By examining a million crows, one by one, we note that they are all black. After each observation, therefore, the theory that all ravens are black becomes increasingly likely to be true, consistently with the inductive principle. But the assumption "all crows are black", taken in isolation, is logically equivalent to the assumption "all things that are not black are not crows". This second proposition would become more likely true even after the observation of a "red apple": we would have observed, in fact, something "not black" that "is not a crow". Obviously, taking the observation of a red apple as confirmation of the proposition that all crows are black is neither consistent nor reasonably credible. Bertrand Russell would argue that if the population of crows in the world totals a million plus one exemplars, then the inference "all crows are black", after examining a million black crows, could be considered reasonably correct. But if you were to estimate the existence of a hundred million crows, then a sample of only one million black crows would no longer be sufficient.

The forecasts provided by inductive models, and their practical use in business decisions, rest upon this answer of Russell's. When selecting data samples for the training, testing and validation of a predictive model, you need to raise two fundamental questions:

a) Are the rules that constitute the algorithm of the model consistent with the characteristics of the individual entities that make up the sample?

b) Are the sample data really representative of the whole population of entities about which we want to draw inferences?

The answers to these questions come, respectively, from the concepts of internal validity and external validity of an inferential statistical analysis, as shown in Figure 5. Internal validity measures how correct the results of the analysis are for the sample of entities that have been studied; it may be undermined by a not-perfectly-random sampling procedure, which becomes an element of noise and disturbance (bias). Good internal validity is necessary but not sufficient: we should also check the external validity, that is, the degree of generalization acquired by the predictive model. When the model does not have sufficiently general rules, we may merely have recorded most of the data present in the training sample (we have overfitted the model) without effectively learning from the data (we didn't extract the knowledge hidden behind the data). In this situation, the model will not be able to successfully process new cases from other samples.
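The overfitting trap just described can be made concrete with a deliberately extreme sketch (all data invented): a "model" that merely memorizes its training sample looks perfect on that sample, i.e., it has apparent internal validity, but fails on a holdout sample, revealing poor external validity.

```python
# Invented toy data: cases are (predictor_1, predictor_2) tuples with labels.
train = {(1, 0): "fraud", (0, 1): "legit", (1, 1): "fraud"}
holdout = {(0, 0): "legit", (1, 2): "fraud"}

def memorizing_model(case):
    """Returns the recorded label for a case seen in training, else a default."""
    return train.get(case, "legit")

def accuracy(model, dataset):
    """Fraction of cases in the dataset that the model labels correctly."""
    hits = sum(model(case) == label for case, label in dataset.items())
    return hits / len(dataset)

print(accuracy(memorizing_model, train))    # -> 1.0 on the training sample
print(accuracy(memorizing_model, holdout))  # -> 0.5 on unseen cases
```

This is why training, testing and validation require separate samples: only the performance on data the model has never seen measures whether knowledge was actually extracted.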
Figure 5

The techniques of predictive analytics help you make decisions once the data have been classified and characterized with respect to a certain phenomenon. Other techniques, such as OLAP (On-Line Analytical Processing), also help to make decisions, because they allow you to see what happened. A predictive model, however, directly provides the prediction of a phenomenon, estimates its size, and allows you to perform the right actions.

A further possibility offered by the techniques of predictive analytics is the separation and classification of the elements belonging to a non-homogeneous set. The most common example of this type of application is selecting which customers to address in a marketing campaign, i.e., to whom to send a business proposal with a reasonable chance of getting a positive response; rightly, in these cases, one speaks of business intelligence. This technique, known as clustering, is also useful in the fraud management area because it allows you to better target the action of a predictive model. It improves the internal validity of the training sample by dividing the mass of available data into homogeneous subsets. It may also uncover new patterns of fraud and help you generate new detection rules. Moreover, the identification of values very distant from the average, called outliers, leads directly to the identification of cases that have a high probability of fraud and therefore require more thorough investigation.

The Dilemma of the fraud manager

The desire of every organization aware of the revenues lost to fraud is obviously to achieve zero losses. Unfortunately this is not possible, both because of the rapid reaction of the criminal organizations that profit from fraud, which quickly find new attack patterns and discover new weaknesses in the defense systems, and because fighting fraud has a cost that grows in proportion to the level of defense put in place. Figure 6 shows graphically that, without enforcement systems, the losses to fraud can reach very high levels, over 30% of total revenues, and may even threaten the very survival of the company. By putting in place an appropriate organization to manage fraud, and providing an appropriate technology infrastructure, losses can be brought down very quickly to acceptable levels, on the order of a few percentage points.

The competence of the fraud manager is important in identifying the optimum compromise between the costs of managing fraud and the residual losses due to fraud. This tradeoff is indicated by the red point in Figure 6. Going further could significantly increase the cost of personnel and instruments while achieving only tiny incremental loss reductions.
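Returning for a moment to the outlier identification mentioned above: a minimal sketch of flagging values very distant from the average is given below. The call-duration data and the distance threshold are invented for the example.

```python
# Flag values lying far from the mean, a basic form of outlier detection.
# The subscriber data and the z-score threshold are invented.

from statistics import mean, stdev

def find_outliers(values, z_threshold=3.0):
    """Return the values more than z_threshold standard deviations from the mean."""
    m, sd = mean(values), stdev(values)
    return [v for v in values if abs(v - m) > z_threshold * sd]

daily_call_minutes = [12, 15, 9, 14, 11, 13, 10, 480]  # one anomalous subscriber
print(find_outliers(daily_call_minutes, z_threshold=2.0))  # -> [480]
```

In a fraud management context, such a flagged case would not be treated as proven fraud but as a high-probability candidate deserving more thorough investigation.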
Figure 6

The main difficulty, however, lies not in demonstrating the value of the residual fraud but in estimating the losses actually prevented by the regular activities of the fraud management team. In other words, it is not easy to estimate the size and consequences of the losses theoretically due to frauds which, thanks to the daily prevention work, have not been perpetrated.

For more details, and to understand how to calculate the ROI of an FMS (Fraud Management System), you can refer to the article Return on Investment of an FMS, published in the March/April 2011 issue of the Italian Information Security magazine.

Technically, you must choose the appropriate KPIs (Key Performance Indicators) and measure both the value of the fraud detected in a given period and that of the fraud remaining in the same period. For example, the trends of two popular KPIs, known as precision (the percentage of fraud detected in the total of the analyzed cases) and recall (the percentage of fraud detected in the total of the existing fraud), are shown in Figure 7.

Figure 7

Wishing to reach the ideal point at which you would have, at the same time, a precision and a recall of 100%, one can make several attempts to improve one KPI or the other. For example, you could increase the number of cases of suspected fraud examined daily (increasing recall) and, of course, increase the number of working hours too. Conversely, one may attempt to better configure the FMS and reduce the number of cases to be analyzed in a day by eliminating the false alarms that needlessly consume the analysts' time (increasing precision). However, if you do not actually increase the information given to the system, by adding new rules or better search keywords, improving precision will worsen recall, and vice versa.

The problem just described leads to the dilemma that afflicts every fraud manager. In fact, you cannot improve the results of the fight against fraud without simultaneously increasing the costs of the structure (i.e., its power), or without increasing the information provided to the FMS. It is therefore necessary to act on at least one of the two levers, costs or information, and possibly to improve both.
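The two KPIs discussed above can be computed directly from the period's counts; the daily figures below are invented for illustration.

```python
# Precision and recall as defined in the text: precision over the analyzed
# cases, recall over all existing fraud. The counts below are invented.

def precision(true_positives, analyzed_cases):
    """Fraction of analyzed cases that turned out to be real fraud."""
    return true_positives / analyzed_cases

def recall(true_positives, total_existing_fraud):
    """Fraction of all existing fraud that was actually detected."""
    return true_positives / total_existing_fraud

detected_fraud = 40   # confirmed fraud cases found by the analysts
analyzed = 200        # alarms examined in the period
existing_fraud = 50   # total fraud actually present (usually an estimate)

print(precision(detected_fraud, analyzed))     # -> 0.2
print(recall(detected_fraud, existing_fraud))  # -> 0.8
```

The denominator of recall is rarely known exactly, which is part of the fraud manager's measurement problem: the total existing fraud must itself be estimated.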
Figure 8

Predictive models lend themselves to improving the effectiveness and efficiency of a fraud management department. For example, the inductive techniques of decision trees can be used to extract new rules from the data to better identify cases of fraud, while the scoring technique makes it easier to organize the human resources on a risk-priority basis and, eventually, enables automatic mechanisms to be used at night or in the absence of personnel. Figure 8 represents the gain chart for three different scoring models. The productivity gain consists of the analyst's time saving, in contrast to a non-guided processing of the cases that follows the random sequence indicated by the red diagonal. The solid blue line indicates the ideal path, practically unattainable but worth aiming at: along it, all cases of outright fraud, the true positives, are discovered immediately, without losing time on false alarms. It is interesting to note that this ideal situation occurs when both the precision and recall KPIs equal 100%, and therefore corresponds to a model that has reached the ideal point shown in Figure 7.

For a comprehensive evaluation of a predictive model, the reader may refer to the article Evaluation of the predictive capabilities of an FMS, published in the February/March issue of the Italian Information Security magazine.

Construction of a model to score cases of fraud in telecommunications

Figure 9 shows the conceptual scheme of a predictive model to score cases of fraud in a telecommunications company. In this representation, the algorithm that forms the core of the model is a neural network; however, the whole scheme would not change if you chose a different algorithm such as, for example, a decision tree, a Bayesian network, etc.

Figure 9

The alarms and cases generated by the FMS are derived from aggregations, or other processing, of the elementary data coming from the telecommunications traffic. In fact, all input data to a predictive model can be elaborated and replaced with other derived parameters.
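As an aside on the gain chart of Figure 8, discussed earlier: such a curve is typically obtained by sorting the cases by descending score and accumulating the fraction of frauds captured as the analyst works down the list. The (score, is_fraud) pairs below are invented.

```python
# Sketch of how a gain-chart curve is computed: process cases in descending
# score order and track the cumulative share of frauds found. Data invented.

def cumulative_gains(cases):
    """cases: list of (score, is_fraud). Returns the cumulative fraction of
    frauds captured after examining the top 1, 2, ... scored cases."""
    total_frauds = sum(is_fraud for _, is_fraud in cases)
    ordered = sorted(cases, key=lambda c: c[0], reverse=True)
    captured, curve = 0, []
    for _, is_fraud in ordered:
        captured += is_fraud
        curve.append(captured / total_frauds)
    return curve

cases = [(0.9, 1), (0.8, 1), (0.7, 0), (0.6, 1), (0.2, 0), (0.1, 0)]
print(cumulative_gains(cases))
```

The closer the resulting curve hugs the top-left corner, the more of the analyst's time the scoring model saves compared with a random processing order.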
All input data and derived parameters compete, in a sort of analytic game, to be elected as predictors, that is, as the right inputs to the core forecasting algorithm highlighted in the blue box of Figure 9.

The output of the predictive model is simply the score value associated with the case. This value is a percentage, varying between zero and one hundred (or between zero and one), and expresses the probability that the case represents an outright fraud (when the score is close to 100) or a false alarm (when the score is close to 0).

The inclusion of a predictive model in the operational context of the company has a significant impact on its existing information technology (IT) structure, and it can take many months to develop dedicated custom software and the associated operating procedures. However, the recent development of large data transfer capacity over the Internet, web service technology, and the emerging paradigms of cloud computing and SaaS (Software-as-a-Service) have paved the way for an easier transition of predictive models into production. The data mining community, represented by the Data Mining Group (DMG), has recently developed a new language, PMML (Predictive Model Markup Language), destined to become the lingua franca, i.e., one spoken by many vendors and systems, for the standard definition and immediate use of predictive models. PMML, which is based on XML, provides all the methods and tools needed to define, verify and then put into practice predictive models. By adopting PMML, it is no longer necessarily the case that the model is developed and run with software from the same vendor. All the definitions and descriptions necessary for understanding PMML can be found on the DMG website, http://www.dmg.org/.

In conclusion, PMML, being an open standard, when combined with a cloud computing offering, can dramatically lower the TCO (Total Cost of Ownership) by breaking down the barriers of incompatibility between the different systems of the IT infrastructure already in place in the company. Furthermore, once the model is included in the operational context of the applications, it can be run directly by the same people who developed it, i.e., without involving the heavily technical IT department.

For more on the creation of predictive models, see the article How to design a Predictive Model, published in the May/June 2011 issue of the Italian Information Security magazine.