This document discusses methods for credit card fraud detection in the presence of concept drift and delayed labeling. It formalizes the problem and proposes learning strategies to handle feedback from investigators and delayed labels separately. Experiments on real and artificial datasets show that aggregating classifiers trained on feedback and delayed samples outperforms approaches that do not distinguish between sample types or do not aggregate classifiers. Future work will focus on adaptive aggregation and addressing sample selection bias.
Credit card fraud detection and concept drift adaptation with delayed supervised information
1. Credit Card Fraud Detection and Concept-Drift Adaptation with Delayed Supervised Information
Andrea Dal Pozzolo, Giacomo Boracchi, Olivier Caelen, Cesare Alippi, and Gianluca Bontempi
IEEE IJCNN 2015 conference, 15/07/2015
2. INTRODUCTION
Fraud detection is a notably challenging problem because of:
- concept drift (i.e., customers' habits evolve);
- class imbalance (i.e., genuine transactions far outnumber frauds);
- uncertain class labels (i.e., some frauds are not reported, or are reported with a large delay, and only a few transactions can be investigated in a timely manner).
3. INTRODUCTION II
Fraud-detection systems (FDSs) differ from standard classification tasks:
- only a small set of supervised samples is provided by human investigators (they can check only a few alerts);
- the labels of the majority of transactions become available only several days later (after customers have reported unauthorized transactions).
4. PROBLEM FORMULATION
We formalize fraud detection (FD) as a classification problem:
- At day $t$, the classifier $K_{t-1}$ (trained up to day $t-1$) associates to each feature vector $x \in \mathbb{R}^n$ a score $P_{K_{t-1}}(+|x)$.
- The $k$ transactions with the largest $P_{K_{t-1}}(+|x)$ define the alerts $A_t$ reported to the investigators.
- Investigators provide feedbacks $F_t$ about the alerts in $A_t$, defining a set of $k$ supervised couples $(x, y)$:
$$F_t = \{(x, y),\ x \in A_t\} \qquad (1)$$
- $F_t$ are the only immediately available supervised samples.
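As an illustration of this protocol, here is a minimal sketch (ours, not the authors' code) of how the daily alerts could be selected from the classifier scores; the function name and inputs are hypothetical:

```python
import numpy as np

def select_alerts(scores, k=100):
    """Return the indices of the k transactions with the largest P(+|x)."""
    return np.argsort(scores)[::-1][:k]

# Hypothetical usage: scores produced by the classifier K_{t-1} on day t,
# e.g. scores = clf.predict_proba(X_day)[:, 1]; investigators then label
# the alerted transactions, yielding the feedback set F_t of k couples (x, y).
```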
5. PROBLEM FORMULATION II
At day $t$, delayed supervised couples $D_{t-\delta}$ are transactions that have not been checked by investigators, but whose labels are assumed to be correct once $\delta$ days have elapsed.
Figure: The supervised samples available at day $t$ include i) the feedbacks of the first $\delta$ days and ii) the delayed couples that occurred before the $\delta$-th day.
6. $F_t$ is a small set of risky transactions according to the FDS. $D_{t-\delta}$ contains all the transactions that occurred in a day (≈ 99% genuine transactions).
Figure: Every day we have a new set of feedbacks $(F_t, F_{t-1}, \ldots, F_{t-(\delta-1)})$ from the first $\delta$ days and a new set of delayed transactions that occurred on the $\delta$-th day ($D_{t-\delta}$). In this figure we assume $\delta = 7$.
7. ACCURACY MEASURE FOR A FDS
The goal of a FDS is to return accurate alerts, i.e., to achieve the highest precision in $A_t$. This precision can be measured by the quantity
$$p_k(t) = \frac{\#\{(x, y) \in F_t \ \mathrm{s.t.}\ y = +\}}{k} \qquad (2)$$
where $p_k(t)$ is the proportion of frauds among the top $k$ transactions with the highest fraud likelihood [1].
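As a concrete check of definition (2), $p_k$ can be computed as follows (a sketch assuming labels coded as 1 for fraud; the function name is ours):

```python
import numpy as np

def p_k(scores, labels, k=100):
    """Proportion of frauds among the k transactions with the highest fraud score."""
    top_k = np.argsort(scores)[::-1][:k]
    return float(np.mean(np.asarray(labels)[top_k] == 1))
```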
8. LEARNING STRATEGY
Learning from the feedbacks $F_t$ is a different problem from learning from the delayed samples in $D_{t-\delta}$:
- $F_t$ provides recent, up-to-date information, while $D_{t-\delta}$ might already be obsolete by the time it arrives.
- The percentage of frauds in $F_t$ and $D_{t-\delta}$ differs.
- The supervised couples in $F_t$ are not independently drawn; they are selected by $K_{t-1}$.
- A classifier trained on $F_t$ learns how to label the transactions that are most likely to be fraudulent.
Feedbacks and delayed transactions therefore have to be treated separately.
9. CONCEPT DRIFT ADAPTATION
Two conventional solutions for concept-drift (CD) adaptation are the sliding-window classifier $W_t$ and the ensemble $E_t$ [6, 5]. To learn separately from feedbacks and delayed transactions we propose $F_t$ (trained on feedbacks only) together with $W^D_t$ and $E^D_t$ (trained on delayed samples only), as sketched below.
Figure: Supervised information used by the different classifiers in the ensemble and sliding-window approaches.
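A minimal sketch of the two adaptation schemes on delayed samples (our own framing, not the paper's code; `daily_batches` is an assumed list of per-day (X, y) arrays):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def train_window(daily_batches, alpha=16):
    """W_t^D: a single classifier trained on the last alpha days of delayed samples."""
    X = np.vstack([X for X, _ in daily_batches[-alpha:]])
    y = np.concatenate([y for _, y in daily_batches[-alpha:]])
    return RandomForestClassifier(n_estimators=100).fit(X, y)

def train_ensemble(daily_batches, alpha=16):
    """E_t^D: one classifier per day; the ensemble averages their posteriors."""
    return [RandomForestClassifier(n_estimators=100).fit(X, y)
            for X, y in daily_batches[-alpha:]]
```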
10. CLASSIFIER AGGREGATIONS
$W^D_t$ and $E^D_t$ have to be aggregated with $F_t$ to exploit the information provided by feedbacks. We combine these classifiers by averaging their posterior probabilities.
Sliding window:
$$P_{A^W_t}(+|x) = \frac{P_{F_t}(+|x) + P_{W^D_t}(+|x)}{2}$$
Ensemble:
$$P_{A^E_t}(+|x) = \frac{P_{F_t}(+|x) + P_{E^D_t}(+|x)}{2}$$
$A^E_t$ and $A^W_t$ give a larger influence to feedbacks on the probability estimates w.r.t. $E_t$ and $W_t$.
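In code, the aggregation is just an average of the two posteriors (a sketch; the classifier variables are hypothetical):

```python
import numpy as np

def aggregate(p_feedback, p_delayed):
    """Posterior of A_t^W (or A_t^E): average of the feedback and delayed posteriors."""
    return 0.5 * (p_feedback + p_delayed)

# e.g. with fitted sklearn classifiers clf_F and clf_WD:
# p_AW = aggregate(clf_F.predict_proba(X)[:, 1], clf_WD.predict_proba(X)[:, 1])
# For the ensemble, p_delayed would itself be the mean posterior of the
# per-day models in E_t^D:
# p_ED = np.mean([m.predict_proba(X)[:, 1] for m in models], axis=0)
```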
11. TWO RANDOM FORESTS
We used two different Random Forest (RF) classifiers depending on the fraud prevalence in the training set:
- for the classifiers on delayed samples we used a Balanced RF [3] (undersampling before training each tree);
- for $F_t$ we adopted a standard RF [2] (no undersampling).
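A rough sketch of a Balanced RF in the spirit of [3], where each tree is trained on all frauds plus an equally sized random sample of genuine transactions (our illustration, not the authors' implementation):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def balanced_random_forest(X, y, n_trees=100, seed=0):
    """Train each tree on all frauds (y == 1) plus as many sampled genuines."""
    rng = np.random.default_rng(seed)
    frauds, genuines = np.where(y == 1)[0], np.where(y == 0)[0]
    trees = []
    for _ in range(n_trees):
        idx = np.concatenate([frauds,
                              rng.choice(genuines, size=len(frauds), replace=False)])
        trees.append(DecisionTreeClassifier(max_features="sqrt").fit(X[idx], y[idx]))
    return trees

def forest_posterior(trees, X):
    """P(+|x) averaged over the trees of the forest."""
    return np.mean([t.predict_proba(X)[:, 1] for t in trees], axis=0)
```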
12. DATASETS
We considered two datasets of credit card transactions:

Table: Datasets
Id   | Start day  | End day    | # Days | # Instances | # Features | % Fraud
2013 | 2013-09-05 | 2014-01-18 | 136    | 21,830,330  | 51         | 0.19%
2014 | 2014-08-05 | 2014-10-09 | 44     | 7,619,452   | 51         | 0.22%

In the 2013 dataset there is an average of 160k transactions per day and about 304 frauds per day, while in the 2014 dataset there is a daily average of 173k transactions and 380 frauds.
13. EXPERIMENTS
Settings:
- We assume that after $\delta = 7$ days all transaction labels are provided (delayed supervised information).
- A budget of $k = 100$ alerts can be checked by the investigators ($F_t$ is trained on a window of 700 feedbacks).
- A window of $\alpha = 16$ days is used to train $W^D_t$ (16 models in $E^D_t$).
Each experiment is repeated 10 times and the performance is assessed using $p_k$ (the constants are collected in the sketch below).
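For reference, the experimental settings map to a configuration like this (the constant names are ours):

```python
DELTA = 7                    # days before delayed labels become available
K = 100                      # daily alert budget checked by investigators
FEEDBACK_WINDOW = DELTA * K  # 700 most recent feedbacks used to train F_t
ALPHA = 16                   # days of delayed samples for W_t^D (models in E_t^D)
N_RUNS = 10                  # repetitions of each experiment, assessed with p_k
```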
14. In both the 2013 and 2014 datasets, the aggregations $A^W_t$ and $A^E_t$ outperform the other FDSs in terms of $p_k$.

Table: Average pk over all batches for the sliding window
classifier | 2013 mean | 2013 sd | 2014 mean | 2014 sd
F  | 0.609 | 0.250 | 0.596 | 0.249
WD | 0.540 | 0.227 | 0.549 | 0.253
W  | 0.563 | 0.233 | 0.559 | 0.256
AW | 0.697 | 0.212 | 0.657 | 0.236

Table: Average pk over all batches for the ensemble
classifier | 2013 mean | 2013 sd | 2014 mean | 2014 sd
F  | 0.603 | 0.258 | 0.596 | 0.271
ED | 0.459 | 0.237 | 0.443 | 0.242
E  | 0.555 | 0.239 | 0.516 | 0.252
AE | 0.683 | 0.220 | 0.634 | 0.239
15. Figure: Sum of ranks from the Friedman test [4] for (a) the sliding window on 2013, (b) the sliding window on 2014, (c) the ensemble on 2013, and (d) the ensemble on 2014. Classifiers having the same letter are not significantly different (paired t-test based on the ranks).
16. EXPERIMENTS ON ARTIFICIAL DATASETS WITH CD
In the second part we artificially introduce CD on specific days by juxtaposing transactions acquired at different times of the year.

Table: Datasets with artificially introduced CD
Id  | Start 2013 | End 2013   | Start 2014 | End 2014
CD1 | 2013-09-05 | 2013-09-30 | 2014-08-05 | 2014-08-31
CD2 | 2013-10-01 | 2013-10-31 | 2014-09-01 | 2014-09-30
CD3 | 2013-11-01 | 2013-11-30 | 2014-08-05 | 2014-08-31
17. Table: Average pk in the month before and after CD for the sliding window approach

(a) Before CD
classifier | CD1 mean | CD1 sd | CD2 mean | CD2 sd | CD3 mean | CD3 sd
F  | 0.411 | 0.142 | 0.754 | 0.270 | 0.690 | 0.252
WD | 0.291 | 0.129 | 0.757 | 0.265 | 0.622 | 0.228
W  | 0.332 | 0.215 | 0.758 | 0.261 | 0.640 | 0.227
AW | 0.598 | 0.192 | 0.788 | 0.261 | 0.768 | 0.221

(b) After CD
classifier | CD1 mean | CD1 sd | CD2 mean | CD2 sd | CD3 mean | CD3 sd
F  | 0.635 | 0.279 | 0.511 | 0.224 | 0.599 | 0.271
WD | 0.536 | 0.335 | 0.374 | 0.218 | 0.515 | 0.331
W  | 0.570 | 0.309 | 0.391 | 0.213 | 0.546 | 0.319
AW | 0.714 | 0.250 | 0.594 | 0.210 | 0.675 | 0.244
18. Figure: Average pk per day (the higher the better) on the datasets with artificial concept drift, smoothed with a 15-day moving average, for (e) sliding-window strategies on CD1, (f) sliding-window strategies on CD2, (g) sliding-window strategies on CD3, and (h) ensemble strategies on CD3. The vertical bar denotes the date of the concept drift.
19. CONCLUDING REMARKS
We notice that:
- $F_t$ outperforms the classifiers trained on delayed samples (obsolete couples).
- $F_t$ outperforms classifiers trained on the entire supervised dataset (which is dominated by delayed samples).
- Aggregation gives a larger influence to feedbacks.
20. CONCLUSION
- We formalize a real-world FDS framework that meets realistic working conditions.
- In a real-world scenario there is a strong alert-feedback interaction that has to be explicitly considered.
- Feedbacks and delayed samples should be handled separately when training a FDS.
- Aggregating two distinct classifiers is an effective strategy that enables prompter adaptation in concept-drifting environments.
21. FUTURE WORK
Future work will focus on:
- Adaptive aggregation of $F_t$ and the classifier trained on delayed samples.
- Studying the sample selection bias in $F_t$ introduced by the alert-feedback interaction.
22. BIBLIOGRAPHY
[1] S. Bhattacharyya, S. Jha, K. Tharakunnel, and J. C. Westland. Data mining for credit card fraud: A comparative study. Decision Support Systems, 50(3):602-613, 2011.
[2] L. Breiman. Random forests. Machine Learning, 45(1):5-32, 2001.
[3] C. Chen, A. Liaw, and L. Breiman. Using random forest to learn imbalanced data. University of California, Berkeley, 2004.
[4] M. Friedman. The use of ranks to avoid the assumption of normality implicit in the analysis of variance. Journal of the American Statistical Association, 32(200):675-701, 1937.
[5] J. Gao, B. Ding, W. Fan, J. Han, and P. S. Yu. Classifying data streams with skewed class distributions and concept drifts. Internet Computing, 12(6):37-49, 2008.
[6] D. K. Tasoulis, N. M. Adams, and D. J. Hand. Unsupervised clustering in streaming data. In ICDM Workshops, pages 638-642, 2006.