The document discusses data mining and its history and applications. It explains that data mining involves extracting useful patterns from large amounts of data, and has been used in applications like market analysis, fraud detection, and science. The document outlines the data mining process, including data selection, cleaning, transformation, algorithm selection, and pattern evaluation. It also discusses common data mining techniques like association rule mining to find frequent patterns in transactional data.
key note address delivered on 23rd March 2011 in the Workshop on Data Mining and Computational Biology in Bioinformatics, sponsored by DBT India and organised by Unit of Simulation and Informatics, IARI, New Delhi.
I do not claim any originality either to slides or their content and in fact aknowledge various web sources.
This lecture gives various definitions of Data Mining. It also gives why Data Mining is required. Various examples on Classification , Cluster and Association rules are given.
presentation on recent data mining Techniques ,and future directions of research from the recent research papers made in Pre-master ,in Cairo University under supervision of Dr. Rabie
A companion slide deck for this chapter:
Stanton, J. M. (2013). Data Mining: A Practical Introduction for Organizational Researchers. In Cortina, J. M., & Landis, R. S., Modern Research Methods for the Study of Behavior in Organizations. New York: Routledge Academic.
Data Mining, KDD Process, Data mining functionalities, Characterization,
Discrimination ,
Association,
Classification,
Prediction,
Clustering,
Outlier analysis, Data Cleaning as a Process
Abstract: Knowledge has played a significant role on human activities since his development. Data mining is the process of
knowledge discovery where knowledge is gained by analyzing the data store in very large repositories, which are analyzed
from various perspectives and the result is summarized it into useful information. Due to the importance of extracting
knowledge/information from the large data repositories, data mining has become a very important and guaranteed branch of
engineering affecting human life in various spheres directly or indirectly. The purpose of this paper is to survey many of the
future trends in the field of data mining, with a focus on those which are thought to have the most promise and applicability
to future data mining applications.
Keywords: Current and Future of Data Mining, Data Mining, Data Mining Trends, Data mining Applications.
key note address delivered on 23rd March 2011 in the Workshop on Data Mining and Computational Biology in Bioinformatics, sponsored by DBT India and organised by Unit of Simulation and Informatics, IARI, New Delhi.
I do not claim any originality either to slides or their content and in fact aknowledge various web sources.
This lecture gives various definitions of Data Mining. It also gives why Data Mining is required. Various examples on Classification , Cluster and Association rules are given.
presentation on recent data mining Techniques ,and future directions of research from the recent research papers made in Pre-master ,in Cairo University under supervision of Dr. Rabie
A companion slide deck for this chapter:
Stanton, J. M. (2013). Data Mining: A Practical Introduction for Organizational Researchers. In Cortina, J. M., & Landis, R. S., Modern Research Methods for the Study of Behavior in Organizations. New York: Routledge Academic.
Data Mining, KDD Process, Data mining functionalities, Characterization,
Discrimination ,
Association,
Classification,
Prediction,
Clustering,
Outlier analysis, Data Cleaning as a Process
Abstract: Knowledge has played a significant role on human activities since his development. Data mining is the process of
knowledge discovery where knowledge is gained by analyzing the data store in very large repositories, which are analyzed
from various perspectives and the result is summarized it into useful information. Due to the importance of extracting
knowledge/information from the large data repositories, data mining has become a very important and guaranteed branch of
engineering affecting human life in various spheres directly or indirectly. The purpose of this paper is to survey many of the
future trends in the field of data mining, with a focus on those which are thought to have the most promise and applicability
to future data mining applications.
Keywords: Current and Future of Data Mining, Data Mining, Data Mining Trends, Data mining Applications.
6 weeks summer training in data mining,jalandhardeepikakaler1
e2matrix is a leading Web Design and Development Company now in the field of Industrial training. We provide you 6 Month/6 Week Industrial training in PhP,Web Designing, Java, Dot Net, android Applications.
we also provide work for various technoligies with additional facilities-
RESEARCH PAPERS
OBJECTIVES
SYNOPSIS
IMPLEMENTATION
DOCUMENTATION
REPORT WRITING
PAPER PUBLICATION
Address-Opp. Phagwara Bus Stand, Above Bella
Pizza, Handa City Center, Phagwara,punjab
email addres-e2matrixphagwara@gmail.com
jalandhare2matrix@gmail.com
WEBSITE-www.e2matrix.com
CONTACT NUMBER --
09041262727
07508509730
7508509709
6months industrial training in data mining,ludhianadeepikakaler1
E2marix is leading Training & Certification Company offering Corporate Training Programs, IT Education Courses in diversified areas.Since its inception, E2matrix educational Services have trained and certified many students and professionals.
TECHNOLOGIES PROVIDED -
MATLAB
NS2
IMAGE PROCESSING
.NET
SOFTWARE TESTING
DATA MINING
NEURAL networks
HFSS
WEKA
ANDROID
CLOUD computing
COMPUTER NETWORKS
FUZZY LOGIC
ARTIFICIAL INTELLIGENCE
LABVIEW
EMBEDDED
VLSI
Address
Opp. Phagwara Bus Stand, Above Bella
Pizza, Handa City Center, Phagwara
email-e2matrixphagwara@gmail.com
jalandhare2matrix@gmail.com
Web site-www.e2matrix.com
CONTACT NUMBER --
07508509730
09041262727
7508509709
6months industrial training in data mining, jalandhardeepikakaler1
e2matrix is a leading Web Design and Development Company now in the field of Industrial training. We provide you 6 Month/6 Week Industrial training in PhP,Web Designing, Java, Dot Net, android Applications.
we also provide work for various technoligies with additional facilities-
RESEARCH PAPERS
OBJECTIVES
SYNOPSIS
IMPLEMENTATION
DOCUMENTATION
REPORT WRITING
PAPER PUBLICATION
Address-Opp. Phagwara Bus Stand, Above Bella
Pizza, Handa City Center, Phagwara,punjab
email addres-e2matrixphagwara@gmail.com
jalandhare2matrix@gmail.com
WEBSITE-www.e2matrix.com
CONTACT NUMBER --
09041262727
07508509730
7508509709
6 weeks summer training in data mining,ludhianadeepikakaler1
E2marix is leading Training & Certification Company offering Corporate Training Programs, IT Education Courses in diversified areas.Since its inception, E2matrix educational Services have trained and certified many students and professionals.
TECHNOLOGIES PROVIDED -
MATLAB
NS2
IMAGE PROCESSING
.NET
SOFTWARE TESTING
DATA MINING
NEURAL networks
HFSS
WEKA
ANDROID
CLOUD computing
COMPUTER NETWORKS
FUZZY LOGIC
ARTIFICIAL INTELLIGENCE
LABVIEW
EMBEDDED
VLSI
Address
Opp. Phagwara Bus Stand, Above Bella
Pizza, Handa City Center, Phagwara
email-e2matrixphagwara@gmail.com
jalandhare2matrix@gmail.com
Web site-www.e2matrix.com
CONTACT NUMBER --
07508509730
09041262727
7508509709
Data mining final year project in ludhianadeepikakaler1
Are you so occupied with your family and work that you don’t even have any more time left for your MBA assignments or thesis?
E2matrix offer our assistance, writing and consulting services with your research assignments particularly in the areas of thesis, dissertations, journals, online forum discussions, FYP, and so on.
We also provide training for the different technologies and are involved in a wide diversity of subject areas ranging from management,engineering up to programming and designs; and our team of research experts and professional consultants are readily available to help you towards your successful completion of your assignments.
Engage us today at our e2matrixphagwara@gmail.com
jalandhare2matrix@gmail.com
and can visit our web site-www.e2matrix.com
contact us-7508509709
07508509730
09041262727
Address us - Opp. Phagwara Bus Stand, Above Bella
Pizza, Handa City Center, Phagwara
Data mining final year project in jalandhardeepikakaler1
e2 matrix provides IT consulting services to its customers. e2 matrix provides the flexible practices that enable companies to operate more efficiently and produce more value. We also offer a wide-range of technologies such as-
MATLAB
NS2
IMAGE PROCESSING
.NET
SOFTWARE TESTING
DATA MINING
NEURAL networks
HFSS
WEKA
ANDROID
CLOUD computing
COMPUTER NETWORKS
FUZZY LOGIC
ARTIFICIAL INTELLIGENCE
LABVIEW
EMBEDDED
VLSI
We are professionals who are driven by the viewpoint of customer satisfaction through Quality and Innovation.
e2matrix believe in an open working relationship produces a positive and productive work environment that results in effective and low-cost solutions. We maximize the customer benefits by bringing the most pioneering solutions.
Address-Opp. Phagwara Bus Stand, Above Bella
Pizza, Handa City Center, Phagwara,punjab
email addres-e2matrixphagwara@gmail.com
jalandhare2matrix@gmail.com
WEBSITE-www.e2matrix.com
CONTACT NUMBER --
09041262727
07508509730
7508509709
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices desire to take full advantage of the features
available on those devices, but many of the features provide convenience and capability but sacrifice security. This best practices guide outlines steps the users can take to better protect personal devices and information.
UiPath Test Automation using UiPath Test Suite series, part 5DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with devops.
Topics covered:
CI/CD with in UiPath
End-to-end overview of CI/CD pipeline with Azure devops
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Communications Mining Series - Zero to Hero - Session 1DianaGray10
This session provides introduction to UiPath Communication Mining, importance and platform overview. You will acquire a good understand of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionAggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party will share these foundational concepts to build on:
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!SOFTTECHHUB
As the digital landscape continually evolves, operating systems play a critical role in shaping user experiences and productivity. The launch of Nitrux Linux 3.5.0 marks a significant milestone, offering a robust alternative to traditional systems such as Windows 11. This article delves into the essence of Nitrux Linux 3.5.0, exploring its unique features, advantages, and how it stands as a compelling choice for both casual users and tech enthusiasts.
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
2. Motivation: Why data mining?
What is data mining?
Data Mining: On what kind of data?
Data mining functionality
Are all the patterns interesting?
Major issues in data mining
2
3. Data explosion problem
Automated data collection tools and mature database technology
lead to tremendous amounts of data stored in databases, data
warehouses and other information repositories
We are drowning in data, but starving for knowledge!
Solution: Data warehousing and data mining
Data warehousing and on-line analytical processing
Extraction of interesting knowledge (rules, regularities, patterns,
constraints) from data in large databases
3
4. 1960s:
Data collection, database creation, IMS and network DBMS
1970s:
Relational data model, relational DBMS implementation
1980s:
RDBMS, advanced data models (extended-relational, OO, deductive,
etc.) and application-oriented DBMS (spatial, scientific, engineering,
etc.)
1990s—2000s:
Data mining and data warehousing, multimedia databases, and Web
databases
4
5. Data mining (knowledge discovery in databases):
Extraction of interesting (non-trivial, implicit, previously
unknown and potentially useful) information or patterns from
data in large databases
Alternative names:
Data mining: a misnomer?
Knowledge discovery(mining) in databases (KDD), knowledge
extraction, data/pattern analysis, data archeology, data
dredging, information harvesting, business intelligence, etc.
What is not data mining?
(Deductive) query processing.
Expert systems or small ML/statistical programs
5
6. Database analysis and decision support
Market analysis and management
▪ target marketing, customer relation management, market
basket analysis, cross selling, market segmentation
Risk analysis and management
▪ Forecasting, customer retention, improved underwriting, quality
control, competitive analysis
Fraud detection and management
Other Applications
Text mining (news group, email, documents)
Stream data mining
Web mining.
DNA data analysis
6
7. Where are the data sources for analysis?
Credit card transactions, loyalty cards, discount coupons, customer
complaint calls, plus (public) lifestyle studies
Target marketing
Find clusters of “model” customers who share the same
characteristics: interest, income level, spending habits, etc.
Determine customer purchasing patterns over time
Conversion of single to a joint bank account: marriage, etc.
Cross-market analysis
Associations/co-relations between product sales
Prediction based on the association information
7
8. Customer profiling
data mining can tell you what types of customers buy what products
(clustering or classification)
Identifying customer requirements
identifying the best products for different customers
use prediction to find what factors will attract new customers
Provides summary information
various multidimensional summary reports
statistical summary information (data central tendency and
variation)
8
9. Finance planning and asset evaluation
cash flow analysis and prediction
contingent claim analysis to evaluate assets
cross-sectional and time series analysis (financial-ratio, trend
analysis, etc.)
Resource planning:
summarize and compare the resources and spending
Competition:
monitor competitors and market directions
group customers into classes and a class-based pricing procedure
set pricing strategy in a highly competitive market
9
10. Applications
widely used in health care, retail, credit card services,
telecommunications (phone card fraud), etc.
Approach
use historical data to build models of fraudulent behavior and use
data mining to help identify similar instances
Examples
auto insurance: detect a group of people who stage accidents to
collect on insurance
money laundering: detect suspicious money transactions (US
Treasury's Financial Crimes Enforcement Network)
medical insurance: detect professional patients and ring of doctors
and ring of references
10
11. Detecting inappropriate medical treatment
Australian Health Insurance Commission identifies that in many cases
blanket screening tests were requested (save Australian $1m/yr).
Detecting telephone fraud
Telephone call model: destination of the call, duration, time of day or
week. Analyze patterns that deviate from an expected norm.
British Telecom identified discrete groups of callers with frequent
intra-group calls, especially mobile phones, and broke a multimillion
dollar fraud.
Retail
Analysts estimate that 38% of retail shrink is due to dishonest
employees.
11
12. Sports
IBM Advanced Scout analyzed NBA game statistics (shots blocked,
assists, and fouls) to gain competitive advantage for New York
Knicks and Miami Heat
Astronomy
JPL and the Palomar Observatory discovered 22 quasars with the
help of data mining
Internet Web Surf-Aid
IBM Surf-Aid applies data mining algorithms to Web access logs for
market-related pages to discover customer preference and behavior
pages, analyzing effectiveness of Web marketing, improving Web
site organization, etc.
12
13. Pattern Evaluation
Data mining: the core of
knowledge discovery
process. Data Mining
Task-relevant Data
Data Selection
Warehouse
Data Cleaning
Data Integration
Databases
13
14. Learning the application domain:
relevant prior knowledge and goals of application
Creating a target data set: data selection
Data cleaning and preprocessing: (may take 60% of effort!)
Data reduction and transformation:
Find useful features, dimensionality/variable reduction, invariant
representation.
Choosing functions of data mining
summarization, classification, regression, association, clustering.
Choosing the mining algorithm(s)
Data mining: search for patterns of interest
Pattern evaluation and knowledge presentation
visualization, transformation, removing redundant patterns, etc.
Use of discovered knowledge
14
15. Relational databases
Data warehouses
Transactional databases
Advanced DB and information repositories
Object-oriented and object-relational databases
Spatial and temporal data
Time-series data and stream data
Text databases and multimedia databases
Heterogeneous and legacy databases
WWW
15
16. Association rule mining:
Finding frequent patterns, associations, correlations, or causal
structures among sets of items or objects in transaction databases,
relational databases, and other information repositories.
Frequent pattern: pattern (set of items, sequence, etc.) that occurs
frequently in a database
Motivation: finding regularities in data
What products were often purchased together? — Beer and diapers?!
What are the subsequent purchases after buying a PC?
What kinds of DNA are sensitive to this new drug?
Can we automatically classify web documents?
16
17. Transaction-id Items bought Itemset X={x1, …, xk}
10 A, B, C Find all the rules XY with min
20 A, C confidence and support
30 A, D support, s, probability that a
40 B, E, F transaction contains X∪Y
confidence, c, conditional probability
that a transaction having X also
Customer Customer contains Y.
buys both buys
diapers
Let min_support = 50%,
min_conf = 50%:
Customer
A C (50%, 66.7%)
buys beer C A (50%, 100%)
17
18. Transaction-id Items bought Min. support 50%
10 A, B, C Min. confidence 50%
20 A, C
30 A, D Frequent pattern Support
40 B, E, F {A} 75%
{B} 50%
{C} 50%
{A, C} 50%
For rule A ⇒ C:
support = support({A}∪{C}) = 50%
confidence = support({A}∪{C})/support({A}) = 66.6%
18
19. Any subset of a frequent itemset must be frequent
if {beer, diaper, nuts} is frequent, so is {beer, diaper}
every transaction having {beer, diaper, nuts} also contains {beer, diaper}
Apriori pruning principle: If there is any itemset which is infrequent, its
superset should not be generated/tested!
Method:
generate length (k+1) candidate itemsets from length k frequent itemsets,
and
test the candidates against DB
The performance studies show its efficiency and scalability
19
21. Pseudo-code:
Ck: Candidate itemset of size k
Lk : frequent itemset of size k
L1 = {frequent items};
for (k = 1; Lk !=∅; k++) do begin
Ck+1 = candidates generated from Lk;
for each transaction t in database do
increment the count of all candidates in Ck+1
that are contained in t
Lk+1 = candidates in Ck+1 with min_support
end
return ∪k Lk;
21
22. How to generate candidates?
Step 1: self-joining Lk
Step 2: pruning
Example of Candidate-generation
L3={abc, abd, acd, ace, bcd}
Self-joining: L3*L3
▪ abcd from abc and abd
▪ acde from acd and ace
Pruning:
▪ acde is removed because ade is not in L3
C4={abcd}
22
23. Suppose the items in Lk-1 are listed in an order
Step 1: self-joining Lk-1
insert into Ck
select p.item1, p.item2, …, p.itemk-1, q.itemk-1
from Lk-1 p, Lk-1 q
where p.item1=q.item1, …, p.itemk-2=q.itemk-2, p.itemk-1 < q.itemk-1
Step 2: pruning
forall itemsets c in Ck do
forall (k-1)-subsets s of c do
if (s is not in Lk-1) then delete c from Ck
23
24. Finding models (functions) that describe and
distinguish classes or concepts for future prediction
E.g., classify countries based on climate, or classify
cars based on gas mileage
Presentation: decision-tree, classification rule, neural
network
Prediction: Predict some unknown or missing
numerical values
24
25. Classification
Algorithms
Training
Data
NAM E RANK YEARS TENURED Classifier
M ike Assistant Prof 3 no (Model)
M ary Assistant Prof 7 yes
Bill Professor 2 yes
Jim Associate Prof 7 yes
IF rank = ‘professor’
Dave Assistant Prof 6 no
OR years > 6
Anne Associate Prof 3 no
THEN tenured = ‘yes’
25
26. Classifier
Testing
Data Unseen Data
(Jeff, Professor, 4)
NAM E RANK YEARS TENURED
Tom Assistant Prof 2 no Tenured?
M erlisa Associate Prof 7 no
G eorge Professor 5 yes
Joseph Assistant Prof 7 yes
26
27. age income student credit_rating
<=30 high no fair
Training <=30
31…40
high
high
no excellent
no fair
set >40 medium no fair
>40 low yes fair
>40 low yes excellent
31…40 low yes excellent
<=30 medium no fair
<=30 low yes fair
>40 medium yes fair
<=30 medium yes excellent
31…40 medium no excellent
31…40 high yes fair
>40 medium no excellent
27
28. age?
<=30 overcast
30..40 >40
student? yes credit rating?
no yes excellent fair
no yes no yes
28
29. Cluster analysis
Class label is unknown: Group data to form new classes, e.g., cluster houses
to find distribution patterns
Clustering based on the principle: maximizing the intra-class similarity and
minimizing the interclass similarity
Outlier analysis
Outlier: a data object that does not comply with the general behavior of the
data
It can be considered as noise or exception but is quite useful in fraud
detection, rare events analysis
29