The document discusses several case studies and applications of data mining including:
1) Customer attrition prediction helped a mobile phone company reduce attrition rates from over 2%/month to under 1.5%/month.
2) Credit risk models used by banks to predict loan defaults enabled the proliferation of mortgages and credit cards.
3) Amazon's product recommendations succeeded by clustering customers based on the products they purchased.
4) In the MetLife case study, the company set out to build a $50 million relational database consolidating data from 30 businesses worldwide, at a time when $30 million of insurance money was reported to have gone to fraudulent claims. Data mining then detected fraud such as rate evasion far faster than manual methods.
My keynote talk at San Diego Superdata conference, looking at history and current state of Analytics and Data Mining, and examining the effects of Big Data
Introduction to Data Mining (Chapter 1): Data Mining Concepts and Techniques, by R. Deepa (IT), Batch 2016-2019, published on Oct-13 2018 from NS College of Arts and Science, Theni
This is just a brief overview, not a full explanation.
Data Mining:
Data mining is the process of analyzing data from different perspectives and summarizing it into useful information.
Data mining is the process that large companies and organizations use to handle, balance, and analyze big data.
Data mining is primarily used today by companies with a strong consumer focus - retail, financial, communication, and marketing organizations. It enables these companies to determine relationships among internal factors such as price, product positioning, or staff skills, and external factors.
Companies use it to increase revenue, cut costs, or both.
While large-scale information technology has been evolving separate transaction and analytical systems, data mining provides the link between the two.
Data mining software analyzes relationships and patterns in stored transaction data based on open-ended user queries.
SOFTWARE FOR DATA MINING:
Microsoft SQL Server 2005.
Microsoft SQL Server 2008.
Oracle Data Mining, etc.
A supermarket in Canada used the data mining capabilities of Oracle software to analyze local buying patterns. They discovered that when men bought food for the home on Saturdays and Sundays, they also tended to buy beer. On other days of the week, men did not usually buy beer.
The shopkeeper told his workers to stock a sufficient amount of beer on Saturdays and Sundays. In this way the shop's income increased.
BI (business intelligence) refers to applications and technologies that are used to gather information about a company's operations.
Data mining is an important part of business intelligence.
Some basic examples of the use of data mining are given below:
1) Finance:
In finance, data mining is used for credit card analysis.
2) Astronomy:
Palomar Observatory discovered 22 quasars with the help of data mining.
3) Telecommunication:
In telecommunications, data mining is used for call records.
4) Offices:
In offices, it is used to balance the data and records of the staff, etc.
Following are some of the types of data mining:
Association rules are used for store layout, etc.
Classification is used for weather prediction, etc.
Clustering is used for graphical representation of the universe.
Sequential patterns are used for medical diagnosis.
THANK YOU...!!!
Dbm630 lecture10
1. DBM630: Data Mining and Data Warehousing
MS.IT. Rangsit University
Semester 2/2011
Lecture 10
Data Mining: Case Studies and Applications
by Kritsada Sriphaew (sriphaew.k AT gmail.com)
2. Topics
Data Mining Goals
Data Floods
Machine Learning / Data Mining Application areas
Case Studies
Data Warehousing and Data Mining by Kritsada Sriphaew
3. THE FOUR GOALS OF DATA MINING
• Prediction: Using current data to make predictions about future activities.
• Identification: "Data patterns can be used to identify the existence of an item, an event, or an activity."
• Classification: Breaking the data down into categories based on certain attributes.
• Optimization: Using the mined data to optimize the use of resources, such as time, money, etc.
4. Trends leading to Data Flood
More data is generated:
Bank, telecom, and other business transactions ...
Scientific data: astronomy, biology, etc.
Web, text, and e-commerce
5. Big Data Examples
Europe's Very Long Baseline Interferometry (VLBI) has 16 telescopes, each of which produces 1 Gigabit/second of astronomical data over a 25-day observation session
storage and analysis are a big problem
AT&T handles billions of calls per day
so much data that it cannot all be stored; analysis has to be done “on the fly”, on streaming data
6. Largest databases in 2003
Commercial databases:
Winter Corp. 2003 Survey: France Telecom has the largest decision-support DB, ~30 TB; AT&T ~26 TB
Web
Alexa internet archive: 7 years of data, 500 TB
Google searches 4+ billion pages, several hundred TB
IBM WebFountain, 160 TB (2003)
Internet Archive (www.archive.org), ~300 TB
7. 5 million terabytes created in 2002
UC Berkeley 2003 estimate: 5 exabytes (5 million terabytes) of new data was created in 2002.
www.sims.berkeley.edu/research/projects/how-much-info-2003/
US produces ~40% of new stored data worldwide
1 exabyte = 10^18 bytes
8. Data Growth Rate
Twice as much information was created in 2002 as in 1999 (~30% growth rate)
Other growth rate estimates are even higher
Very little data will ever be looked at by a human
Knowledge Discovery is NEEDED to make sense and use of data.
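The growth-rate figure on this slide can be checked with a short calculation. The sketch below assumes compound annual growth over the three years from 1999 to 2002; the function name is illustrative and not part of the lecture. Doubling in three years works out to roughly 26% per year, which the slide rounds to ~30%.

```python
def annual_growth_rate(ratio: float, years: float) -> float:
    """Compound annual growth rate implied by an overall growth ratio."""
    return ratio ** (1.0 / years) - 1.0


# "Twice as much" information in 2002 as in 1999: ratio 2 over 3 years.
rate = annual_growth_rate(2.0, 3.0)
print(f"Implied annual growth: {rate:.1%}")
```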
9. Machine Learning / Data Mining Application areas
Science
astronomy, bioinformatics, drug discovery, …
Business
advertising, CRM (Customer Relationship Management), investments, manufacturing, sports/entertainment, telecom, e-Commerce, targeted marketing, health care, …
Web:
search engines, advertising, web and text mining, …
Government
surveillance & anti-terror (?), crime detection, profiling tax cheaters, …
10. DM applications in 2004 (in %)
13%: Banking
9%: Direct Marketing, Fraud Detection, Scientific data analysis
8%: Bioinformatics
7%: Insurance, Medical/Pharmaceutical Applications
6%: eCommerce/Web, Telecommunications
4%: Investments/Stocks, Manufacturing, Retail, Security
Below: Travel, Entertainment/News, …
11. Data Mining for Customer Modeling
Customer Tasks:
attrition prediction (customer churn)
targeted marketing:
cross-sell, customer acquisition
credit-risk
fraud detection
Industries
banking, telecom, retail sales, …
12. Customer Attrition: Case Study
Situation: Attrition rate for mobile phone customers is around 25-30% a year!
Task:
Given customer information for the past N months, predict who is likely to attrite next month.
Also, estimate customer value and the most cost-effective offer to make to this customer.
13. Customer Attrition Results
Verizon Wireless built a customer data warehouse
Identified potential attriters
Developed multiple, regional models
Targeted customers with high propensity to accept the offer
Reduced attrition rate from over 2%/month to under 1.5%/month (huge impact, with >30 M subscribers)
(Reported in 2003)
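To put the reported reduction in perspective, here is a rough arithmetic sketch. It assumes exactly 30 million subscribers (the slide says ">30 M") and a constant monthly rate applied to a fixed cohort; both are simplifications, and the function names are illustrative only.

```python
SUBSCRIBERS = 30_000_000  # assumption: the slide only says ">30 M"


def monthly_losses(monthly_rate: float, subscribers: int = SUBSCRIBERS) -> float:
    """Expected number of customers lost in one month."""
    return monthly_rate * subscribers


def annualized(monthly_rate: float) -> float:
    """Fraction of a fixed cohort lost after 12 months at a constant monthly rate."""
    return 1.0 - (1.0 - monthly_rate) ** 12


saved_per_month = monthly_losses(0.02) - monthly_losses(0.015)
print(f"Customers retained per month: {saved_per_month:,.0f}")
print(f"Annualized attrition at 2.0%/month: {annualized(0.02):.1%}")
print(f"Annualized attrition at 1.5%/month: {annualized(0.015):.1%}")
```

Under these assumptions, a drop of half a percentage point per month corresponds to about 150,000 customers retained each month.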
14. Assessing Credit Risk: Case Study
Situation: Person applies for a loan
Task: Should a bank approve the loan?
Note: People who have the best credit don’t need the loans, and people with the worst credit are not likely to repay. A bank’s best customers are in the middle.
15. Credit Risk - Results
Banks develop credit models using a variety of machine learning methods.
Mortgage and credit card proliferation are the results of being able to successfully predict if a person is likely to default on a loan
Widely deployed in many countries
16. Successful e-commerce – Case Study
A person buys a book (product) at Amazon.com.
Task: Recommend other books (products) this person is likely to buy
Amazon does clustering based on books bought:
customers who bought “Advances in Knowledge Discovery and Data Mining” also bought “Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations”
Recommendation program is quite successful
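The "customers who bought X also bought Y" idea can be illustrated with simple co-purchase counting. This is a minimal sketch, not Amazon's actual method: the slide describes clustering, whereas the code below only counts how often two titles appear in the same customer's history. The customers and shortened titles are hypothetical.

```python
from collections import Counter
from itertools import permutations

# Hypothetical purchase histories: customer -> set of titles bought.
purchases = {
    "alice": {"Advances in KDD", "Practical ML Tools", "Statistics 101"},
    "bob": {"Advances in KDD", "Practical ML Tools"},
    "carol": {"Advances in KDD", "Cooking Basics"},
    "dave": {"Cooking Basics", "Statistics 101"},
}


def co_purchase_counts(histories):
    """Count, for each ordered pair (a, b), how many customers bought both."""
    counts = Counter()
    for basket in histories.values():
        for a, b in permutations(sorted(basket), 2):
            counts[(a, b)] += 1
    return counts


def also_bought(title, histories, top_n=3):
    """Titles most often bought together with `title`, most frequent first."""
    counts = co_purchase_counts(histories)
    related = [(other, n) for (item, other), n in counts.items() if item == title]
    related.sort(key=lambda pair: (-pair[1], pair[0]))  # ties broken by title
    return related[:top_n]


print(also_bought("Advances in KDD", purchases))
```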
17. Unsuccessful e-commerce case study (KDD-Cup 2000)
Data: clickstream and purchase data from Gazelle.com, a legwear and legcare e-tailer
Q: Characterize visitors who spend more than $12 on an average order at the site
Dataset of 3,465 purchases, 1,831 customers
Very interesting analysis by Cup participants
thousands of hours - $X,000,000 (Millions) of consulting
Total sales -- $Y,000
Obituary: Gazelle.com out of business, Aug 2000
18. Genomic Microarrays – Case Study
Given microarray data for a number of samples (patients), can we
Accurately diagnose the disease?
Predict outcome for given treatment?
Recommend best treatment?
19. Example: ALL/AML data
38 training cases, 34 test, ~ 7,000 genes
2 Classes: Acute Lymphoblastic Leukemia (ALL) vs Acute Myeloid Leukemia (AML)
Use train data to build diagnostic model
[Figure panels: ALL, AML]
Results on test data:
33/34 correct; the 1 error may be a mislabeled sample
20. Security and Fraud Detection - Case Study
Credit Card Fraud Detection
Detection of Money laundering: FAIS (US Treasury)
Securities Fraud: NASDAQ KDD system
Phone fraud: AT&T, Bell Atlantic, British Telecom/MCI
Bio-terrorism detection at Salt Lake Olympics 2002
21. Data Mining and Privacy
In 2006, the NSA (National Security Agency) was reported to be mining years of call info to identify terrorism networks
Social network analysis has the potential to find networks
Invasion of privacy – do you mind if your call information is in a government database?
What if the NSA program finds one real suspect for every 1,000 false leads? 1,000,000 false leads?
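The closing question on this slide is about precision: the share of flagged leads that are real. A small sketch of that arithmetic follows; the function name is illustrative and the two ratios are the ones posed on the slide.

```python
def precision(true_leads: int, false_leads: int) -> float:
    """Fraction of all flagged leads that are real suspects."""
    return true_leads / (true_leads + false_leads)


for false_leads in (1_000, 1_000_000):
    p = precision(1, false_leads)
    print(f"1 real suspect per {false_leads:,} false leads: precision {p:.6%}")
```

At one real suspect per 1,000 false leads, about one flagged person in a thousand is a genuine lead. At one per 1,000,000, nearly every investigation is spent on someone innocent.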
22. Case Study: Intrusion Detection Systems
Detection Approach
Misuse Detection
▪ Based on known malicious patterns (signatures)
Anomaly Detection
▪ Based on deviations from established normal patterns (profiles)
Data Source
Network-based (NIDS)
▪ Network traffic
Host-based (HIDS)
▪ Audit trails
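The two detection approaches can be contrasted in a few lines. This is a toy sketch under stated assumptions: the signature strings, the requests-per-minute profile, and the three-standard-deviation threshold are all hypothetical choices for illustration, not taken from the lecture.

```python
from statistics import mean, stdev

# Misuse detection: match against known malicious patterns (signatures).
# These signature strings are hypothetical examples.
SIGNATURES = ("/etc/passwd", "<script>", "' OR 1=1")


def misuse_alert(payload: str) -> bool:
    """Alert if the payload contains any known signature."""
    return any(sig in payload for sig in SIGNATURES)


# Anomaly detection: learn a profile of normal behaviour, flag deviations.
def build_profile(normal_values):
    """Profile = (mean, sample standard deviation) of a normal-traffic metric."""
    return mean(normal_values), stdev(normal_values)


def anomaly_alert(value: float, profile, threshold: float = 3.0) -> bool:
    """Alert if the value lies more than `threshold` deviations from the mean."""
    mu, sigma = profile
    return abs(value - mu) > threshold * sigma


normal_requests_per_minute = [48, 52, 50, 47, 53, 49, 51, 50]
profile = build_profile(normal_requests_per_minute)

print(misuse_alert("GET /index.html"))        # no known signature
print(misuse_alert("GET /../../etc/passwd"))  # contains a signature
print(anomaly_alert(51, profile))             # close to the normal profile
print(anomaly_alert(400, profile))            # far outside the profile
```

Misuse detection only catches what is already in the signature list, while anomaly detection can flag unseen behaviour at the cost of more false alarms.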
23. Case Study: Intrusion Detection Systems
Data Mining Usage
Signature extraction
Rule matching
Alarm data analysis
Reduce false alarms
Eliminate redundant alarms
Feature selection
Training Data cleaning
24. Case Study: Intrusion Detection Systems
Behavioral Features for Network Anomaly Detection
Training set = normal network traffic
Features provide the semantics of the data values
Feature selection is important
Proposed method:
▪ Feature extraction based on protocol behavior
▪ Many attacks use protocols improperly
▪ Ping of Death
▪ SYN Flood
▪ Teardrop
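A sketch of what protocol-behavior feature extraction might look like. The feature names (tcpPerSYN, tcpPerFIN, tcpPerPSH, meanIAT, meanipTLen) appear in the decision tree slide of this lecture, but their definitions here are assumptions: the fraction of TCP packets carrying each flag, the mean inter-arrival time, and the mean IP total length. The `Packet` record is a hypothetical stand-in for parsed traffic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    timestamp: float   # arrival time in seconds
    length: int        # IP total length in bytes
    flags: frozenset   # TCP flag names set on this packet, e.g. {"SYN"}


def extract_features(packets):
    """Summarize a window of TCP packets into behavioural features."""
    if not packets:
        raise ValueError("need at least one packet")
    n = len(packets)
    times = sorted(p.timestamp for p in packets)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return {
        "tcpPerSYN": sum("SYN" in p.flags for p in packets) / n,
        "tcpPerFIN": sum("FIN" in p.flags for p in packets) / n,
        "tcpPerPSH": sum("PSH" in p.flags for p in packets) / n,
        "meanIAT": sum(gaps) / len(gaps) if gaps else 0.0,
        "meanipTLen": sum(p.length for p in packets) / n,
    }


# A window that looks like a SYN flood: every packet is a bare SYN.
flood = [Packet(0.5 * i, 40, frozenset({"SYN"})) for i in range(4)]
print(extract_features(flood))
```

A window where nearly every packet has SYN set and none has FIN or PSH is the kind of improper protocol use that such features are meant to expose.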
25. Case Study: Intrusion Detection Systems
Decision Tree (only small part)
[Figure: part of a decision tree with split nodes tcpPerPSH, tcpPerFIN, tcpPerSYN, meanIAT, and meanipTLen and leaves WWW, SMTP, FTP, and telnet]
26. CASE STUDY: METLIFE
Company Profile
MetLife, Inc. is a leading provider of insurance and other financial services to millions of individual and institutional customers throughout the United States.
Established in 1863, MetLife now has offices all over the US and the world, and offers ten different types of insurance and financial services.
27. CASE STUDY: METLIFE
Industry: Insurance and Financial Services
How they use Data Mining: Fraud Detection
28. CASE STUDY: METLIFE
• Project first started in 2001
• MetLife set out to build a $50 million relational database
• This project would consolidate data from 30 businesses worldwide.
29. CASE STUDY: METLIFE
• Around the same time, it was reported that $30 million of insurance money went to fraudulent claims.
• MetLife teamed up with Computer Sciences Corporation (CSC) to
o License their data mining tool (called Fraud Investigator),
o Develop @First, "an early fraud detection system"
30. CASE STUDY: METLIFE
• By 2003, MetLife's data mining operation was in full swing.
• They were able to detect fraud in a fraction of the time that manual review would take
• One example is detecting rate evasion
31. CASE STUDY: METLIFE
• Rate evasion is lying about where you live to pay lower premiums.
• MetLife used data mining to detect rate evasion by matching ZIP codes with phone numbers to see if the cities matched.
• In 2.5 hours, MetLife found 107 fraudulent claims, all linked to a rate-evasion ring in NY and Massachusetts.
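The ZIP-versus-phone check described above can be sketched as a simple lookup comparison. This is an illustration only, not MetLife's or CSC's implementation: the lookup tables, the policy records, and the function names are hypothetical, and a real system would need complete reference data and would treat a mismatch as a lead to review rather than proof of fraud.

```python
# Hypothetical lookup tables with a few entries for illustration.
ZIP_TO_CITY = {"10001": "New York", "02108": "Boston", "12207": "Albany"}
AREA_CODE_TO_CITY = {"212": "New York", "617": "Boston", "518": "Albany"}


def area_code(phone: str) -> str:
    """Extract the 3-digit area code from a US phone number string."""
    digits = "".join(ch for ch in phone if ch.isdigit())
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the country code
    return digits[:3]


def flag_possible_rate_evasion(policies):
    """Return IDs of policies whose ZIP city and phone city disagree."""
    flagged = []
    for policy in policies:
        zip_city = ZIP_TO_CITY.get(policy["zip"])
        phone_city = AREA_CODE_TO_CITY.get(area_code(policy["phone"]))
        # Only flag when both lookups succeed and the cities differ.
        if zip_city and phone_city and zip_city != phone_city:
            flagged.append(policy["id"])
    return flagged


policies = [
    {"id": "P1", "zip": "10001", "phone": "(212) 555-0101"},
    {"id": "P2", "zip": "12207", "phone": "212-555-0102"},
    {"id": "P3", "zip": "02108", "phone": "+1 617 555 0103"},
]
print(flag_possible_rate_evasion(policies))
```

Here P2 claims an Albany ZIP code but has a New York City phone number, so it is the only policy flagged for review.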