SlideShare a Scribd company logo
Data Mining
Introduction
intro
Data mining is a powerful new
technology with great potential to help
companies focus on the most important
information in the data they have
collected about the behavior of their
customers and potential customers.
Data collections in the real world






Ten largest transaction-processing
databases range from 3 to 18
Terabytes
Ten largest decision support databases
range from 10 to 29 Terabytes
Sizes have doubled / tripled between
2001 and end of 2003
Questions arise






Is there any new, unexpected and
potentially useful information contained
in this data?
Can we use historical data to predict
future outcomes?
(e.g. customer behavior, fraud
detection, etc.)
Some examples of data mining
1.

Telecommunications

Huge amount of data is collected daily
 Transactional data (about each phone call)
 Data on mobile phones, house based phones, Internet, etc.)
 Other customer data (billing, personal information, etc.)
 Additional data (network load, faults, etc.)
Questions arises
 Which customer group is highly profitable, which one is not?
 To which customers should we advertise what kind of special
offers?
 What kind of call rates would increase profit without loosing good
customers?
 How do customer profiles change over time?
 Fraud detection (stolen mobile phones or phone cards

Another
2. Health
 Different aspects of the health system
 Personal health records (at GPs, specialists, etc.)
 Hospital data (e.g. admission data, midwives data,
surgery data)
 Billing information (Medicare, PBS)
Questions
 Are doctors following the procedures (e.g. prescription of
medication)?
 Adverse drug reactions (analysis of different data
collections to find correlations)
 Are people committing fraud (e.g. doctor shoppers)
 Correlations between social and environmental issues
and people's health?
What is data mining?


Data Mining is the automated extraction
of previously unrealized information
from Large data sources for the
purpose of supporting business actions.
Some more definitions






Knowledge discovery in databases is the
non-trivial process of identifying valid, novel,
potentially useful, and ultimately
understandable patterns in data.
An information extraction activity whose goal
is to discover hidden facts contained in
databases.
Data mining, or knowledge discovery, is the
computer-assisted process of digging through
and analyzing enormous sets of data and
then extracting the meaning of the data.
Data mining process
Data mining process






Extract, transform, and load transaction
data onto the data warehouse system.
Store and manage the data in a
multidimensional database system.
Provide data access to business
analysts and information technology
professionals.
Data mining process




Analyze the data by application
software.
Present the data in a useful format,
such as a graph or table.
DM is multi disciplinary
What they do
Detect patterns in data: Rules, patterns,
classes, associations and functional
dependencies, outliers, data distributions,
clusters
How they do it



Search through data and pattern space,
non-parametric modelling, filtering,
aggregation
How well they do it
Errors and biases, over-fitting,
confounding effects, speed, scalability
Challenges in DM






Data size
 Size of data collections grows more than
linear, doubling every 18 months
 Scalable algorithms are needed
 Data complexity
Different types of data (free text, HTML, XML,
multimedia)
Dimensionality of the data increases (more
attributes)
Challenges contd..






The curse of dimensionality affects many
algorithms
(for example find nearest neighbors in high
dimensions)
Data quality
 Real world data is messy and dirty
(missing and out-of-date values,
typographical errors, different
coding/formats, etc.)
Why mine data?







Data is being recorded
Recorded data is being warehoused
Computing power is affordable
Competitive pressure is strong
Commercial DM products are available
It provides support for business
decisions
Value to business






Market segmentation - Identify the
common characteristics of customers
who buy the same products from your
company.
Customer churn - Predict which
customers are likely to leave your
company and go to a competitor.
Fraud detection - Identify which
transactions are most likely to be
fraudulent.
Value to business




Interactive marketing - Predict what each
individual accessing a Web site is most
likely interested in seeing.
Market basket analysis - Understand what
products or services are commonly
purchased together; e.g., beer and
diapers.
Value to business






Trend analysis - Reveal the difference
between a typical customer this month
and last.
Data mining can also effectively deal with
missing, inconsistent, and noisy data.
Direct marketing - Identify which prospects
should be included in a mailing list to
obtain the highest response rate.

More Related Content

What's hot

Significance of Data Mining
Significance of Data MiningSignificance of Data Mining
Significance of Data Mining
8trackweb
 
Application areas of data mining
Application areas of data miningApplication areas of data mining
Application areas of data mining
priya jain
 
Presentation data mining
Presentation data miningPresentation data mining
Presentation data mining
cegonsoft1999
 
Data mining
Data miningData mining
Data mining
heba_ahmad
 
Data mining
Data miningData mining
Data mining
Alisha Korpal
 
Big data overview
Big data overviewBig data overview
Big data overview
Shyam Sunder Budhwar
 
Data mining Introduction
Data mining IntroductionData mining Introduction
Data mining Introduction
VijayasankariS
 
Data mining in Telecommunications
Data mining in TelecommunicationsData mining in Telecommunications
Data mining in Telecommunications
Mohsin Nadaf
 
Big data
Big dataBig data
Big data
Bhuvana Patt
 
Introduction to Big Data & Analytics
Introduction to Big Data & AnalyticsIntroduction to Big Data & Analytics
Introduction to Big Data & Analytics
Prasad Chitta
 
Mejorar la toma de decisiones con Big Data
Mejorar la toma de decisiones con Big DataMejorar la toma de decisiones con Big Data
Mejorar la toma de decisiones con Big Data
Miguel Ángel Gómez
 
Data mining
Data miningData mining
Data mining
aaryarun06
 
Big data
Big dataBig data
Data mining
Data miningData mining
Data mining
jadhav_priti
 
Sample
Sample Sample
Data mining tutorial
Data mining tutorialData mining tutorial
Data mining tutorial
grinu
 
A Survey on Data Mining
A Survey on Data MiningA Survey on Data Mining
A Survey on Data Mining
IOSR Journals
 
Data mining
Data miningData mining
Data mining
SumitMuley2
 
Data mining & data warehousing
Data mining & data warehousingData mining & data warehousing
Data mining & data warehousing
Shubha Brota Raha
 
BIG DATA BY SAIKIRAN PANJALA
BIG DATA BY SAIKIRAN PANJALABIG DATA BY SAIKIRAN PANJALA
BIG DATA BY SAIKIRAN PANJALA
Saikiran Panjala
 

What's hot (20)

Significance of Data Mining
Significance of Data MiningSignificance of Data Mining
Significance of Data Mining
 
Application areas of data mining
Application areas of data miningApplication areas of data mining
Application areas of data mining
 
Presentation data mining
Presentation data miningPresentation data mining
Presentation data mining
 
Data mining
Data miningData mining
Data mining
 
Data mining
Data miningData mining
Data mining
 
Big data overview
Big data overviewBig data overview
Big data overview
 
Data mining Introduction
Data mining IntroductionData mining Introduction
Data mining Introduction
 
Data mining in Telecommunications
Data mining in TelecommunicationsData mining in Telecommunications
Data mining in Telecommunications
 
Big data
Big dataBig data
Big data
 
Introduction to Big Data & Analytics
Introduction to Big Data & AnalyticsIntroduction to Big Data & Analytics
Introduction to Big Data & Analytics
 
Mejorar la toma de decisiones con Big Data
Mejorar la toma de decisiones con Big DataMejorar la toma de decisiones con Big Data
Mejorar la toma de decisiones con Big Data
 
Data mining
Data miningData mining
Data mining
 
Big data
Big dataBig data
Big data
 
Data mining
Data miningData mining
Data mining
 
Sample
Sample Sample
Sample
 
Data mining tutorial
Data mining tutorialData mining tutorial
Data mining tutorial
 
A Survey on Data Mining
A Survey on Data MiningA Survey on Data Mining
A Survey on Data Mining
 
Data mining
Data miningData mining
Data mining
 
Data mining & data warehousing
Data mining & data warehousingData mining & data warehousing
Data mining & data warehousing
 
BIG DATA BY SAIKIRAN PANJALA
BIG DATA BY SAIKIRAN PANJALABIG DATA BY SAIKIRAN PANJALA
BIG DATA BY SAIKIRAN PANJALA
 

Viewers also liked

Data mining Introduction
Data mining IntroductionData mining Introduction
Data mining Introduction
Aiswaryadevi Jaganmohan
 
Featured Speakers and Chefs
Featured Speakers and ChefsFeatured Speakers and Chefs
Featured Speakers and Chefs
April Phomthavong
 
Aγωγη του καταναλωτη
Aγωγη του καταναλωτηAγωγη του καταναλωτη
Aγωγη του καταναλωτηapostolisn
 
2014 Volkl Ski Reviews by The-House.com
2014 Volkl Ski Reviews by The-House.com2014 Volkl Ski Reviews by The-House.com
2014 Volkl Ski Reviews by The-House.com
The House Boardshop
 
Gamification
GamificationGamification
Gamification
gmenvielle
 
شهادات تاهلية1
شهادات تاهلية1شهادات تاهلية1
شهادات تاهلية1
Salem Salem
 
Avalanche Survival Infographic
Avalanche Survival InfographicAvalanche Survival Infographic
Avalanche Survival Infographic
The House Boardshop
 
Selection of Human Resource
Selection of Human ResourceSelection of Human Resource
Selection of Human Resource
aizellbernal
 
May loc nuoc home pure
May loc nuoc home pureMay loc nuoc home pure
May loc nuoc home pure
JonathanLuong
 
National parks
National parks National parks
National parks
thickman11
 
Mapping your sense of place: Discovering, Understanding, Embracing
Mapping your sense of place: Discovering, Understanding, EmbracingMapping your sense of place: Discovering, Understanding, Embracing
Mapping your sense of place: Discovering, Understanding, Embracing
Lizlangdon
 
Langdon Liz artwork portfolio
Langdon Liz  artwork portfolioLangdon Liz  artwork portfolio
Langdon Liz artwork portfolio
Lizlangdon
 
Progettare Media Education nelle scuole
Progettare Media Education nelle scuoleProgettare Media Education nelle scuole
Progettare Media Education nelle scuole
sabrina contu
 
Bejeweled
BejeweledBejeweled
Bejeweled
aizellbernal
 
e-learning
e-learninge-learning
e-learning
aizellbernal
 
Aboutmydaughter
AboutmydaughterAboutmydaughter
Aboutmydaughter
runmicrogirl
 
Teori Belajar Brunner
Teori Belajar BrunnerTeori Belajar Brunner
Teori Belajar Brunner
Hilda Ramadhani
 
Question 2
Question 2Question 2
Question 2
Amy Nesbitt
 
English Comp 1 Who am I?
English Comp 1 Who am I?English Comp 1 Who am I?
English Comp 1 Who am I?Brian D'Almeida
 
Presentation2
Presentation2Presentation2

Viewers also liked (20)

Data mining Introduction
Data mining IntroductionData mining Introduction
Data mining Introduction
 
Featured Speakers and Chefs
Featured Speakers and ChefsFeatured Speakers and Chefs
Featured Speakers and Chefs
 
Aγωγη του καταναλωτη
Aγωγη του καταναλωτηAγωγη του καταναλωτη
Aγωγη του καταναλωτη
 
2014 Volkl Ski Reviews by The-House.com
2014 Volkl Ski Reviews by The-House.com2014 Volkl Ski Reviews by The-House.com
2014 Volkl Ski Reviews by The-House.com
 
Gamification
GamificationGamification
Gamification
 
شهادات تاهلية1
شهادات تاهلية1شهادات تاهلية1
شهادات تاهلية1
 
Avalanche Survival Infographic
Avalanche Survival InfographicAvalanche Survival Infographic
Avalanche Survival Infographic
 
Selection of Human Resource
Selection of Human ResourceSelection of Human Resource
Selection of Human Resource
 
May loc nuoc home pure
May loc nuoc home pureMay loc nuoc home pure
May loc nuoc home pure
 
National parks
National parks National parks
National parks
 
Mapping your sense of place: Discovering, Understanding, Embracing
Mapping your sense of place: Discovering, Understanding, EmbracingMapping your sense of place: Discovering, Understanding, Embracing
Mapping your sense of place: Discovering, Understanding, Embracing
 
Langdon Liz artwork portfolio
Langdon Liz  artwork portfolioLangdon Liz  artwork portfolio
Langdon Liz artwork portfolio
 
Progettare Media Education nelle scuole
Progettare Media Education nelle scuoleProgettare Media Education nelle scuole
Progettare Media Education nelle scuole
 
Bejeweled
BejeweledBejeweled
Bejeweled
 
e-learning
e-learninge-learning
e-learning
 
Aboutmydaughter
AboutmydaughterAboutmydaughter
Aboutmydaughter
 
Teori Belajar Brunner
Teori Belajar BrunnerTeori Belajar Brunner
Teori Belajar Brunner
 
Question 2
Question 2Question 2
Question 2
 
English Comp 1 Who am I?
English Comp 1 Who am I?English Comp 1 Who am I?
English Comp 1 Who am I?
 
Presentation2
Presentation2Presentation2
Presentation2
 

Similar to Data mining introduction

Data mining 1 - Introduction (cheat sheet - printable)
Data mining 1 - Introduction (cheat sheet - printable)Data mining 1 - Introduction (cheat sheet - printable)
Data mining 1 - Introduction (cheat sheet - printable)
yesheeka
 
Introduction to big data
Introduction to big dataIntroduction to big data
Introduction to big data
Hari Priya
 
Big data unit i
Big data unit iBig data unit i
Big data unit i
Navjot Kaur
 
Big Data Analytics: Challenges And Applications For Text, Audio, Video, And S...
Big Data Analytics: Challenges And Applications For Text, Audio, Video, And S...Big Data Analytics: Challenges And Applications For Text, Audio, Video, And S...
Big Data Analytics: Challenges And Applications For Text, Audio, Video, And S...
IJSCAI Journal
 
BIG DATA ANALYTICS: CHALLENGES AND APPLICATIONS FOR TEXT, AUDIO, VIDEO, AND S...
BIG DATA ANALYTICS: CHALLENGES AND APPLICATIONS FOR TEXT, AUDIO, VIDEO, AND S...BIG DATA ANALYTICS: CHALLENGES AND APPLICATIONS FOR TEXT, AUDIO, VIDEO, AND S...
BIG DATA ANALYTICS: CHALLENGES AND APPLICATIONS FOR TEXT, AUDIO, VIDEO, AND S...
gerogepatton
 
BIG DATA ANALYTICS: CHALLENGES AND APPLICATIONS FOR TEXT, AUDIO, VIDEO, AND S...
BIG DATA ANALYTICS: CHALLENGES AND APPLICATIONS FOR TEXT, AUDIO, VIDEO, AND S...BIG DATA ANALYTICS: CHALLENGES AND APPLICATIONS FOR TEXT, AUDIO, VIDEO, AND S...
BIG DATA ANALYTICS: CHALLENGES AND APPLICATIONS FOR TEXT, AUDIO, VIDEO, AND S...
ijscai
 
BIG DATA ANALYTICS: CHALLENGES AND APPLICATIONS FOR TEXT, AUDIO, VIDEO, AND S...
BIG DATA ANALYTICS: CHALLENGES AND APPLICATIONS FOR TEXT, AUDIO, VIDEO, AND S...BIG DATA ANALYTICS: CHALLENGES AND APPLICATIONS FOR TEXT, AUDIO, VIDEO, AND S...
BIG DATA ANALYTICS: CHALLENGES AND APPLICATIONS FOR TEXT, AUDIO, VIDEO, AND S...
ijscai
 
BIG DATA ANALYTICS: CHALLENGES AND APPLICATIONS FOR TEXT, AUDIO, VIDEO, AND S...
BIG DATA ANALYTICS: CHALLENGES AND APPLICATIONS FOR TEXT, AUDIO, VIDEO, AND S...BIG DATA ANALYTICS: CHALLENGES AND APPLICATIONS FOR TEXT, AUDIO, VIDEO, AND S...
BIG DATA ANALYTICS: CHALLENGES AND APPLICATIONS FOR TEXT, AUDIO, VIDEO, AND S...
gerogepatton
 
BIG DATA ANALYTICS: CHALLENGES AND APPLICATIONS FOR TEXT, AUDIO, VIDEO, AND S...
BIG DATA ANALYTICS: CHALLENGES AND APPLICATIONS FOR TEXT, AUDIO, VIDEO, AND S...BIG DATA ANALYTICS: CHALLENGES AND APPLICATIONS FOR TEXT, AUDIO, VIDEO, AND S...
BIG DATA ANALYTICS: CHALLENGES AND APPLICATIONS FOR TEXT, AUDIO, VIDEO, AND S...
gerogepatton
 
BIG DATA ANALYTICS: CHALLENGES AND APPLICATIONS FOR TEXT, AUDIO, VIDEO, AND S...
BIG DATA ANALYTICS: CHALLENGES AND APPLICATIONS FOR TEXT, AUDIO, VIDEO, AND S...BIG DATA ANALYTICS: CHALLENGES AND APPLICATIONS FOR TEXT, AUDIO, VIDEO, AND S...
BIG DATA ANALYTICS: CHALLENGES AND APPLICATIONS FOR TEXT, AUDIO, VIDEO, AND S...
gerogepatton
 
Final ppt sec.data.coll
Final ppt sec.data.collFinal ppt sec.data.coll
Final ppt sec.data.coll
Ram Sonawane
 
Unit III.pdf
Unit III.pdfUnit III.pdf
Unit III.pdf
PreethaSuresh2
 
Big Data Analytics_Unit1.pptx
Big Data Analytics_Unit1.pptxBig Data Analytics_Unit1.pptx
Big Data Analytics_Unit1.pptx
PrabhaJoshi4
 
Unit-1 introduction to Big data.pdf
Unit-1 introduction to Big data.pdfUnit-1 introduction to Big data.pdf
Unit-1 introduction to Big data.pdf
Sitamarhi Institute of Technology
 
Unit-1 introduction to Big data.pdf
Unit-1 introduction to Big data.pdfUnit-1 introduction to Big data.pdf
Unit-1 introduction to Big data.pdf
Sitamarhi Institute of Technology
 
Data mining and its applications!
Data mining and its applications!Data mining and its applications!
Data mining and its applications!
COSTARCH Analytical Consulting (P) Ltd.
 
Secondary Research in Applied Marketing Research
Secondary Research in Applied Marketing ResearchSecondary Research in Applied Marketing Research
Secondary Research in Applied Marketing Research
Kelly Page
 
Big Data Analytics and its Application in E-Commerce
Big Data Analytics and its Application in E-CommerceBig Data Analytics and its Application in E-Commerce
Big Data Analytics and its Application in E-Commerce
Uyoyo Edosio
 
Statistika dan Analisis Data
Statistika dan Analisis DataStatistika dan Analisis Data
Statistika dan Analisis Data
kisti purwitosari
 
Big Data Ethics
Big Data EthicsBig Data Ethics
Big Data Ethics
Nael Radwan
 

Similar to Data mining introduction (20)

Data mining 1 - Introduction (cheat sheet - printable)
Data mining 1 - Introduction (cheat sheet - printable)Data mining 1 - Introduction (cheat sheet - printable)
Data mining 1 - Introduction (cheat sheet - printable)
 
Introduction to big data
Introduction to big dataIntroduction to big data
Introduction to big data
 
Big data unit i
Big data unit iBig data unit i
Big data unit i
 
Big Data Analytics: Challenges And Applications For Text, Audio, Video, And S...
Big Data Analytics: Challenges And Applications For Text, Audio, Video, And S...Big Data Analytics: Challenges And Applications For Text, Audio, Video, And S...
Big Data Analytics: Challenges And Applications For Text, Audio, Video, And S...
 
BIG DATA ANALYTICS: CHALLENGES AND APPLICATIONS FOR TEXT, AUDIO, VIDEO, AND S...
BIG DATA ANALYTICS: CHALLENGES AND APPLICATIONS FOR TEXT, AUDIO, VIDEO, AND S...BIG DATA ANALYTICS: CHALLENGES AND APPLICATIONS FOR TEXT, AUDIO, VIDEO, AND S...
BIG DATA ANALYTICS: CHALLENGES AND APPLICATIONS FOR TEXT, AUDIO, VIDEO, AND S...
 
BIG DATA ANALYTICS: CHALLENGES AND APPLICATIONS FOR TEXT, AUDIO, VIDEO, AND S...
BIG DATA ANALYTICS: CHALLENGES AND APPLICATIONS FOR TEXT, AUDIO, VIDEO, AND S...BIG DATA ANALYTICS: CHALLENGES AND APPLICATIONS FOR TEXT, AUDIO, VIDEO, AND S...
BIG DATA ANALYTICS: CHALLENGES AND APPLICATIONS FOR TEXT, AUDIO, VIDEO, AND S...
 
BIG DATA ANALYTICS: CHALLENGES AND APPLICATIONS FOR TEXT, AUDIO, VIDEO, AND S...
BIG DATA ANALYTICS: CHALLENGES AND APPLICATIONS FOR TEXT, AUDIO, VIDEO, AND S...BIG DATA ANALYTICS: CHALLENGES AND APPLICATIONS FOR TEXT, AUDIO, VIDEO, AND S...
BIG DATA ANALYTICS: CHALLENGES AND APPLICATIONS FOR TEXT, AUDIO, VIDEO, AND S...
 
BIG DATA ANALYTICS: CHALLENGES AND APPLICATIONS FOR TEXT, AUDIO, VIDEO, AND S...
BIG DATA ANALYTICS: CHALLENGES AND APPLICATIONS FOR TEXT, AUDIO, VIDEO, AND S...BIG DATA ANALYTICS: CHALLENGES AND APPLICATIONS FOR TEXT, AUDIO, VIDEO, AND S...
BIG DATA ANALYTICS: CHALLENGES AND APPLICATIONS FOR TEXT, AUDIO, VIDEO, AND S...
 
BIG DATA ANALYTICS: CHALLENGES AND APPLICATIONS FOR TEXT, AUDIO, VIDEO, AND S...
BIG DATA ANALYTICS: CHALLENGES AND APPLICATIONS FOR TEXT, AUDIO, VIDEO, AND S...BIG DATA ANALYTICS: CHALLENGES AND APPLICATIONS FOR TEXT, AUDIO, VIDEO, AND S...
BIG DATA ANALYTICS: CHALLENGES AND APPLICATIONS FOR TEXT, AUDIO, VIDEO, AND S...
 
BIG DATA ANALYTICS: CHALLENGES AND APPLICATIONS FOR TEXT, AUDIO, VIDEO, AND S...
BIG DATA ANALYTICS: CHALLENGES AND APPLICATIONS FOR TEXT, AUDIO, VIDEO, AND S...BIG DATA ANALYTICS: CHALLENGES AND APPLICATIONS FOR TEXT, AUDIO, VIDEO, AND S...
BIG DATA ANALYTICS: CHALLENGES AND APPLICATIONS FOR TEXT, AUDIO, VIDEO, AND S...
 
Final ppt sec.data.coll
Final ppt sec.data.collFinal ppt sec.data.coll
Final ppt sec.data.coll
 
Unit III.pdf
Unit III.pdfUnit III.pdf
Unit III.pdf
 
Big Data Analytics_Unit1.pptx
Big Data Analytics_Unit1.pptxBig Data Analytics_Unit1.pptx
Big Data Analytics_Unit1.pptx
 
Unit-1 introduction to Big data.pdf
Unit-1 introduction to Big data.pdfUnit-1 introduction to Big data.pdf
Unit-1 introduction to Big data.pdf
 
Unit-1 introduction to Big data.pdf
Unit-1 introduction to Big data.pdfUnit-1 introduction to Big data.pdf
Unit-1 introduction to Big data.pdf
 
Data mining and its applications!
Data mining and its applications!Data mining and its applications!
Data mining and its applications!
 
Secondary Research in Applied Marketing Research
Secondary Research in Applied Marketing ResearchSecondary Research in Applied Marketing Research
Secondary Research in Applied Marketing Research
 
Big Data Analytics and its Application in E-Commerce
Big Data Analytics and its Application in E-CommerceBig Data Analytics and its Application in E-Commerce
Big Data Analytics and its Application in E-Commerce
 
Statistika dan Analisis Data
Statistika dan Analisis DataStatistika dan Analisis Data
Statistika dan Analisis Data
 
Big Data Ethics
Big Data EthicsBig Data Ethics
Big Data Ethics
 

More from Niyitegekabilly

Introduction to knowledge management
Introduction to knowledge managementIntroduction to knowledge management
Introduction to knowledge management
Niyitegekabilly
 
Data mining techniques and dss
Data mining techniques and dssData mining techniques and dss
Data mining techniques and dss
Niyitegekabilly
 
Data wirehouse
Data wirehouseData wirehouse
Data wirehouse
Niyitegekabilly
 
Introduction to knowledge management
Introduction to knowledge managementIntroduction to knowledge management
Introduction to knowledge management
Niyitegekabilly
 
JAVA PROGRAMMINGD
JAVA PROGRAMMINGDJAVA PROGRAMMINGD
JAVA PROGRAMMINGD
Niyitegekabilly
 
Birasa 1
Birasa 1Birasa 1
Birasa 1
Niyitegekabilly
 
JAVA PROGRAMMING
JAVA PROGRAMMING JAVA PROGRAMMING
JAVA PROGRAMMING
Niyitegekabilly
 

More from Niyitegekabilly (7)

Introduction to knowledge management
Introduction to knowledge managementIntroduction to knowledge management
Introduction to knowledge management
 
Data mining techniques and dss
Data mining techniques and dssData mining techniques and dss
Data mining techniques and dss
 
Data wirehouse
Data wirehouseData wirehouse
Data wirehouse
 
Introduction to knowledge management
Introduction to knowledge managementIntroduction to knowledge management
Introduction to knowledge management
 
JAVA PROGRAMMINGD
JAVA PROGRAMMINGDJAVA PROGRAMMINGD
JAVA PROGRAMMINGD
 
Birasa 1
Birasa 1Birasa 1
Birasa 1
 
JAVA PROGRAMMING
JAVA PROGRAMMING JAVA PROGRAMMING
JAVA PROGRAMMING
 

Recently uploaded

Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
Uni Systems S.M.S.A.
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
IndexBug
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
KAMESHS29
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Speck&Tech
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
Matthew Sinclair
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
DianaGray10
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
panagenda
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
SOFTTECHHUB
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
kumardaparthi1024
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
Neo4j
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
Kumud Singh
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
tolgahangng
 

Recently uploaded (20)

Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
 

Data mining introduction

  • 2. intro Data mining is a powerful new technology with great potential to help companies focus on the most important information in the data they have collected about the behavior of their customers and potential customers.
  • 3. Data collections in the real world    Ten largest transaction-processing databases range from 3 to 18 Terabytes Ten largest decision support databases range from 10 to 29 Terabytes Sizes have doubled / tripled between 2001 and end of 2003
  • 4. Questions arise    Is there any new, unexpected and potentially useful information contained in this data? Can we use historical data to predict future outcomes? (e.g. customer behavior, fraud detection, etc.)
  • 5. Some examples of data mining 1. Telecommunications Huge amount of data is collected daily  Transactional data (about each phone call)  Data on mobile phones, house based phones, Internet, etc.)  Other customer data (billing, personal information, etc.)  Additional data (network load, faults, etc.) Questions arises  Which customer group is highly profitable, which one is not?  To which customers should we advertise what kind of special offers?  What kind of call rates would increase profit without loosing good customers?  How do customer profiles change over time?  Fraud detection (stolen mobile phones or phone cards 
  • 6. Another 2. Health  Different aspects of the health system  Personal health records (at GPs, specialists, etc.)  Hospital data (e.g. admission data, midwives data, surgery data)  Billing information (Medicare, PBS) Questions  Are doctors following the procedures (e.g. prescription of medication)?  Adverse drug reactions (analysis of different data collections to find correlations)  Are people committing fraud (e.g. doctor shoppers)  Correlations between social and environmental issues and people's health?
  • 7. What is data mining?  Data Mining is the automated extraction of previously unrealized information from Large data sources for the purpose of supporting business actions.
  • 8. Some more definitions    Knowledge discovery in databases is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data. An information extraction activity whose goal is to discover hidden facts contained in databases. Data mining, or knowledge discovery, is the computer-assisted process of digging through and analyzing enormous sets of data and then extracting the meaning of the data.
  • 10. Data mining process    Extract, transform, and load transaction data onto the data warehouse system. Store and manage the data in a multidimensional database system. Provide data access to business analysts and information technology professionals.
  • 11. Data mining process   Analyze the data by application software. Present the data in a useful format, such as a graph or table.
  • 12. DM is multi disciplinary
  • 13. What they do Detect patterns in data: Rules, patterns, classes, associations and functional dependencies, outliers, data distributions, clusters
  • 14. How they do it  Search through data and pattern space, non-parametric modelling, filtering, aggregation How well they do it Errors and biases, over-fitting, confounding effects, speed, scalability
  • 15. Challenges in DM    Data size  Size of data collections grows more than linear, doubling every 18 months  Scalable algorithms are needed  Data complexity Different types of data (free text, HTML, XML, multimedia) Dimensionality of the data increases (more attributes)
  • 16. Challenges contd..    The curse of dimensionality affects many algorithms (for example find nearest neighbors in high dimensions) Data quality  Real world data is messy and dirty (missing and out-of-date values, typographical errors, different coding/formats, etc.)
  • 17. Why mine data?       Data is being recorded Recorded data is being warehoused Computing power is affordable Competitive pressure is strong Commercial DM products are available It provides support for business decisions
  • 18. Value to business    Market segmentation - Identify the common characteristics of customers who buy the same products from your company. Customer churn - Predict which customers are likely to leave your company and go to a competitor. Fraud detection - Identify which transactions are most likely to be fraudulent.
  • 19. Value to business   Interactive marketing - Predict what each individual accessing a Web site is most likely interested in seeing. Market basket analysis - Understand what products or services are commonly purchased together; e.g., beer and diapers.
  • 20. Value to business    Trend analysis - Reveal the difference between a typical customer this month and last. Data mining can also effectively deal with missing, inconsistent, and noisy data. Direct marketing - Identify which prospects should be included in a mailing list to obtain the highest response rate.