DATAMINING
Seval Ünver
E1900810 | CENG 553
Middle East Technical University
Computer Engineering Department
14.05.2013 CENG 553
In Summary
Outline
• Introduction
• Data vs. Information
• Who uses datamining?
• Common uses of datamining
• Datamining is…
• Supervised and Unsupervised Learning
• Predictive Models
• Datamining Process
• Some Popular Datamining Algorithms
• Data Warehouse
• Conceptual Modelling of Data Warehouse
• Example of Star Schema, Snowflake Schema, Fact Constellation
• Evolution of OLTP, OLAP and Data Warehouse
08.10.2013 Seval Ünver | CENG 553 2
Introduction
• Nowadays, large data sets have become available
due to advances in technology.
• As a result, there is an increasing interest in
various scientific communities to explore the use
of emerging data mining techniques for the
analysis of these large data sets *.
• Data mining is the semi-automatic discovery of
patterns, associations, changes, anomalies, and
statistically significant structures and events in
data **.
* Grossman et al., 2001
** Shmueli G, 2012
What is Datamining?
• Process of semi-automatically analyzing large
databases to find patterns that are *:
– valid: hold on new data with some certainty
– novel: non-obvious to the system
– useful: should be possible to act on the item
– understandable: humans should be able to
interpret the pattern
• Also known as Knowledge Discovery in
Databases
* Prof. S. Sudarshan CSE Dept, IIT Bombay
Big data: Cash Register
• Past: It was a calculator.
• Now: It saves every detail of every action.
– The movements of each product.
– The movements of each user.
Data vs. Information
• Data is useless by itself.
• Data is not just numbers or letters. It consists of numbers, letters and their meaning. The meaning is called metadata.
• Information is interpreted data.
• Converting the data to information is called data processing.
Who uses Datamining?
• CapitalOne Bank
– future prediction
• Netflix (the largest DVD-by-mail rental company)
– Recommendation (you might also be interested in…)
• Amazon.com
– recommendation
• British law enforcement
– crime trends or security threats
• Facebook
– predicting how active a user will be after 3 months
• Children's Hospital in Boston
– detecting domestic abuse
• Pandora (an Internet music radio)
– chooses the next song to play
Common uses of Datamining:
• Direct mail marketing
• Web site personalization
• Credit card fraud detection
– Gas & jewelry
• Bioinformatics
• Text analysis
– SAS lie detector
• Market basket analysis
– Beer & baby diapers
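Market basket analysis looks for items that tend to be bought together. The following is a minimal sketch, assuming a small made-up list of baskets, of the two standard measures behind a rule such as "beer → diapers": support and confidence. The data and the helper names `support` and `confidence` are illustrative and do not come from the slides.

```python
# Hypothetical toy transactions; not real sales data.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "diapers"},
    {"beer", "chips"},
    {"milk", "bread"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent): support(A and C) / support(A)."""
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)

rule_support = support({"beer", "diapers"}, baskets)
rule_confidence = confidence({"beer"}, {"diapers"}, baskets)
print(f"support={rule_support:.2f} confidence={rule_confidence:.2f}")
```

A rule is usually kept only if both values exceed thresholds chosen by the analyst.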
Application Areas
Industry | Application
Finance | Credit card analysis
Insurance | Claims, fraud analysis
Telecommunication | Call record analysis
Transport | Logistics management
Consumer goods | Promotion analysis
Data service providers | Value-added data
Utilities | Power usage analysis
Datamining is…
Datamining is not…
• Data warehousing
• SQL / Ad Hoc Queries / Reporting
• Software Agents
• Online Analytical Processing (OLAP)
• Data Visualization
Supervised vs. Unsupervised Learning
• Supervised:
– Problem solving
– Driven by real business problems and historical data
– Quality of results dependent on quality of data
• Unsupervised:
– Exploration (aka clustering)
– Relevance often an issue
• Beer and baby diapers
– Useful when trying to get an initial understanding of the data
– Non-obvious patterns can sometimes pop out of a completed
data analysis project
Predictive Models
Datamining Process
Some Popular Data Mining Algorithms
Supervised
— Regression models
— Decision trees
— k-Nearest-Neighbor
— Neural networks
— Rule induction
Unsupervised
— K-means clustering
— Self-organizing map
A very simple problem set
Regression Models
Decision Trees
A series of nested if/then rules.
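As a sketch of what "nested if/then rules" means, the hypothetical credit-approval tree below is written directly as Python conditionals. The features (`income`, `has_debt`, `years_employed`), the thresholds and the outcomes are invented for illustration; in practice the tree would be learned from historical data rather than written by hand.

```python
def approve_credit(income, has_debt, years_employed):
    """Hand-written decision tree; each path from root to leaf is one rule."""
    if income >= 50_000:
        if has_debt:
            # High income but existing debt: look at job stability.
            if years_employed >= 2:
                return "approve"
            return "review"
        return "approve"
    else:
        # Lower income: only long-term, debt-free applicants get a review.
        if years_employed >= 5 and not has_debt:
            return "review"
        return "reject"

decision = approve_credit(income=60_000, has_debt=True, years_employed=1)
print(decision)
```

Because every leaf is reached through a chain of simple conditions, a tree like this is easy to read and to translate into other rule languages.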
Decision Tree Models
K-Nearest Neighbor Algorithm
• Find the nearest data point(s) among the known records and make the
same decision that was made for them.
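The idea can be sketched in a few lines of Python. This is a minimal illustration, assuming two numeric features, Euclidean distance and a majority vote among the k nearest records. The function name `knn_predict` and the sample records are made up for the example.

```python
import math
from collections import Counter

def knn_predict(train, query, k=1):
    """train: list of (point, label) pairs; returns the majority label
    among the k training points closest to query."""
    by_distance = sorted(train, key=lambda rec: math.dist(rec[0], query))
    labels = [label for _, label in by_distance[:k]]
    return Counter(labels).most_common(1)[0][0]

# Hypothetical historical records: (feature vector, decision taken).
history = [
    ((1.0, 1.0), "no"),
    ((1.5, 2.0), "no"),
    ((5.0, 5.0), "yes"),
    ((6.0, 5.5), "yes"),
    ((5.5, 6.5), "yes"),
]

near_yes = knn_predict(history, (5.2, 5.1), k=1)
near_no = knn_predict(history, (2.0, 2.0), k=3)
print(near_yes, near_no)
```

Note that the "model" is the training data itself, which is why kNN models tend to be large.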
K-Nearest Neighbor Models
Neural Networks
• Set of nodes connected by directed weighted edges.
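To make "nodes connected by directed weighted edges" concrete, here is a minimal sketch of a forward pass through a tiny network stored literally as an edge dictionary. The node names, the weights, the sigmoid activation and the absence of bias terms are all assumptions chosen for brevity; training the weights is not shown.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Directed weighted edges: (source node, target node) -> weight.
# The values are made up for illustration.
edges = {
    ("x1", "h1"): 0.5, ("x2", "h1"): -0.5,
    ("x1", "h2"): 1.0, ("x2", "h2"): 1.0,
    ("h1", "out"): 2.0, ("h2", "out"): -1.0,
}
layers = [["h1", "h2"], ["out"]]  # non-input nodes, in evaluation order

def forward(inputs):
    """Propagate the input values along the edges, layer by layer."""
    values = dict(inputs)
    for layer in layers:
        for node in layer:
            total = sum(w * values[src]
                        for (src, dst), w in edges.items() if dst == node)
            values[node] = sigmoid(total)
    return values["out"]

output = forward({"x1": 0.0, "x2": 0.0})
print(round(output, 4))
```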
Neural Networks Models
Pros and Cons of Neural Networks
• Pros
+ Can learn more complicated class boundaries
+ Fast application
+ Can handle large number of features
• Cons
- Slow training time
- Hard to interpret
- Hard to implement: trial and error for choosing number of nodes
Supervised Algorithm Summary
• Decision Trees
– Understandable
– Relatively fast
– Easy to translate into SQL queries
• kNN
– Quick and easy
– Models tend to be very large
• Neural Networks
– Difficult to interpret
– Can require significant amounts of time to train
K-Means Clustering
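• The user starts by specifying the number of clusters (K)
• K data points are randomly selected as the initial cluster centers
• Repeat until no change:
– Hyperplanes separating the K centers are generated, so each data point is assigned to its nearest center
– The centroid of each of the K clusters is computed and becomes the new center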
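The loop above can be sketched in plain Python as follows. This is a minimal illustration, assuming Euclidean distance and a fixed random seed for repeatability. The function name `kmeans`, its parameters and the sample points are assumptions made for the example. Assigning each point to its nearest center is equivalent to cutting the space with the separating hyperplanes.

```python
import math
import random

def kmeans(points, k, seed=0, max_iter=100):
    """Basic K-means on a list of equal-length tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)      # K data points chosen at random
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:     # repeat until no change
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical data: two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
```

The result depends on the random starting points in general; on data this clearly separated, the same two groups are found from any start.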
Data Warehouse
A data warehouse is a database used for
reporting and data analysis.
Data Mining works with Warehouse Data
• Data Mining provides the Enterprise with intelligence
• Data Warehousing provides the Enterprise with a memory
Conceptual Modeling of Data Warehouses
• Modeling data warehouses: dimensions & measures
– Star schema: A fact table in the middle connected to a set of
dimension tables
– Snowflake schema: A refinement of star schema where
some dimensional hierarchy is normalized into a set of
smaller dimension tables, forming a shape similar to
snowflake
– Fact constellations: Multiple fact tables share dimension
tables, viewed as a collection of stars, therefore called
galaxy schema or fact constellation
Example of Star Schema
Sales Fact Table: time_key, item_key, branch_key, location_key
Measures: units_sold, dollars_sold, avg_sales
time dimension: time_key, day, day_of_the_week, month, quarter, year
item dimension: item_key, item_name, brand, type, supplier_type
branch dimension: branch_key, branch_name, branch_type
location dimension: location_key, street, city, state_or_province, country
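A star schema like this one can be tried out with Python's built-in sqlite3 module. The sketch below is a cut-down version under stated assumptions: it keeps only a few of the columns, uses the assumed table names `time_dim`, `item_dim`, `location_dim` and `sales_fact`, and fills them with made-up rows. The query joins the fact table to two dimension tables and totals the measures per city and quarter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, month INTEGER,
                       quarter TEXT, year INTEGER);
CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, city TEXT,
                           country TEXT);
-- Fact table: foreign keys to the dimensions plus numeric measures.
CREATE TABLE sales_fact (
    time_key INTEGER REFERENCES time_dim(time_key),
    item_key INTEGER REFERENCES item_dim(item_key),
    location_key INTEGER REFERENCES location_dim(location_key),
    units_sold INTEGER,
    dollars_sold REAL
);
-- Hypothetical sample rows.
INSERT INTO time_dim VALUES (1, 3, 'Q1', 2013), (2, 4, 'Q2', 2013);
INSERT INTO item_dim VALUES (1, 'laptop', 'Acme'), (2, 'phone', 'Acme');
INSERT INTO location_dim VALUES (1, 'Boston', 'USA'), (2, 'Ankara', 'Turkey');
INSERT INTO sales_fact VALUES
    (1, 1, 1, 2, 2000.0),
    (1, 2, 1, 5, 1500.0),
    (2, 1, 2, 1, 1000.0),
    (2, 2, 1, 3, 900.0);
""")

rows = conn.execute("""
    SELECT l.city, t.quarter, SUM(f.units_sold), SUM(f.dollars_sold)
    FROM sales_fact AS f
    JOIN time_dim AS t ON t.time_key = f.time_key
    JOIN location_dim AS l ON l.location_key = f.location_key
    GROUP BY l.city, t.quarter
    ORDER BY l.city, t.quarter
""").fetchall()

for row in rows:
    print(row)
```

Every analytical query follows the same pattern: join the fact table to the dimensions of interest, then group by dimension attributes and aggregate the measures.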
Example of Snowflake Schema
Sales Fact Table: time_key, item_key, branch_key, location_key
Measures: units_sold, dollars_sold, avg_sales
time dimension: time_key, day, day_of_the_week, month, quarter, year
item dimension: item_key, item_name, brand, type, supplier_key
supplier dimension: supplier_key, supplier_type
branch dimension: branch_key, branch_name, branch_type
location dimension: location_key, street, city_key
city dimension: city_key, city, state_or_province, country
Example of Fact Constellation
Sales Fact Table: time_key, item_key, branch_key, location_key
Measures: units_sold, dollars_sold, avg_sales
Shipping Fact Table: time_key, item_key, shipper_key, from_location, to_location, dollars_cost, units_shipped
time dimension: time_key, day, day_of_the_week, month, quarter, year
item dimension: item_key, item_name, brand, type, supplier_type
branch dimension: branch_key, branch_name, branch_type
location dimension: location_key, street, city, province_or_state, country
shipper dimension: shipper_key, shipper_name, location_key, shipper_type
Evolution of OLTP, OLAP and Data Warehouse
(Rows are ordered by time, earliest first.)
Evolutionary Step | Business Question | Enabling Technology
Data Collection (1960s) | "What was my total revenue in the last five years?" | computers, tapes, disks
Data Access (1980s) | "What were unit sales in New England last March?" | faster and cheaper computers with more storage, relational databases
Data Warehousing and Decision Support | "What were unit sales in New England last March? Drill down to Boston." | faster and cheaper computers with more storage, on-line analytical processing (OLAP), multidimensional databases, data warehouses
Data Mining | "What's likely to happen to Boston unit sales next month? Why?" | faster and cheaper computers with more storage, advanced computer algorithms
As a Result
• To apply data mining, a large amount of
quality data is required.
• The aim of datamining is to derive rules and
equations that can be used to predict the future.
• Success in such work depends on database
experts and data mining specialists working
together.
• The work may take longer than expected; it
requires time and patience.
Thank You
If you have questions, you can contact me
via email: e1900810@ceng.metu.edu.tr
Seval Ünver | METU CENG

A Document Management System in Defense Industry Case StudyA Document Management System in Defense Industry Case Study
A Document Management System in Defense Industry Case Study
 
Comparison of Parallel Algorithms For An Image Processing Problem on Cuda
Comparison of Parallel Algorithms For An Image Processing Problem on CudaComparison of Parallel Algorithms For An Image Processing Problem on Cuda
Comparison of Parallel Algorithms For An Image Processing Problem on Cuda
 
GPU-Accelerated Route Planning of Multi-UAV Systems Using Simulated Annealing...
GPU-Accelerated Route Planning of Multi-UAV Systems Using Simulated Annealing...GPU-Accelerated Route Planning of Multi-UAV Systems Using Simulated Annealing...
GPU-Accelerated Route Planning of Multi-UAV Systems Using Simulated Annealing...
 
Semantic Filtering (An Image Processing Method)
Semantic Filtering (An Image Processing Method)Semantic Filtering (An Image Processing Method)
Semantic Filtering (An Image Processing Method)
 
Optical Flow with Semantic Segmentation and Localized Layers
Optical Flow with Semantic Segmentation and Localized LayersOptical Flow with Semantic Segmentation and Localized Layers
Optical Flow with Semantic Segmentation and Localized Layers
 
Spam Tanıma İçin Geliştirilmiş Güncel Yöntemlere Genel Bakış | Seval Çapraz
Spam Tanıma İçin Geliştirilmiş Güncel Yöntemlere Genel Bakış | Seval ÇaprazSpam Tanıma İçin Geliştirilmiş Güncel Yöntemlere Genel Bakış | Seval Çapraz
Spam Tanıma İçin Geliştirilmiş Güncel Yöntemlere Genel Bakış | Seval Çapraz
 
Data Streaming For Big Data
Data Streaming For Big DataData Streaming For Big Data
Data Streaming For Big Data
 
Bir Android Uygulamasında Bulunması Gereken Özellikler | Seval ZX | Android D...
Bir Android Uygulamasında Bulunması Gereken Özellikler | Seval ZX | Android D...Bir Android Uygulamasında Bulunması Gereken Özellikler | Seval ZX | Android D...
Bir Android Uygulamasında Bulunması Gereken Özellikler | Seval ZX | Android D...
 

Recently uploaded

JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
CatarinaPereira64715
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
Fwdays
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 

Recently uploaded (20)

JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 

What is Datamining? Which algorithms can be used for Datamining?

  • 1. DATAMINING Seval Ünver E1900810 | CENG 553 Middle East Technical University Computer Engineering Department 14.05.2013 CENG 553 In Summary
  • 2. Outline • Introduction • Data vs. Information • Who uses datamining? • Common uses of datamining • Datamining is… • Supervised and Unsupervised Learning • Predictive Models • Datamining Process • Some Popular Datamining Algorithms • Data Warehouse • Conceptual Modelling of Data Warehouse • Example of Star Schema, Snowflake Schema, Fact Constellation • Evolution of OLTP, OLAP and Data Warehouse 08.10.2013 Seval Ünver | CENG 553 2
  • 3. Introduction • Nowadays, large data sets have become available due to advances in technology. • As a result, there is increasing interest in various scientific communities in exploring emerging data mining techniques for the analysis of these large data sets *. • Data mining is the semi-automatic discovery of patterns, associations, changes, anomalies, and statistically significant structures and events in data **. * Grossman et al., 2001 ** Shmueli G, 2012
  • 4. What is Datamining? • Process of semi-automatically analyzing large databases to find patterns that are *: – valid: hold on new data with some certainty – novel: non-obvious to the system – useful: should be possible to act on the item – understandable: humans should be able to interpret the pattern • Also known as Knowledge Discovery in Databases * Prof. S. Sudarshan, CSE Dept, IIT Bombay
  • 5. Big data: Cash Register • Past: It was a calculator. • Now: It saves every detail of every action. – The movements of each product. – The movements of each user.
  • 6. Data vs. Information • Data is useless by itself. • Data is not just numbers or letters. It consists of numbers, letters and their meaning. The meaning is called metadata. • Information is interpreted data. • Converting the data to information is called data processing.
  • 7. Who uses Datamining? • CapitalOne Bank – future prediction • Netflix (the largest DVD-by-mail rental company) – recommendation (you might also be interested in…) • Amazon.com – recommendation • British law enforcement – crime trends or security threats • Facebook – predicting how active a user will be after 3 months • Children's Hospital in Boston – detecting domestic abuse • Pandora (an Internet music radio) – choosing the next song to play
  • 8. Common uses of Datamining: • Direct mail marketing • Web site personalization • Credit card fraud detection • Gas & jewelry • Bioinformatics • Text analysis – SAS lie detector • Market basket analysis – Beer & baby diapers
  • 9. Application Areas (Industry: Application) • Finance: Credit card analysis • Insurance: Claims, fraud analysis • Telecommunication: Call record analysis • Transport: Logistics management • Consumer goods: Promotion analysis • Data service providers: Value added data • Utilities: Power usage analysis
  • 10. Datamining is…
  • 11. Datamining is not… • Data warehousing • SQL / Ad Hoc Queries / Reporting • Software Agents • Online Analytical Processing (OLAP) • Data Visualization
  • 12. Supervised vs. Unsupervised Learning • Supervised: – Problem solving – Driven by real business problems and historical data – Quality of results depends on quality of data • Unsupervised: – Exploration (aka clustering) – Relevance is often an issue • Beer and baby diapers – Useful when trying to get an initial understanding of the data – Non-obvious patterns can sometimes pop out of a completed data analysis project
  • 13. Predictive Models
  • 14. Datamining Process
  • 15. Some Popular Data Mining Algorithms • Supervised: – Regression models – Decision trees – k-Nearest-Neighbor – Neural networks – Rule induction • Unsupervised: – K-means clustering – Self-organizing map
  • 16. A very simple problem set
  • 17. Regression Models
  • 18. Regression Models
  • 19. Decision Trees • A series of nested if/then rules.
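To make "a series of nested if/then rules" concrete, here is a minimal sketch of a hand-written decision tree in Python. The attributes (income, years_employed, has_defaulted), the thresholds, and the class labels are hypothetical and chosen only for illustration; a real tree would be learned from historical data rather than typed by hand.

```python
# Hypothetical loan-risk tree: attribute names, thresholds and labels
# are illustrative assumptions, not taken from the slides.
def classify_applicant(income: float, years_employed: float, has_defaulted: bool) -> str:
    """Walk a tiny decision tree and return the predicted class label."""
    if has_defaulted:               # internal node: test one attribute
        return "high risk"          # leaf node: predicted class
    if income >= 50_000:
        if years_employed >= 2:
            return "low risk"
        return "medium risk"
    if years_employed >= 5:
        return "medium risk"
    return "high risk"
```

Each path from the first test to a return statement corresponds to one rule, which is why such models are comparatively easy to read and to translate into query conditions.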
  • 20. Decision Tree Models
  • 21. K-Nearest Neighbor Algorithm • Find the nearest data point and do the same thing you did for that record.
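The nearest-neighbor idea can be sketched in a few lines of Python. This is an illustrative sketch, not the presenter's implementation: it assumes numeric features, Euclidean distance, and a majority vote among the k closest records. The sample records and the labels "buy" and "skip" are made up.

```python
import math
from collections import Counter

# Made-up training records: (feature vector, label).
TRAIN = [
    ((1.0, 1.0), "buy"),
    ((1.2, 0.8), "buy"),
    ((5.0, 5.0), "skip"),
    ((5.5, 4.5), "skip"),
]

def knn_predict(train, query, k=1):
    """Return the majority label among the k records closest to query."""
    # Sort all records by Euclidean distance to the query and keep k of them.
    neighbours = sorted(train, key=lambda rec: math.dist(rec[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```

With k=1 this is exactly "do what you did for the nearest record". Note that the whole training set is scanned for every query, which is why kNN models tend to be large and slow at application time.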
  • 22. K-Nearest Neighbor Models
  • 23. Neural Networks • Set of nodes connected by directed weighted edges.
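As a rough illustration of "nodes connected by directed weighted edges", the sketch below stores a tiny feed-forward network as a dictionary of edges and computes its output. The node names, the weights, and the choice of a sigmoid activation are all assumptions made for the example; training (finding good weights) is not shown.

```python
import math

# Hypothetical network: 2 inputs, 2 hidden nodes, 1 output.
# Each key is a directed edge (source, destination); each value is its weight.
EDGES = {
    ("x1", "h1"): 0.5, ("x2", "h1"): -0.4,
    ("x1", "h2"): 0.3, ("x2", "h2"): 0.8,
    ("h1", "out"): 1.2, ("h2", "out"): -0.7,
}
ORDER = ["h1", "h2", "out"]  # non-input nodes, in evaluation order

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(inputs):
    """Propagate input values along the weighted edges and return the output."""
    values = dict(inputs)
    for node in ORDER:
        # Weighted sum over all edges that point into this node.
        total = sum(w * values[src] for (src, dst), w in EDGES.items() if dst == node)
        values[node] = sigmoid(total)
    return values["out"]
```

Because every hidden node applies a non-linear function, stacking them lets the network represent non-linear class boundaries, at the cost of weights that are hard to interpret.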
  • 24. Neural Networks Models
  • 25. Neural Networks Models
  • 26. Pros and Cons of Neural Networks • Pros: + Can learn more complicated class boundaries + Fast application + Can handle a large number of features • Cons: - Slow training time - Hard to interpret - Hard to implement: trial and error for choosing the number of nodes
  • 27. Supervised Algorithm Summary • Decision Trees – Understandable – Relatively fast – Easy to translate into SQL queries • kNN – Quick and easy – Models tend to be very large • Neural Networks – Difficult to interpret – Can require significant amounts of time to train
  • 28. K-Means Clustering • User starts by specifying the number of clusters (K) • K datapoints are randomly selected as initial centroids • Repeat until no change: – Hyperplanes separating the K points are generated, assigning each datapoint to a cluster – The K centroids of the clusters are recomputed
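The K-means steps listed on the slide can be sketched as follows. This is a minimal pure-Python version under stated assumptions: points are numeric tuples, distance is Euclidean, and "generating separating hyperplanes" is implemented as assigning each point to its nearest centroid, which yields the same partition. The function name, the seed parameter, and the sample points are illustrative.

```python
import math
import random

# Made-up data: two well-separated groups of 2-D points.
POINTS = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]

def k_means(points, k, seed=0, max_iter=100):
    """Return (centroids, assignment) after running K-means on points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)        # K datapoints are randomly selected
    assignment = None
    for _ in range(max_iter):
        # Assign every point to the index of its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:     # repeat until no change
            break
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:                      # an empty cluster keeps its old centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignment
```

Calling k_means(POINTS, 2) groups the three points near the origin into one cluster and the three points near (10, 10) into the other. The result of K-means generally depends on the random starting points, so in practice it is often run several times.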
  • 29. Data Warehouse • A data warehouse is a database used for reporting and data analysis.
  • 30. Data Mining works with Warehouse Data • Data Mining provides the Enterprise with intelligence • Data Warehousing provides the Enterprise with a memory
  • 31. Conceptual Modeling of Data Warehouses • Modeling data warehouses: dimensions & measures – Star schema: A fact table in the middle connected to a set of dimension tables – Snowflake schema: A refinement of the star schema where some dimensional hierarchy is normalized into a set of smaller dimension tables, forming a shape similar to a snowflake – Fact constellations: Multiple fact tables share dimension tables, viewed as a collection of stars, therefore called galaxy schema or fact constellation
  • 32. Example of Star Schema • Sales Fact Table: time_key, item_key, branch_key, location_key; measures: units_sold, dollars_sold, avg_sales • time: time_key, day, day_of_the_week, month, quarter, year • item: item_key, item_name, brand, type, supplier_type • branch: branch_key, branch_name, branch_type • location: location_key, street, city, state_or_province, country
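The star schema on this slide can be tried out with Python's built-in sqlite3 module. The sketch below is an assumption-laden illustration: the column names follow the slide, but the dim_ and fact_ table prefixes, the column types, and all inserted rows are made up. It builds the central fact table with its four dimension tables and then runs a typical warehouse query, total sales per city and quarter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_time     (time_key INTEGER PRIMARY KEY, day INTEGER, day_of_the_week TEXT,
                           month INTEGER, quarter TEXT, year INTEGER);
CREATE TABLE dim_item     (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT,
                           type TEXT, supplier_type TEXT);
CREATE TABLE dim_branch   (branch_key INTEGER PRIMARY KEY, branch_name TEXT, branch_type TEXT);
CREATE TABLE dim_location (location_key INTEGER PRIMARY KEY, street TEXT, city TEXT,
                           state_or_province TEXT, country TEXT);
-- Fact table: one foreign key per dimension, followed by the measures.
CREATE TABLE fact_sales (
    time_key     INTEGER REFERENCES dim_time(time_key),
    item_key     INTEGER REFERENCES dim_item(item_key),
    branch_key   INTEGER REFERENCES dim_branch(branch_key),
    location_key INTEGER REFERENCES dim_location(location_key),
    units_sold INTEGER, dollars_sold REAL, avg_sales REAL
);
""")

# Made-up sample rows.
conn.executemany("INSERT INTO dim_time VALUES (?,?,?,?,?,?)",
                 [(1, 5, "Tue", 3, "Q1", 2013), (2, 9, "Tue", 4, "Q2", 2013)])
conn.executemany("INSERT INTO dim_item VALUES (?,?,?,?,?)",
                 [(1, "Diapers", "BrandA", "baby", "wholesale"),
                  (2, "Beer", "BrandB", "beverage", "wholesale")])
conn.execute("INSERT INTO dim_branch VALUES (1, 'Downtown', 'retail')")
conn.executemany("INSERT INTO dim_location VALUES (?,?,?,?,?)",
                 [(1, "1 Main St", "Boston", "MA", "USA"),
                  (2, "2 High St", "Hartford", "CT", "USA")])
conn.executemany("INSERT INTO fact_sales VALUES (?,?,?,?,?,?,?)",
                 [(1, 1, 1, 1, 10, 100.0, 10.0),
                  (1, 2, 1, 1, 5, 40.0, 8.0),
                  (2, 1, 1, 1, 3, 30.0, 10.0),
                  (1, 2, 1, 2, 7, 56.0, 8.0)])

# Join the fact table to two of its dimensions and aggregate the measures.
rows = conn.execute("""
    SELECT l.city, t.quarter, SUM(f.units_sold), SUM(f.dollars_sold)
    FROM fact_sales f
    JOIN dim_time t     ON t.time_key = f.time_key
    JOIN dim_location l ON l.location_key = f.location_key
    GROUP BY l.city, t.quarter
    ORDER BY l.city, t.quarter
""").fetchall()
```

Every analytical query follows the same pattern: start from the fact table, join only the dimensions needed, and group by their attributes. A snowflake variant would split, for example, city into its own table referenced from the location table by a city_key.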
  • 33. Example of Snowflake Schema • Sales Fact Table: time_key, item_key, branch_key, location_key; measures: units_sold, dollars_sold, avg_sales • time: time_key, day, day_of_the_week, month, quarter, year • item: item_key, item_name, brand, type, supplier_key • supplier: supplier_key, supplier_type • branch: branch_key, branch_name, branch_type • location: location_key, street, city_key • city: city_key, city, state_or_province, country
  • 34. Example of Fact Constellation • Sales Fact Table: time_key, item_key, branch_key, location_key; measures: units_sold, dollars_sold, avg_sales • Shipping Fact Table: time_key, item_key, shipper_key, from_location, to_location, dollars_cost, units_shipped • time: time_key, day, day_of_the_week, month, quarter, year • item: item_key, item_name, brand, type, supplier_type • branch: branch_key, branch_name, branch_type • location: location_key, street, city, province_or_state, country • shipper: shipper_key, shipper_name, location_key, shipper_type
  • 35. Evolution of OLTP, OLAP and Data Warehouse (figure; axis: Time)
  • 36. Evolutionary Step / Business Question / Enabling Technology • Data Collection (1960s) / "What was my total revenue in the last five years?" / computers, tapes, disks • Data Access (1980s) / "What were unit sales in New England last March?" / faster and cheaper computers with more storage, relational databases • Data Warehousing and Decision Support / "What were unit sales in New England last March? Drill down to Boston." / faster and cheaper computers with more storage, on-line analytical processing (OLAP), multidimensional databases, data warehouses • Data Mining / "What's likely to happen to Boston unit sales next month? Why?" / faster and cheaper computers with more storage, advanced computer algorithms
  • 37. As a Result • Applying data mining requires a large amount of quality data. • The aim of datamining is to acquire rules and equations that can be used to predict the future. • Success in such work depends on database experts and data mining specialists working together. • The work may take longer than expected; it requires time and patience.
  • 38. Thank You • If you have questions, you can contact me via email: e1900810@ceng.metu.edu.tr • Seval Ünver | METU CENG

Editor's Notes

  1. The US Government uses Data Mining to track fraud; A Supermarket becomes an information broker; Basketball teams use it to track game strategy; Cross Selling; Target Marketing; Holding on to Good Customers; Weeding out Bad Customers
  2. Regression (linear or any other polynomial): a*x1 + b*x2 + c = Ci; Nearest neighbour; Decision tree classifier: divides the decision space into piecewise constant regions; Probabilistic/generative models; Neural networks: partition by non-linear boundaries
  3. Tree where internal nodes are simple decision rules on one or more attributes and leaf nodes are predicted class labels. Widely used learning method. Easy to interpret: can be re-represented as if-then-else rules. Approximates a function by piecewise constant regions. Does not require any prior knowledge of the data distribution; works well on noisy data. Has been applied to classify medical patients by disease, equipment malfunctions by cause, and loan applicants by likelihood of payment.
  4. Pros: reasonable training time; fast application; easy to interpret; easy to implement; can handle a large number of features. Cons: cannot handle complicated relationships between features; simple decision boundaries; problems with lots of missing data.
  5. Pros: fast training. Cons: slow during application; no feature selection; notion of proximity is vague.
  6. Set of nodes connected by directed weighted edges. Useful for learning complex data like handwriting, speech and image recognition.
  7. Pros: can learn more complicated class boundaries; fast application; can handle a large number of features. Cons: slow training time; hard to interpret; hard to implement (trial and error for choosing the number of nodes).
  8. Data warehouse mining: assimilate data from operational sources, mine static data; Mining log data; Continuous mining: example in process control. Stages in mining: data selection → pre-processing: cleaning → transformation → mining → result evaluation → visualization