Intro data mining lingkup (Introduction to Data Mining: Scope)
  • One Midwest grocery chain used a data mining tool to analyze local buying patterns. It discovered that when men bought diapers on Thursdays and Saturdays, they also tended to buy beer. Further analysis showed that these shoppers typically did their weekly grocery shopping on Saturdays; on Thursdays they bought only a few items. The retailer concluded that they purchased the beer to have it available for the upcoming weekend. The chain could use this newly discovered information in various ways to increase revenue: for example, by moving the beer display closer to the diaper display, and by making sure beer and diapers were sold at full price on Thursdays.
    WalMart captures point-of-sale transactions from over 2,900 stores in 6 countries and continuously transmits this data to its massive Teradata data warehouse. WalMart allows more than 3,500 suppliers to access data on their products. The suppliers use this data to identify customer buying patterns at the store display level, to manage local store inventory, and to identify new merchandising opportunities.
    … to build a model of customer behavior that could be used to predict which customers would be likely to respond to the new product. Using this information, a marketing manager can select only the customers who are most likely to respond.
    The National Basketball Association (NBA) is exploring a data mining application that can be used in conjunction with image recordings of basketball games. The Advanced Scout software analyzes the movements of players to help coaches orchestrate plays and strategies. For example, an analysis of the play-by-play sheet of a game can reveal that when player A played the guard position, the opposing team's player B attempted four jump shots and made each one. Advanced Scout not only finds this pattern but also explains that it is interesting because it differs considerably from the average shooting percentage of 49.30% for the team during that game.
  • The decision tree (DT) algorithm has been successfully applied to a wide range of learning tasks, from medical diagnosis to classifying equipment malfunctions by their cause. Advantages: simple to understand; works with data types.
  • Decision trees classify instances by sorting them down the tree from the root to some leaf node, which provides the classification of the instance. Each node in the tree specifies a test of some attribute of the instance, and each branch descending from that node corresponds to one of the possible values of that attribute. Example: this tree classifies Saturday mornings according to whether or not they are suitable for playing tennis.
    This family of algorithms infers decision trees by growing them from the root downward, greedily selecting the next best attribute for each new decision branch added to the tree. During this top-down construction, the algorithm must decide which attribute to place at the root and which to split on at each later node. To determine which attribute best classifies the input instances, it uses a statistical measure called information gain: the expected reduction in entropy caused by partitioning the examples according to that attribute. Information gain therefore measures how well a given attribute separates the training examples according to their target classification.
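The information gain described in the notes above can be written out explicitly. The formulation below is the standard one; the symbols S (the set of training examples), A (an attribute), S_v (the examples with value v for A), and p_c (the fraction of S in class c) are notation introduced here and do not appear in the slides.

```latex
\mathrm{Entropy}(S) = -\sum_{c} p_c \log_2 p_c
\qquad
\mathrm{Gain}(S, A) = \mathrm{Entropy}(S)
  - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v)
```

At each node, the greedy top-down procedure picks the attribute A with the largest Gain(S, A) for the examples S that reach that node.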

Presentation Transcript

  • Introduction to Data Mining. Informatika (Informatics). Taken from: © Copyright 2007, Natash
  • Outline
    • Motivation: Why Data Mining?
    • What is Data Mining?
    • Data Mining Applications
    • Issues in Data Mining
  • Data vs. Information
    • Society produces massive amounts of data: business, science, medicine, economics, sports, …
    • Potentially valuable resource
    • Raw data is useless: need techniques to automatically extract information
      • Data: recorded facts
      • Information: patterns underlying the data
  • Multidisciplinary Field: data mining draws on database technology, statistics, machine learning, visualization, artificial intelligence (machine learning, neural networks), and other disciplines
  • Terminology
    • Gold mining
    • Knowledge mining from databases
    • Knowledge extraction
    • Data/pattern analysis
    • Knowledge Discovery in Databases (KDD)
    • Information harvesting
    • Business intelligence
  • KDD Process (diagram): Database → Selection, Transformation → Data Preparation → Training Data → Data Mining → Model, Patterns → Evaluation, Verification
  • Data Mining Tasks
    • Exploratory Data Analysis
    • Predictive Modeling: Classification and Regression
    • Descriptive Modeling
      • Cluster analysis/segmentation
    • Discovering Patterns and Rules
      • Association/Dependency rules
      • Sequential patterns
      • Temporal sequences
    • Deviation detection
  • Data Mining Tasks Concept/Class description: Characterization and discrimination  Generalize, summarize, and contrast data characteristics, e.g., dry vs. wet regions Association (correlation and causality)  Multi-dimensional or single-dimensional association age(X, “20-29”) ^ income(X, “60-90K”)  buys(X, “TV”) 8 Diambil dari © Copyrigh
  • Data Mining Tasks
    • Classification and Prediction
      • Finding models (functions) that describe and distinguish classes or concepts for future prediction
      • Example: classify countries based on climate, or classify cars based on gas mileage
      • Presentation: IF-THEN rules, decision tree, classification rule, neural network
      • Prediction: predict some unknown or missing numerical values
  • Data Mining Tasks
    • Cluster analysis
      • Class label is unknown: group data to form new classes
      • Example: cluster houses to find distribution patterns
      • Clustering is based on the principle of maximizing the intra-class similarity and minimizing the inter-class similarity
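The clustering principle on this slide can be checked numerically. The sketch below takes a candidate grouping of houses and compares the average distance between houses in the same cluster with the average distance between houses in different clusters; a good clustering has a small first value and a large second one. The coordinates, the cluster names, and the use of Euclidean distance as the (dis)similarity measure are assumptions made for illustration.

```python
from itertools import combinations
from math import dist

# Hypothetical house locations (x, y), already assigned to two candidate clusters.
houses = {
    "north": [(1.0, 9.0), (2.0, 8.0), (1.5, 8.5)],
    "south": [(8.0, 1.0), (9.0, 2.0), (8.5, 1.5)],
}


def mean_intra_distance(clusters):
    """Average distance between pairs of points that share a cluster."""
    pairs = [
        dist(a, b)
        for points in clusters.values()
        for a, b in combinations(points, 2)
    ]
    return sum(pairs) / len(pairs)


def mean_inter_distance(clusters):
    """Average distance between pairs of points from different clusters."""
    pairs = [
        dist(a, b)
        for (_, points_a), (_, points_b) in combinations(clusters.items(), 2)
        for a in points_a
        for b in points_b
    ]
    return sum(pairs) / len(pairs)


intra = mean_intra_distance(houses)
inter = mean_inter_distance(houses)
print(f"mean intra-cluster distance: {intra:.2f}")
print(f"mean inter-cluster distance: {inter:.2f}")
```

High similarity corresponds to low distance, so "maximizing intra-class similarity" means keeping the first number small relative to the second. A clustering algorithm searches for the grouping that does this best; this sketch only evaluates one given grouping.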
  • Data Mining Applications
    • Science: Chemistry, Physics, Medicine
      • Biochemical analysis
      • Remote sensors on a satellite
      • Telescopes: star/galaxy classification
      • Medical image analysis
  • Data Mining Applications
    • Bioscience
      • Sequence-based analysis
      • Protein structure and function prediction
      • Protein family classification
      • Microarray gene expression
  • Data Mining Applications
    • Pharmaceutical companies, Insurance and Health care, Medicine
      • Drug development
      • Identify successful medical therapies
      • Claims analysis, fraudulent behavior
      • Medical diagnostic tools
      • Predict office visits
  • Data Mining Applications
    • Financial Industry, Banks, Businesses, E-commerce
      • Stock and investment analysis
      • Identify loyal customers vs. risky customers
      • Predict customer spending
      • Risk management
      • Sales forecasting
  • Data Mining Applications
    • Retail and Marketing
      • Customer buying patterns/demographic characteristics
      • Mailing campaigns
      • Market basket analysis
      • Trend analysis
  • Data Mining Applications
    • Database analysis and decision support
      • Market analysis and management: target marketing, customer relation management, market basket analysis, cross selling, market segmentation
      • Risk analysis and management: forecasting, customer retention, improved underwriting, quality control, competitive analysis
      • Fraud detection and management
  • Data Mining Applications
    • Sports and Entertainment
      • IBM Advanced Scout analyzed NBA game statistics (shots blocked, assists, and fouls) to gain competitive advantage for the New York Knicks and Miami Heat
    • Astronomy
      • JPL and the Palomar Observatory discovered 22 quasars with the help of data mining
  • DATA MINING EXAMPLES
    • Grocery store
    • NBA
    • Banking and credit card scoring
    • Fraud detection
    • Personalization & customer profiling
    • Campaign management and database marketing
  • Data Mining Challenges
    • Computationally expensive to investigate all possibilities
    • Dealing with noise/missing information and errors in data
    • Choosing appropriate attributes/input representation
    • Finding the minimal attribute space
    • Finding adequate evaluation function(s)
    • Extracting meaningful information
    • Not overfitting
  • Summary
    • Data mining: discovering interesting patterns from large amounts of data
    • A KDD process includes data cleaning, data integration, data selection, transformation, data mining, pattern evaluation, and knowledge presentation
  • Summary
    • Mining can be performed in a variety of information repositories
    • Data mining functionalities: characterization, association, classification, clustering, outlier and trend analysis, etc.
    • Classification of data mining systems
    • Major issues in data mining
  • Kinds of Data Mining
    • Decision Tree Learning
    • Clustering
    • Neural Networks
    • Association Rules
    • Support Vector Machines
    • Genetic Algorithms
    • Nearest Neighbor Method
  • DECISION TREE FOR THE CONCEPT “Play Tennis”

    Day  Outlook   Temp  Humidity  Wind    PlayTennis
    D1   Sunny     Hot   High      Weak    No
    D2   Sunny     Hot   High      Strong  No
    D3   Overcast  Hot   High      Weak    Yes
    D4   Rain      Mild  High      Weak    Yes
    D5   Rain      Cool  Normal    Weak    Yes
    D6   Rain      Cool  Normal    Strong  No
    D7   Overcast  Cool  Normal    Strong  Yes
    D8   Sunny     Mild  High      Weak    No
    D9   Sunny     Cool  Normal    Weak    Yes
    D10  Rain      Mild  Normal    Weak    Yes
    D11  Sunny     Mild  Normal    Strong  Yes
    D12  Overcast  Mild  High      Strong  Yes
    D13  Overcast  Hot   Normal    Weak    Yes
    D14  Rain      Mild  High      Strong  No

    Source: Mitchell, 1997
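As a worked illustration of how a root attribute could be chosen for this table, the sketch below computes the information gain of each of the four attributes over the 14 examples, using the standard entropy-based definition. The function and variable names (`entropy`, `information_gain`, `ROWS`, `gains`, `best`) are introduced here for illustration and are not from the slides.

```python
from collections import Counter
from math import log2

COLUMNS = ("Outlook", "Temp", "Humidity", "Wind", "PlayTennis")
# The 14 training examples D1..D14 from the table, in order.
ROWS = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attr_index, target_index=-1):
    """Expected reduction in entropy from partitioning rows on one attribute."""
    gain = entropy([r[target_index] for r in rows])
    for value in {r[attr_index] for r in rows}:
        subset = [r[target_index] for r in rows if r[attr_index] == value]
        gain -= len(subset) / len(rows) * entropy(subset)
    return gain


# Gain of every attribute (all columns except the PlayTennis target).
gains = {name: information_gain(ROWS, i) for i, name in enumerate(COLUMNS[:-1])}
best = max(gains, key=gains.get)

for name, g in sorted(gains.items(), key=lambda kv: -kv[1]):
    print(f"{name:<9}{g:.3f}")
print("best root attribute:", best)
```

Outlook has the highest gain on this data, so a greedy top-down learner would place it at the root and then repeat the same computation on the examples that fall under each Outlook value.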
  • DECISION TREE FOR THE CONCEPT “Play Tennis” [Mitchell, 1997]
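The tree diagram itself did not survive in the transcript. The sketch below encodes the PlayTennis tree as it is usually given in Mitchell (1997), with Outlook at the root, Humidity tested under Sunny, and Wind tested under Rain, and shows how an instance is sorted from the root down to a leaf. The nested-dictionary representation and the `classify` helper are assumptions made for illustration.

```python
# PlayTennis tree as commonly presented in Mitchell (1997); the encoding as
# nested dictionaries is an illustrative choice, not part of the slides.
TREE = {
    "attribute": "Outlook",
    "branches": {
        "Sunny": {
            "attribute": "Humidity",
            "branches": {"High": "No", "Normal": "Yes"},
        },
        "Overcast": "Yes",
        "Rain": {
            "attribute": "Wind",
            "branches": {"Strong": "No", "Weak": "Yes"},
        },
    },
}


def classify(tree, instance):
    """Follow the branch matching each tested attribute until a leaf label is reached."""
    node = tree
    while isinstance(node, dict):
        node = node["branches"][instance[node["attribute"]]]
    return node


# A sunny, hot, humid morning with weak wind.
d1 = {"Outlook": "Sunny", "Temp": "Hot", "Humidity": "High", "Wind": "Weak"}
print(classify(TREE, d1))
```

Each inner dictionary is a node that tests one attribute, each key under `branches` is one possible value of that attribute, and each string is a leaf holding the classification. Temp never appears in the tree, so it plays no part in the decision.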