DBM630: Data Mining and
                         Data Warehousing

                                   MS.IT. Rangsit University
                                                          Semester 2/2011
    by Kritsada Sriphaew (sriphaew.k AT gmail.com)

                                     Lecture 1
                               Introduction to
            Data Mining and Data Warehousing
               Text: Data Mining: Concepts and Techniques, By Jiawei Han
               and Micheline Kamber, Morgan Kaufmann Publishers (2006).

               ISBN: 978-1558609013

Administrative Matters
   Course Syllabus

   Lecture Notes, Assignments, and Quizzes

   Course Communication
    Announcements, discussion, lecture notes, etc.
       Page: http://www.facebook.com/pages/Data-mining-MSIT-RSU/


2                               Data Mining and Data Warehousing by Kritsada Sriphaew
How will we be evaluated?
   Assessment Tasks
             Tasks                                   % Score
             Quizzes (approx. 2 times)               20
             Assignment                              20
             (Discussion/Demonstration)
             Final                                   60

   To Pass
       At least 60% of the overall score.

Textbooks
   Mandatory Book
    Data Mining: Concepts and Techniques
     By Jiawei Han and Micheline Kamber
     Morgan Kaufmann Publishers (2006), Second Edition,
        ISBN-10: 1558609016, ISBN-13: 978-1558609013

   Supplementary Book
    Data Mining: Practical Machine Learning Tools and
    Techniques with JAVA Implementations
      By Ian H. Witten and Eibe Frank
      Morgan Kaufmann Publishers (2005), 2nd Edition
         ISBN-10: 0120884070, ISBN-13: 978-0120884070

Course Description (What will we learn?)
   Introduction to data warehousing. Characteristics of data warehousing, drawbacks
    and benefits of data warehousing, architecture of data warehousing, internal data
    structure for data warehousing, data integration, creating high quality data, data
    mart, online analytical processing (OLAP). Introduction to data mining, types of
    data for mining, architecture of typical data mining system, data preprocessing,
    association rule mining, classification and prediction, clustering, mining of
    complex data, data mining applications, current trends in data mining, text
    mining, web mining, including tools for data mining analysis such as WEKA, SAS, etc.

Course Schedule (tentative)
Week   Date     Topics
  1     8 JAN   Introduction to Data Mining and Data Warehousing
  2    15 JAN   Data Warehouse and OLAP Technology – I
  3    22 JAN   Data Warehouse and OLAP Technology – II
  4    29 JAN   Data Mining Concepts and Data Preparation
  5     5 FEB   Association Rule Mining
  6    12 FEB   Classification Model: Decision Tree, Classification Rules
  7    19 FEB   Classification Model: Naïve Bayes
  8    26 FEB   Prediction Model: Regression
  9     4 MAR   Clustering
 10    11 MAR   Data Mining Application: Text Mining, Web Mining, Social Network Analysis
 11    18 MAR   Introduction to Data Mining Tool: WEKA
 12    25 MAR   Tutorials
                Final
Prerequisites
   Basic Database Concepts
   Basic Statistics:
       Probability, Sampling, Logic, Linear Regression, ...
   Algorithms:
       Basic Data Structures, Dynamic Programming, ...

We will provide some background, but the class is fast-paced,
so it helps to know the basics in advance.
Introduction
   Motivation: Why mine data?
   KDD: Knowledge Discovery in Databases
   What is Data Mining?
   Data Mining: on What kind of Data?
   Data Mining Tasks
   Data Mining Applications

Evolution of Database Technology
   1960s:
       Data collection, database creation, IMS and network DBMS
   1970s:
       Relational data model, relational DBMS implementation
   1980s:
       RDBMS, advanced data models (extended-relational, OO, deductive, etc.) and application-oriented DBMS (spatial, scientific, engineering, etc.)
   1990s-2000s:
       Data mining and data warehousing, multimedia databases, and Web databases

Large Data Sets: A Motivation
   There is often information “hidden” in the data that is not readily evident.
   Human analysts can take weeks to discover useful information.
   Much of the data is never analyzed at all.

      How do you explore millions of
      records, tens or hundreds of
      fields, and find patterns?

KDD Process
(Knowledge Discovery in Databases)
[Figure: the KDD process as a chain of steps and intermediate results:
 Data -> Selection -> Target Data -> Preprocessing -> Preprocessed Data ->
 Data Mining -> Patterns -> Interpretation/Evaluation -> Knowledge]

Adapted from: U. Fayyad, et al. (1995), “From Knowledge Discovery to Data Mining: An
Overview,” Advances in Knowledge Discovery and Data Mining, U. Fayyad et
al. (Eds.), AAAI/MIT Press

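The stages of the KDD process (selection, preprocessing, data mining, interpretation/evaluation) can be read as a chain of functions, each consuming the output of the previous one. The following is only a minimal sketch in Python under stated assumptions: the record fields, the function names, and the toy "mining" step (a simple frequency count) are all invented for illustration and are not taken from Fayyad et al. or from the course material.

```python
from collections import Counter

# Hypothetical raw data: each record is one purchase (fields are assumed).
raw_data = [
    {"customer": "c1", "item": "milk", "amount": 2.5},
    {"customer": "c2", "item": "bread", "amount": None},  # missing value
    {"customer": "c3", "item": "milk", "amount": 3.0},
    {"customer": "c4", "item": "soda", "amount": 1.2},
    {"customer": "c5", "item": "milk", "amount": 2.8},
]


def select(data):
    """Selection: keep only the fields relevant to the task (target data)."""
    return [{"item": r["item"], "amount": r["amount"]} for r in data]


def preprocess(target):
    """Preprocessing: drop records with missing values."""
    return [r for r in target if r["amount"] is not None]


def mine(clean):
    """Data mining: a toy pattern finder that counts how often each item occurs."""
    return Counter(r["item"] for r in clean)


def evaluate(patterns, min_count=2):
    """Interpretation/evaluation: keep only patterns that pass a threshold."""
    return {item: n for item, n in patterns.items() if n >= min_count}


# Data -> target data -> preprocessed data -> patterns -> knowledge
knowledge = evaluate(mine(preprocess(select(raw_data))))
print(knowledge)
```

In a real project each stage is far more involved, and the process is usually iterative: the evaluation step often sends the analyst back to selection or preprocessing.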
Knowledge Discovery




Business Intelligence (BI) vs. Data Mining
   Business intelligence is a term for the processes, techniques, and tools that support business decisions using information technology.

   Layers, from top (highest potential to support business decisions) to bottom:
       Making Decisions (End User)
       Data Presentation: Visualization Techniques (Business Analyst)
       Data Mining: Knowledge Discovery (Data Analyst)
       Data Exploration: Statistical Analysis, Querying and Reporting
       Data Warehouses / Data Marts: OLAP (DBA)
       Data Sources: Paper, Files, Information Providers, Database Systems, OLTP
Terminology
   Data Mining
    A step in the knowledge discovery process consisting of
    particular algorithms (methods) that, under some
    acceptable objective, produce a particular enumeration
    of patterns (models) over the data.

   Knowledge Discovery Process
    The process of using data mining methods (algorithms)
    to extract (identify) what is deemed knowledge according
    to the specified measures and thresholds, using a
    database along with any necessary preprocessing or
    transformations.
Other definitions of Data Mining
   Non-trivial extraction of implicit, previously unknown, and useful information from data
   Automatic or semi-automatic process for analyzing large databases to find patterns that are:
       valid: hold on new data with some certainty
       novel: non-obvious to the system
       useful: it should be possible to act on the pattern
       understandable: humans should be able to interpret the pattern

Origins of Data Mining

   Overlaps with various fields, but focuses on
       Scalability
       Algorithm and Architecture
       Automation to handle large data

Data Mining: on What kind of Data?
   Relational Databases
   Data Warehouses
   Transactional Databases
   Advanced Database Systems
       Object-Relational
       Spatial and Temporal
       Time-Series
       Multimedia
       Text
       Heterogeneous, Legacy, and Distributed
       WWW

[Figure: example data types. Structure - 3D Anatomy; Function - 1D Signal; Metadata - Annotation (a GeneFilter Comparison Report table)]

Data Mining Tasks
   Classification
   Clustering
   Association Rule Mining
   Sequential Pattern Discovery
   Regression
   Anomaly Detection

Ex: Classifying Galaxies

Ex: Market Basket Analysis

   Where should detergents be placed in the store to maximize their sales?

   Are window cleaning products purchased when detergents and orange juice are bought together?

   Is soda typically purchased with bananas? Does the brand of soda make a difference?

   How are the demographics of the neighborhood affecting what customers are buying?

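One common way to put numbers on a question such as "Is soda typically purchased with bananas?" is to compute the support and confidence of the rule "bananas => soda". The slide itself does not define these measures, so treat the following as a minimal sketch under stated assumptions: the transactions are invented for illustration, and the function names `support` and `confidence` are hypothetical helpers, not part of any library.

```python
# Hypothetical transactions, invented for illustration only.
transactions = [
    {"soda", "bananas", "bread"},
    {"soda", "bananas"},
    {"detergent", "orange juice", "window cleaner"},
    {"soda", "bread"},
    {"bananas", "milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent): support(A and C) / support(A)."""
    both = support(set(antecedent) | set(consequent), transactions)
    base = support(antecedent, transactions)
    return both / base if base > 0 else 0.0


sup = support({"bananas", "soda"}, transactions)
conf = confidence({"bananas"}, {"soda"}, transactions)
print(f"support = {sup:.2f}, confidence = {conf:.2f}")
```

With this toy data, 2 of the 5 baskets contain both items (support 0.40), and 2 of the 3 baskets with bananas also contain soda (confidence about 0.67). A high confidence alone does not show that the rule is interesting, since soda may simply be popular overall.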
Ex: Anomaly Detection
   Detect significant deviations from normal behavior

   Applications:
       Credit Card Fraud Detection
       Network Intrusion Detection

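A very simple way to make "significant deviation from normal behavior" concrete is a z-score rule: flag any value that lies more than a chosen number of standard deviations from the mean. This is only a sketch under stated assumptions, not the method used in real fraud or intrusion detection systems. The amounts, the threshold of 2.0, and the function name `find_anomalies` are all hypothetical.

```python
from statistics import mean, stdev


def find_anomalies(values, threshold=2.0):
    """Return the values whose z-score exceeds `threshold` in absolute value."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    if sigma == 0:
        return []  # all values identical: nothing deviates
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical credit card transaction amounts; one is unusually large.
amounts = [25.0, 30.0, 27.5, 22.0, 31.0, 28.0, 26.5, 950.0]
print(find_anomalies(amounts))
```

Note that the outlier itself inflates the mean and standard deviation, so this rule can miss anomalies in small samples; more robust statistics (for example, the median) are often preferred in practice.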
Some Success Stories
   Network intrusion detection using a combination of sequential rule discovery and classification trees on 4 GB of DARPA data
       Outperformed the (manual) knowledge engineering approach
       http://www.cs.columbia.edu/~sal/JAM/PROJECT/ provides a good, detailed description of the entire process
   Major US bank: customer attrition prediction
       Segment customers based on financial behavior: 3 segments
       Build attrition models for each of the 3 segments
       40-50% of attritions were predicted, a factor-of-18 increase
   Targeted credit marketing: major US banks
       Find customer segments based on 13 months of credit balances
       Build another response model based on surveys
       Increased response 4 times (2%)
How You’ll Benefit
   Confidently discuss the role and applicability of data warehousing and data mining to business/organization problems

   Gain background knowledge for further exploration in your thesis, independent study, or career projects, since data mining methods (which extract knowledge from data) are useful in every field.
Assignment
   Assignments will aim to test your detailed knowledge and understanding of the topics, as well as your critical thinking and research ability. Assignments may include tasks involving: writing detailed designs; reading research papers; learning and using specialist software/hardware.

   Assessment: the assignment will be worth 20% of the total course assessment.
PreTest
1. Select only one of the following items to fill in each blank.
          (a) Characterization/Discrimination
          (b) Classification
          (c) Numeric Prediction
          (d) Clustering
          (e) Association Analysis
          (f) Trend Analysis
          Which function matches each of the following tasks?
          ______(1) To estimate the price of stock A next month.
          ______(2) To display the proportion of products sold, by type.
          ______(3) To know which products are likely to be sold with which other products.
          ______(4) To group customers into a set of similar groups based on their features.
          ______(5) To find the value of an experiment when a substance is tested.
          ______(6) To predict whether a customer is likely to be a good customer.

2.            Assume that we want to design a model to forecast tomorrow’s SET index.
              Suggest the details of the model that we should construct and
              recommend the inputs and outputs of the model.

 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
Micromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of PowdersMicromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of Powders
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its Characteristics
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
Concept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfConcept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.Compdf
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 

Dbm630_lecture01

  • 1. DBM630: Data Mining and Data Warehousing
      MS.IT., Rangsit University, Semester 2/2011
      by Kritsada Sriphaew (sriphaew.k AT gmail.com)
      Lecture 1: Introduction to Data Mining and Data Warehousing
      Text: Data Mining: Concepts and Techniques, by Jiawei Han and Micheline Kamber, Morgan Kaufmann Publishers (2006). ISBN: 978-1558609013
  • 2. Administrative Matters
      - Course Syllabus
      - Lecture Notes, Assignments, and Quizzes
      - Course Communication: announcements, discussion, lecture notes, etc.
          - Page: http://www.facebook.com/pages/Data-mining-MSIT-RSU/
      Data Mining and Data Warehousing by Kritsada Sriphaew
  • 3. How will we be evaluated?
      Assessment Tasks
          Task                                     % of score
          Quizzes (approx. 2)                      20
          Assignment (discussion/demonstration)    20
          Final                                    60
      To pass: at least 60% of the overall score.
  • 4. Text Books
      Mandatory Book
          Data Mining: Concepts and Techniques
          By Jiawei Han and Micheline Kamber
          Morgan Kaufmann Publishers (2006), Second Edition
          ISBN-10: 1558609016, ISBN-13: 978-1558609013
      Supplementary Book
          Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations
          By Ian H. Witten and Eibe Frank
          Morgan Kaufmann Publishers (2005), 2nd Edition
          ISBN-10: 0120884070, ISBN-13: 978-0120884070
  • 5. Course Description (What will we learn?)
      Data warehousing: introduction to data warehousing; characteristics, benefits, and drawbacks of data warehousing; data warehouse architecture; internal data structures for data warehousing; data integration; creating high-quality data; data marts; online analytical processing (OLAP).
      Data mining: introduction to data mining; types of data for mining; architecture of a typical data mining system; data preprocessing; association rule mining; classification and prediction; clustering; data mining applications; current trends in data mining; text mining; web mining; and tools for data mining analysis such as WEKA and SAS.
      The Thai version of this description (translated) lists the same topics and additionally names mining of complex data.
  • 6. Course Schedule (tentative)
      Week  Date    Topics
      1     8 JAN   Introduction to Data Mining and Data Warehousing
      2     15 JAN  Data Warehouse and OLAP Technology – I
      3     22 JAN  Data Warehouse and OLAP Technology – II
      4     29 JAN  Data Mining Concepts and Data Preparation
      5     5 FEB   Association Rule Mining
      6     12 FEB  Classification Model: Decision Tree, Classification Rules
      7     19 FEB  Classification Model: Naïve Bayes
      8     26 FEB  Prediction Model: Regression
      9     4 MAR   Clustering
      10    11 MAR  Data Mining Application: Text Mining, Web Mining, Social Network Analysis
      11    18 MAR  Introduction to Data Mining Tool: WEKA
      12    25 MAR  Tutorials
                    Final
  • 7. Prerequisites
      - Basic Database Concepts
      - Basic Statistics: Probability, Sampling, Logic, Linear Regression, …
      - Algorithms: Basic Data Structures, Dynamic Programming, ...
      We provide some background, but the class is fast-paced, so it helps to have some basics in advance.
  • 8. Introduction
      - Motivation: Why mine data?
      - KDD: Knowledge Discovery in Databases
      - What is Data Mining?
      - Data Mining: On What Kind of Data?
      - Data Mining Tasks
      - Data Mining Applications
  • 9. Evolution of Database Technology
      - 1960s: Data collection, database creation, IMS and network DBMS
      - 1970s: Relational data model, relational DBMS implementation
      - 1980s: RDBMS, advanced data models (extended-relational, OO, deductive, etc.) and application-oriented DBMS (spatial, scientific, engineering, etc.)
      - 1990s-2000s: Data mining and data warehousing, multimedia databases, and Web databases
  • 10. Large Data Sets: A Motivation
      - There is often information “hidden” in the data that is not readily evident.
      - Human analysts take weeks to discover useful information.
      - Much of the data is never analyzed at all.
      How do you explore millions of records, with tens or hundreds of fields, and find patterns?
  • 11. KDD Process (Knowledge Discovery in Databases)
      Data -> Selection -> Target Data -> Preprocessing -> Preprocessed Data -> Data Mining -> Patterns -> Interpretation/Evaluation -> Knowledge
      Adapted from: U. Fayyad, et al. (1995), “From Knowledge Discovery to Data Mining: An Overview,” Advances in Knowledge Discovery and Data Mining, U. Fayyad et al. (Eds.), AAAI/MIT Press
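The stages on this slide can be sketched as a chain of small functions. This is only an illustrative toy, not the method from the cited paper: the records, the field names, and the `min_count` threshold are all hypothetical, and the "data mining" step is just a frequency count standing in for a real algorithm.

```python
from collections import Counter

# Hypothetical raw records: (customer_id, age, product); None marks a missing value.
RAW_DATA = [
    (1, 34, "soda"),
    (2, None, "bananas"),
    (3, 51, "soda"),
    (4, 29, "detergent"),
    (5, 42, "soda"),
]

def select(records):
    """Selection: keep only the fields relevant to the question (age, product)."""
    return [(age, product) for _, age, product in records]

def preprocess(target):
    """Preprocessing: drop rows that contain a missing value."""
    return [row for row in target if None not in row]

def mine(clean):
    """Data mining: count how often each product occurs (a toy 'pattern')."""
    return Counter(product for _, product in clean)

def evaluate(patterns, min_count=2):
    """Interpretation/evaluation: keep only patterns that meet a threshold."""
    return {item: n for item, n in patterns.items() if n >= min_count}

# Data -> Target Data -> Preprocessed Data -> Patterns -> Knowledge
knowledge = evaluate(mine(preprocess(select(RAW_DATA))))
print(knowledge)
```

Each function consumes the output of the previous stage, which mirrors the intermediate products named on the slide (Target Data, Preprocessed Data, Patterns, Knowledge).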
  • 12. Knowledge Discovery
  • 13. Business Intelligence (BI) vs. Data Mining
      BI is an umbrella term for the processes, techniques, and tools that support business decisions using information technology.
      Layers, from top to bottom (increasing potential to support business decisions toward the top), with the role responsible for each:
      - Making Decisions (End User)
      - Data Presentation: Visualization Techniques (Business Analyst)
      - Data Mining: Knowledge Discovery (Data Analyst)
      - Data Exploration: Statistical Analysis, Querying and Reporting (Data Analyst)
      - Data Warehouses / Data Marts: OLAP (DBA)
      - Data Sources: Paper, Files, Information Providers, Database Systems, OLTP (DBA)
  • 14. Terminology
      - Data Mining: A step in the knowledge discovery process consisting of particular algorithms (methods) that, under some acceptable objective, produce a particular enumeration of patterns (models) over the data.
      - Knowledge Discovery Process: The process of using data mining methods (algorithms) to extract (identify) what is deemed knowledge according to the specifications of measures and thresholds, using a database along with any necessary preprocessing or transformations.
  • 15. Other definitions of Data Mining
      - Non-trivial extraction of implicit, previously unknown, and useful information from data
      - Automatic or semi-automatic process for analyzing large databases to find patterns that are:
          - valid: hold on new data with some certainty
          - novel: non-obvious to the system
          - useful: should be possible to act on the item
          - understandable: humans should be able to interpret the pattern
  • 16. Origins of Data Mining
      Overlaps with various fields, but focuses on:
      - Scalability
      - Algorithm and Architecture
      - Automation to handle large data
  • 17. Data Mining: On What Kind of Data?
      - Relational Databases
      - Data Warehouses
      - Transactional Databases
      - Advanced Database Systems
          - Object-Relational
          - Spatial and Temporal
          - Time-Series
          - Multimedia
          - Text
          - Heterogeneous, Legacy, and Distributed
          - WWW
      [Figure: example data types labeled Structure (3D anatomy), Function (1D signal), and Metadata (annotation), with an excerpt of a GeneFilter Comparison Report table of gene names and raw and normalized intensities.]
  • 18. Data Mining Tasks
      - Classification
      - Clustering
      - Association Rule Mining
      - Sequential Pattern Discovery
      - Regression
      - Anomaly Detection
  • 19. Ex: Classifying Galaxies
  • 20. Ex: Market Basket Analysis
      - Where should detergents be placed in the store to maximize their sales?
      - Are window cleaning products purchased when detergents and orange juice are bought together?
      - Is soda typically purchased with bananas? Does the brand of soda make a difference?
      - How are the demographics of the neighborhood affecting what customers are buying?
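Questions like these are commonly answered with the support and confidence of an association rule. The slide does not define these measures, so the sketch below brings them in as standard definitions, and the basket data is made up purely for illustration.

```python
# Hypothetical shopping baskets, one set of items per transaction.
BASKETS = [
    {"detergent", "orange juice", "window cleaner"},
    {"detergent", "orange juice"},
    {"soda", "bananas"},
    {"detergent", "orange juice", "window cleaner", "soda"},
    {"bananas", "orange juice"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Estimated P(consequent | antecedent): support of both over support of the antecedent."""
    base = support(antecedent, baskets)
    if base == 0:
        return 0.0  # the antecedent never occurs, so the rule has no evidence
    return support(set(antecedent) | set(consequent), baskets) / base

# "Are window cleaning products purchased when detergents and
#  orange juice are bought together?"
rule_support = support({"detergent", "orange juice", "window cleaner"}, BASKETS)
rule_confidence = confidence({"detergent", "orange juice"}, {"window cleaner"}, BASKETS)
print(rule_support, rule_confidence)
```

In this toy data, three of the five baskets contain both detergent and orange juice, and two of those three also contain a window cleaner, so the rule holds in two thirds of the cases where it applies.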
  • 21. Ex: Anomaly Detection
      - Detect significant deviations from normal behavior
      - Applications:
          - Credit Card Fraud Detection
          - Network Intrusion Detection
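One simple way to make "significant deviation from normal behavior" concrete is a z-score test, which flags values that lie many standard deviations from the mean. This is just one possible technique, chosen here for brevity and not named on the slide; the transaction amounts and the threshold of 2.0 are hypothetical.

```python
import statistics

# Hypothetical card transaction amounts for one customer.
AMOUNTS = [20, 25, 22, 19, 24, 21, 500]

def flag_anomalies(values, threshold=2.0):
    """Return the values whose absolute z-score exceeds `threshold`."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return []  # all values are identical, so nothing deviates
    return [v for v in values if abs(v - mean) / std > threshold]

print(flag_anomalies(AMOUNTS))
```

Because the mean and standard deviation are themselves pulled toward extreme values, this approach can miss anomalies in small samples; median-based variants are a common alternative when that matters.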
  • 22. Some Success Stories
      - Network intrusion detection using a combination of sequential rule discovery and classification trees on 4 GB of DARPA data
          - Outperformed the (manual) knowledge engineering approach
          - http://www.cs.columbia.edu/~sal/JAM/PROJECT/ provides a good, detailed description of the entire process
      - Major US bank: customer attrition prediction
          - Segmented customers based on financial behavior: 3 segments
          - Built attrition models for each of the 3 segments
          - 40-50% of attritions were predicted, a factor of 18 increase
      - Targeted credit marketing: major US banks
          - Found customer segments based on 13 months of credit balances
          - Built another response model based on surveys
          - Increased response 4 times (2%)
  • 23. How You'll Benefit
      - Confidently discuss the role and applicability of data warehousing and data mining to business and organizational problems
      - Gain background knowledge to explore further in your thesis, independent study, or career projects, since data mining methods (for extracting knowledge from data) are useful in every field
  • 24. Assignment
      - Assignments will aim to test your detailed knowledge and understanding of the topics, as well as your critical thinking and research ability. Assignments may include tasks involving: writing detailed designs; reading research papers; learning and using specialist software/hardware.
      - Assessment: the assignment will be worth 20% of the total course assessment.
  • 25. PreTest
      1. Select only one of the following items to fill in each blank.
          (a) Characterization/Discrimination
          (b) Classification
          (c) Numeric Prediction
          (d) Clustering
          (e) Association Analysis
          (f) Trend Analysis
         Which function matches each of the following tasks?
          ______(1) To estimate the price of stock A next month
          ______(2) To display the share of products sold, by type
          ______(3) To know which products are likely to be sold with which other products
          ______(4) To group customers into sets of similar customers based on their features
          ______(5) To find the value of an experiment when a substance is tested
          ______(6) To predict whether or not a customer tends to be a good customer
      2. Assume that we want to design a model to forecast tomorrow's SET index. Suggest the details of the model that we should construct, and recommend the inputs and outputs of the model.