Data Mining Overview


An overview of data mining

Published in: Education

  1. 1. 27/Sep/2008 Data Mining July 16, 2009 1
  2. Evolution of Database Technology
       YEAR    PURPOSE
       1960s   Network model, batch reports
       1970s   Relational data model, executive information systems
       1980s   Application-specific DBMS (spatial data, scientific data, image data, …)
       1990s   Terabyte data warehouses, object-oriented, middleware and web technology
       2000s   Business process
       2010s   Sensor DB systems, DBs on embedded systems, large-scale pub/sub systems
  3. Motivation: Necessity is the mother of invention
     • Data explosion problem
       ◦ Automated data collection tools and mature database technology lead to tremendous amounts of data stored in databases, data warehouses and other information repositories
     • We are drowning in data, but starving for knowledge!
     • Solution: data warehousing and data mining
       ◦ Extraction of interesting knowledge (rules, regularities, patterns, constraints) from data in large databases
  4. Why Data Mining?
     • Data, data, data everywhere …
     • I can't find the data I need: it is scattered across the network
     • I can't get the data I need
     • I can't understand the data I need
     • I can't use the data I found
  5. An abundance of data
     • Sources: supermarket scanners and POS data, credit card transactions, call center records, ATMs, demographic data, sensor networks, cameras, web server logs, customer web site trails, geographic information systems, national medical records, weather images
     • This data occupies:
       ◦ Terabytes: 10^12 bytes
       ◦ Petabytes: 10^15 bytes
       ◦ Exabytes: 10^18 bytes
       ◦ Zettabytes: 10^21 bytes
       ◦ Yottabytes: 10^24 bytes
     • Example: Walmart, 24 terabytes
  6. Data mining is:
     • The process of sorting through large amounts of data and picking out relevant information
     • The process of analyzing data from different perspectives and summarizing it into useful information
     • Discovering hidden value in a database
     • The non-trivial process of identifying valid, novel, useful and understandable patterns in data
     • Extracting or mining knowledge from large amounts of data
  7. History Notes: Many Names of Data Mining
       YEAR    NAMES                                       USED BY
       1960    Data Fishing, Data Dredging                 Statisticians
       1990    Data Mining                                 DB community, business
       1989    Knowledge Discovery in Databases            AI, machine learning community
     Other names: Data Archaeology, Information Harvesting, Information Discovery, Knowledge Extraction
  8. Data warehousing provides the enterprise with a memory. Data mining provides the enterprise with intelligence.
  9. Why Data Mining? (Cont.)
     • A data warehouse is a single, complete and consistent store of data from a variety of different sources, made available to end users
     • For example, AT&T handles billions of calls per day. Europe's Very Long Baseline Interferometer (VLBI) has 16 telescopes, each of which produces 1 Gigabit/second of astronomical data over a 25-day observation session
     • We need data mining for:
       ◦ Transforming data into information that is useful to users
       ◦ Presenting data in a useful format
       ◦ Providing data access to business analysts and information technology professionals
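To put the VLBI figure in terms of the storage units listed on slide 5, the arithmetic can be sketched as follows. This is only a rough estimate under two assumptions that the slide does not state: that all 16 telescopes record continuously for the full 25 days, and that 1 Gigabit means 10^9 bits.

```python
# Rough estimate of the VLBI data volume quoted on the slide.
# Assumptions (not from the slide): continuous recording, 1 Gigabit = 10**9 bits.
TELESCOPES = 16
BITS_PER_SECOND = 10**9          # 1 Gigabit/second per telescope
SESSION_DAYS = 25
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400

total_bits = TELESCOPES * BITS_PER_SECOND * SESSION_DAYS * SECONDS_PER_DAY
total_bytes = total_bits // 8
total_petabytes = total_bytes / 10**15   # petabyte = 10^15 bytes, as on slide 5

print(f"{total_petabytes:.2f} PB per observation session")
```

Under these assumptions a single observation session lands in the petabyte range.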
  10. Data Mining Process
     • Data mining is the technique used to carry out knowledge discovery in databases (KDD).
     • Data mining turns data into information and then into knowledge.
     [Figure: Data → Information → Knowledge]
  11. Steps in Data Mining
     1. Data cleaning: noise and inconsistent data are removed
     2. Data integration: multiple data sources are integrated (compiled)
     3. Data selection: data relevant to the analysis is selected
     4. Data transformation: summary, normalization or aggregation operations are performed to consolidate the data (converting it into two-dimensional form)
  12. Steps in Data Mining (Cont.)
     5. Data mining: intelligent methods are applied to the data to discover knowledge or patterns
     6. Pattern evaluation: the interesting patterns are identified by thresholding
     7. Knowledge presentation: visualization and presentation methods are used to present the mined knowledge to the user
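The preparation steps (cleaning, integration, selection, transformation) can be sketched on a toy data set. This is a minimal illustration, not a method taken from the slides: the records, field names and helper functions below are all hypothetical, and min-max scaling is just one possible normalization.

```python
# Hypothetical records from two sources; field names are illustrative only.
store_a = [
    {"customer": "c1", "age": 34, "spend": 120.0},
    {"customer": "c2", "age": None, "spend": 80.0},   # missing age: treated as noise
]
store_b = [
    {"customer": "c3", "age": 51, "spend": 200.0},
    {"customer": "c1", "age": 34, "spend": 120.0},    # duplicate of a store_a record
]

def clean(records):
    """Step 1 (cleaning): drop records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]

def integrate(*sources):
    """Step 2 (integration): merge sources, dropping exact duplicates."""
    seen, merged = set(), []
    for source in sources:
        for r in source:
            key = tuple(sorted(r.items()))
            if key not in seen:
                seen.add(key)
                merged.append(r)
    return merged

def select(records, fields):
    """Step 3 (selection): keep only the task-relevant attributes."""
    return [{f: r[f] for f in fields} for r in records]

def normalize(records, field):
    """Step 4 (transformation): min-max scale one numeric field to [0, 1]."""
    values = [r[field] for r in records]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0   # avoid division by zero when all values are equal
    return [{**r, field: (r[field] - lo) / span} for r in records]

merged = integrate(clean(store_a), clean(store_b))
prepared = normalize(select(merged, ["age", "spend"]), "spend")
print(prepared)
```

The result is a flat, two-dimensional table (rows of records, columns of attributes) that a mining method could then consume.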
  13. Data mining: the core of the knowledge discovery process.
     [Figure: Databases → Data Cleaning and Data Integration → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation]
  14. Data Mining Tasks
     1. Classification
     • Classification maps data into predefined groups or classes.
     • It may be represented by methods such as decision trees.
     Decision tree
     • A flow-chart-like tree structure
     • Each node denotes a test on an attribute value
     • Each branch represents an outcome of the test
     • Leaves represent classes or class distributions
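The decision tree structure described above can be sketched as a nested dictionary that is walked from the root to a leaf. The tree here is hand-built for illustration (it is not learned from data), and the attributes `income` and `has_collateral` and the class labels are hypothetical.

```python
# A hand-built decision tree: internal nodes test an attribute,
# branches are test outcomes, and leaves (plain strings) are classes.
tree = {
    "attribute": "income",
    "branches": {
        "high": "approve",
        "low": {
            "attribute": "has_collateral",
            "branches": {True: "approve", False: "reject"},
        },
    },
}

def classify(node, record):
    """Follow the branch matching each tested attribute until a leaf is reached."""
    while isinstance(node, dict):
        outcome = record[node["attribute"]]
        node = node["branches"][outcome]
    return node

print(classify(tree, {"income": "low", "has_collateral": False}))
```

Learning such a tree from labeled examples is a separate problem that this sketch does not cover.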
  15. 2. Regression
     Regression maps a data item to a real-valued prediction variable.
     Example: a manager wants to reach a certain level of savings before his retirement. Periodically he predicts his retirement savings from the current value and several past values, using a simple linear regression formula to predict future savings.
     3. Prediction
     Many real-world applications can be seen as predicting future data states based on past and current data.
     Example: predicting flooding is a difficult problem.
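The savings example can be sketched with an ordinary least-squares line fit. The slide does not specify the manager's formula, so least squares is an assumption here, and the yearly savings figures are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical past observations: year index and savings (in thousands).
years = [1, 2, 3, 4]
savings = [10.0, 12.0, 14.0, 16.0]

slope, intercept = fit_line(years, savings)
predicted_year_6 = slope * 6 + intercept
print(slope, intercept, predicted_year_6)
```

With real data the points would not lie exactly on a line, and the fitted line would only approximate the trend.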
  16. 4. Clustering
     Clustering is similar to classification except that the groups are not predefined.
     5. Association Rule
     Association refers to uncovering relationships among data. It is used in the retail sales community to identify the items (products) that are frequently purchased together.
     [Figure: cartoon captioned "Bread and jam sell together!"]
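The "frequently purchased together" idea can be made concrete with two measures commonly used for association rules, support and confidence. The slides do not name these measures, so they are introduced here as an assumption, and the transactions are hypothetical.

```python
# Hypothetical market-basket transactions.
transactions = [
    {"bread", "jam", "milk"},
    {"bread", "jam"},
    {"bread", "butter"},
    {"milk", "eggs"},
]

def support(itemset):
    """Fraction of transactions that contain every item in the itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    return support(antecedent | consequent) / support(antecedent)

print(support({"bread", "jam"}))        # how often bread and jam appear together
print(confidence({"bread"}, {"jam"}))   # how often jam accompanies bread
```

A rule such as "bread implies jam" is typically reported only when both values exceed chosen thresholds.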
  17. 6. Summarization
     Summarization describes the general characteristics or features of a target class of data.
     • Data characterization: presented in various forms such as pie charts, bar charts and curves.
     • Data discrimination: comparison of the general features of the target class of data objects with the general features of objects from one or a set of contrasting classes.
     7. Outlier Analysis
     A database may contain data objects that do not comply with the general behavior or model of the data. These data objects are called outliers. Many data mining methods discard outliers as noise or exceptions; however, in applications such as fraud detection, rare events may be more interesting than regularly occurring ones.
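One simple way to flag objects that do not follow the general behavior of the data is a z-score test. This is only a sketch of one possible technique (the slides do not prescribe a method); the transaction amounts and the threshold of 2.0 are hypothetical.

```python
import statistics

def find_outliers(values, threshold=2.0):
    """Return values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)   # population standard deviation
    if stdev == 0:
        return []                       # all values identical: nothing stands out
    return [v for v in values if abs(v - mean) / stdev > threshold]

# Hypothetical transaction amounts; one is far from the rest.
amounts = [10, 11, 9, 10, 12, 10, 95]
outliers = find_outliers(amounts)
print(outliers)
```

In a fraud-detection setting the flagged values would be kept for inspection rather than discarded as noise.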
  18. Data Mining: Types of Data
     • Relational data and transactional data
     • Text
     • Images, video
     • Mixtures of data
  19. Data Mining Products
     • DataMind -- neurOagent
     • Information Discovery -- IDIS
     • SAS Institute -- SAS/Neuronets
  20. Data Mining Software
     • RapidMiner and Weka: defining the data mining process
     • Top 8 data mining software packages in 2008
       ◦ Angoss software
       ◦ Infor CRM Epiphany
       ◦ Portrait Software
       ◦ SAS
       ◦ SPSS
       ◦ ThinkAnalytics
       ◦ Unica
       ◦ Viscovery
  21. Application Areas
       INDUSTRY             APPLICATION
       Finance              Credit card analysis
       Insurance            Fraud analysis
       Telecommunication    Call record analysis
  22. Applications
     • Financial industry, banks, businesses, e-commerce
       ◦ Stock and investment analysis
       ◦ Identifying loyal customers and risky customers
       ◦ Predicting customer spending
     • Database analysis and decision support
       ◦ Market analysis and management: target marketing, customer relationship management, market basket analysis
       ◦ Risk analysis and management: forecasting, quality control, competitive analysis
       ◦ Fraud detection and management
  23. Data Mining in Usage
     1. Intelligent Miner
     • An IBM data mining product
     • Its distinctive features include the scalability of its mining algorithms and tight integration with IBM's DB2 relational database system.
     5. DB Miner
     • Developed by DBMiner Technologies Inc.
     • Its distinctive feature is data-cube-based online analytical mining.
  24. The Telecomm Slice
     [Figure: data cube with three dimensions. Product: Household, Telecomm, Video, Audio. Regions: Europe, Far East, India. Sales Channel: Retail, Direct, Special.]
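Taking a slice of a data cube means fixing one dimension and keeping the remaining ones. A minimal sketch, assuming the cube is stored as a dictionary keyed by (product, region, sales channel); the sales figures are invented for illustration and do not come from the slide.

```python
# Hypothetical cube cells: (product, region, sales channel) -> units sold.
sales = {
    ("Telecomm", "Europe", "Retail"): 5,
    ("Telecomm", "India", "Direct"): 3,
    ("Video", "Europe", "Retail"): 7,
    ("Audio", "Far East", "Special"): 2,
}

def slice_cube(cube, product):
    """Fix the product dimension and return the remaining (region, channel) cells."""
    return {
        (region, channel): value
        for (p, region, channel), value in cube.items()
        if p == product
    }

telecomm_slice = slice_cube(sales, "Telecomm")
print(telecomm_slice)
```

The slice is a two-dimensional table over regions and sales channels for the Telecomm product only.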
  25. Conclusion
     • Data mining: discovering interesting patterns from large amounts of data
     • A KDD process includes data cleaning, data integration, data selection, transformation, data mining, pattern evaluation, and knowledge presentation
     • Mining can be performed in a variety of information repositories
     • Data mining functionalities: characterization, discrimination, association, classification, clustering, outlier analysis, etc.
  26. Thank you!