Data Mining-Current Status and Research Directions

  1. 1. Data Mining: Current Status and Research Directions <ul><li>Jiawei Han </li></ul><ul><li>Intelligent Database Systems Research Lab </li></ul><ul><li>School of Computing Science </li></ul><ul><li>Simon Fraser University, Canada </li></ul><ul><li>http://www.cs.sfu.ca/~han </li></ul>
  2. 2. Outline <ul><li>Why is data mining hot? </li></ul><ul><li>Current status: Major technical progress </li></ul><ul><li>Is data mining flying high, or not? </li></ul><ul><li>How to fly data mining high?—Research directions on data mining </li></ul>
  3. 3. Why Is Data Mining Hot? <ul><li>Data mining (knowledge discovery in databases) </li></ul><ul><ul><li>Extraction of interesting (non-trivial, implicit, previously unknown, and potentially useful) information (knowledge) or patterns from data in large databases or other information repositories </li></ul></ul><ul><li>Necessity is the mother of invention </li></ul><ul><ul><li>Data is everywhere, so data mining should be everywhere, too! </li></ul></ul><ul><ul><li>Understanding and using data is an urgent task! </li></ul></ul>
  4. 4. Data, Data, Everywhere!! <ul><li>Relational database—A commodity of every enterprise </li></ul><ul><li>Huge data warehouses are under construction </li></ul><ul><li>POS (Point of Sales): Transactional DBs in terabytes </li></ul><ul><li>Object-relational databases, distributed, heterogeneous, and legacy databases </li></ul><ul><li>Spatial databases (GIS), remote sensing database (EOS), and scientific/engineering databases </li></ul><ul><li>Time-series data (e.g., stock trading) and temporal data </li></ul><ul><li>Text (documents, emails) and multimedia databases </li></ul><ul><li>WWW: A huge, hyper-linked, dynamic, global information system </li></ul>
  5. 5. Data Mining Is Everywhere, too! — A Multi-Dimensional View of Data Mining <ul><li>Databases to be mined </li></ul><ul><ul><li>Relational, transactional, object-relational, active, spatial, time-series, text, multi-media, heterogeneous, legacy, WWW, etc. </li></ul></ul><ul><li>Knowledge to be mined </li></ul><ul><ul><li>Characterization, discrimination, association, classification, clustering, trend, deviation and outlier analysis, etc. </li></ul></ul><ul><li>Techniques utilized </li></ul><ul><ul><li>Database-oriented, data warehouse (OLAP), machine learning, statistics, visualization, neural network, etc. </li></ul></ul><ul><li>Applications adapted </li></ul><ul><ul><li>Retail, telecommunication, banking, fraud analysis, DNA mining, stock market analysis, Web mining, Weblog analysis, etc. </li></ul></ul>
  6. 6. Data Mining: Confluence of Multiple Disciplines [Figure: data mining as a confluence of database technology, statistics, machine learning (AI), visualization, information science, and other disciplines]
  7. 7. Data Mining: One Can Trace It Back to Early Civilization <ul><li>Most scientific discoveries involve “data mining” </li></ul><ul><ul><li>Kepler’s laws, Newton’s laws, the periodic table of chemical elements, …, from the “big bang” to DNA </li></ul></ul><ul><li>Statistics: A discipline dedicated to data analysis </li></ul><ul><li>Then why data mining? What are the differences? </li></ul><ul><ul><li>Huge amounts of data: gigabytes to terabytes </li></ul></ul><ul><ul><li>Fast computers: quick response, interactive analysis </li></ul></ul><ul><ul><li>Multi-dimensional, powerful, thorough analysis </li></ul></ul><ul><ul><li>High-level, “declarative”: ease of use and control for the user </li></ul></ul><ul><ul><li>Automated or semi-automated: mining functions hidden or built into many systems </li></ul></ul>
  8. 8. A Brief History of Data Mining Activities <ul><li>1989 IJCAI Workshop on Knowledge Discovery in Databases </li></ul><ul><ul><li>Knowledge Discovery in Databases (G. Piatetsky-Shapiro and W. Frawley, 1991) </li></ul></ul><ul><li>1991-1994 Workshops on Knowledge Discovery in Databases </li></ul><ul><ul><li>Advances in Knowledge Discovery and Data Mining (U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, 1996) </li></ul></ul><ul><li>1995-1998 International Conferences on Knowledge Discovery in Databases and Data Mining (KDD’95-98) </li></ul><ul><ul><li>Journal of Data Mining and Knowledge Discovery (1997) </li></ul></ul><ul><li>1998 ACM SIGKDD, SIGKDD’1999-2001 conferences, and SIGKDD Explorations </li></ul><ul><li>More conferences on data mining </li></ul><ul><ul><li>PAKDD, PKDD, SIAM-Data Mining, (IEEE) ICDM, DaWaK, SPIE-DM, etc. </li></ul></ul>
  9. 9. Research Progress in the Last Decade <ul><li>Multi-dimensional data analysis: Data warehouse and OLAP (on-line analytical processing) </li></ul><ul><li>Association, correlation, and causality analysis </li></ul><ul><li>Classification: scalability and new approaches </li></ul><ul><li>Clustering and outlier analysis </li></ul><ul><li>Sequential patterns and time-series analysis </li></ul><ul><li>Similarity analysis: curves, trends, images, texts, etc. </li></ul><ul><li>Text mining, Web mining and Weblog analysis </li></ul><ul><li>Spatial, multimedia, scientific data analysis </li></ul><ul><li>Data preprocessing and database compression </li></ul><ul><li>Data visualization and visual data mining </li></ul><ul><li>Many others, e.g., collaborative filtering </li></ul>
  10. 10. Multi-Dimensional Data Analysis <ul><li>Data warehousing: integration from heterogeneous or semi-structured databases </li></ul><ul><li>Multi-dimensional modeling of data: star & snowflake schemas </li></ul><ul><li>Efficient and scalable computation of data cubes or iceberg cubes </li></ul><ul><li>OLAP (on-line analytical processing): drilling, dicing, slicing, etc. </li></ul><ul><li>Discovery-driven exploration of data cubes </li></ul><ul><li>From OLAP to OLAM: A multi-dimensional view for on-line analytical mining </li></ul>
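The data cube and iceberg cube ideas on the slide above can be illustrated with a deliberately naive sketch. It is not one of the efficient cube algorithms the slide refers to: it simply aggregates every group-by over every subset of the dimensions. The function name `compute_cube`, the `min_count` threshold, and the sample rows are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def compute_cube(rows, dims, measure, min_count=1):
    """Sum `measure` for every group-by over every subset of `dims`.

    Returns {group_by_dims: {cell_key: total}}. Cells backed by fewer than
    `min_count` rows are dropped, which yields an iceberg cube when
    min_count > 1.
    """
    cube = {}
    for k in range(len(dims) + 1):
        for group in combinations(dims, k):
            cells = defaultdict(lambda: [0, 0.0])  # key -> [row count, total]
            for row in rows:
                key = tuple(row[d] for d in group)
                cells[key][0] += 1
                cells[key][1] += row[measure]
            cube[group] = {
                key: total
                for key, (count, total) in cells.items()
                if count >= min_count
            }
    return cube


# Hypothetical sales records, used only to exercise the sketch.
sales = [
    {"city": "Vancouver", "item": "tv", "amount": 100.0},
    {"city": "Vancouver", "item": "pc", "amount": 200.0},
    {"city": "Toronto", "item": "tv", "amount": 50.0},
]
full_cube = compute_cube(sales, ("city", "item"), "amount")
iceberg = compute_cube(sales, ("city", "item"), "amount", min_count=2)
```

In this representation, rolling up from the (city, item) cuboid to the city level amounts to reading `full_cube[("city",)]`, and slicing is a filter on the cell keys of one cuboid.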
  11. 11. Association and Frequent Pattern Analysis <ul><li>Efficient mining of frequent patterns and association rules: </li></ul><ul><ul><li>Apriori and FP-growth algorithms </li></ul></ul><ul><ul><li>Multi-level, multi-dimensional, quantitative association mining </li></ul></ul><ul><li>From association to correlation, sequential patterns, partial periodicity, cyclic rules, ratio rules, etc. </li></ul><ul><li>Query and constraint-based association analysis </li></ul>
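As a rough illustration of the Apriori approach named on the slide above, the following is a minimal level-wise sketch. It relies on the Apriori property that every subset of a frequent itemset must itself be frequent. The function name `apriori`, the in-memory list of transactions, and the sample baskets are assumptions; a production implementation would count candidates far more efficiently.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level 1 candidates: every single item seen in the data.
    items = {item for t in transactions for item in t}
    candidates = [frozenset([i]) for i in items]
    frequent = {}
    k = 1
    while candidates:
        # Count each candidate with one pass over the transactions.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        # Join step: combine frequent k-itemsets into (k+1)-candidates.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent


# Hypothetical market baskets, used only to exercise the sketch.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]
frequent_itemsets = apriori(baskets, min_support=0.6)
```

With a 60% support threshold, all four single items survive but only the pair {bread, milk} does, so no three-item candidate is ever generated.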
  12. 12. Classification: Scalable Methods and Handling of Complex Types of Data <ul><li>Classification has been an essential theme in machine learning and statistics research </li></ul><ul><ul><li>Decision trees, Bayesian classification, neural networks, k-nearest neighbors, etc. </li></ul></ul><ul><ul><li>Tree pruning, boosting, and bagging techniques </li></ul></ul><ul><li>Efficient and scalable classification methods </li></ul><ul><ul><li>Exploration of attribute-class pairs </li></ul></ul><ul><ul><li>SLIQ, SPRINT, RainForest, BOAT, etc. </li></ul></ul><ul><li>Classification of semi-structured and non-structured data </li></ul><ul><ul><li>Classification by clustering association rules (ARCS) </li></ul></ul><ul><ul><li>Association-based classification </li></ul></ul><ul><ul><li>Web document classification </li></ul></ul>
  13. 13. Clustering and Outlier Analysis <ul><li>Partitioning methods </li></ul><ul><ul><li>k-means, k-medoids, CLARANS </li></ul></ul><ul><li>Hierarchical methods: micro-clusters </li></ul><ul><ul><li>BIRCH, CURE, Chameleon </li></ul></ul><ul><li>Density-based methods: </li></ul><ul><ul><li>DBSCAN, OPTICS, DENCLUE </li></ul></ul><ul><li>Grid-based methods </li></ul><ul><ul><li>STING, CLIQUE, WaveCluster </li></ul></ul><ul><li>Outlier analysis: </li></ul><ul><ul><li>statistics-based, distance-based, deviation-based </li></ul></ul><ul><li>Constraint-based clustering </li></ul><ul><ul><li>COD (Clustering with Obstructed Distance) </li></ul></ul><ul><ul><li>User-specified constraints </li></ul></ul>
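The slide above only names distance-based outlier analysis. One common formulation, assumed here, flags a point as an outlier when at least a given fraction of the other points lie farther than a given radius from it. The function name and both parameters are hypothetical, and the quadratic scan is for illustration only.

```python
import math


def distance_based_outliers(points, radius, min_fraction):
    """Return the points for which at least `min_fraction` of the other
    points lie farther away than `radius` (Euclidean distance)."""
    n = len(points)
    if n < 2:
        return []
    outliers = []
    for i, p in enumerate(points):
        far = sum(
            1 for j, q in enumerate(points)
            if j != i and math.dist(p, q) > radius
        )
        if far / (n - 1) >= min_fraction:
            outliers.append(p)
    return outliers


# Hypothetical 2-D points: a tight group plus one distant point.
sample_points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
found = distance_based_outliers(sample_points, radius=3.0, min_fraction=0.9)
```

Only the distant point has 90% or more of the other points outside its radius, so it is the only one reported.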
  14. 14. Sequential Patterns and Time-Series Analysis <ul><li>Trend analysis </li></ul><ul><ul><li>Trend movement vs. cyclic variations, seasonal variations and random fluctuations </li></ul></ul><ul><li>Similarity search in time-series database </li></ul><ul><ul><li>Handling gaps, scaling, etc. </li></ul></ul><ul><ul><li>Indexing methods and query languages for time-series </li></ul></ul><ul><li>Sequential pattern mining </li></ul><ul><ul><li>Various kinds of sequences, various methods </li></ul></ul><ul><ul><li>From GSP to PrefixSpan </li></ul></ul><ul><li>Periodicity analysis </li></ul><ul><ul><li>Full periodicity, partial periodicity, cyclic association rules </li></ul></ul>
  15. 15. Similarity Search: Similar Curves, Trends, Images, and Texts <ul><li>Various kinds of data, various similarity mining methods </li></ul><ul><li>Discovery of similar trends in time-series data </li></ul><ul><ul><li>Data transformation & high-dimensional structures </li></ul></ul><ul><li>Finding similar images based on color, texture, etc. </li></ul><ul><ul><li>Content-based vs. keyword-based retrieval </li></ul></ul><ul><ul><li>Color histogram-based signature </li></ul></ul><ul><ul><li>Multi-feature composed signature </li></ul></ul><ul><li>Finding documents with similar texts </li></ul><ul><ul><li>Similar keywords (synonymy & polysemy) </li></ul></ul><ul><ul><li>Term frequency matrix </li></ul></ul><ul><ul><li>Latent semantic indexing </li></ul></ul>
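For the document-similarity bullets above, a minimal sketch of a term-frequency representation compared by cosine similarity follows. Whitespace tokenization and the function names are simplifying assumptions. Latent semantic indexing, also named on the slide, would additionally apply a singular value decomposition to the term-frequency matrix and is not shown.

```python
import math
from collections import Counter


def term_frequency(text):
    """Count lower-cased, whitespace-separated terms in a document."""
    return Counter(text.lower().split())


def cosine_similarity(tf_a, tf_b):
    """Cosine of the angle between two term-frequency vectors."""
    shared = tf_a.keys() & tf_b.keys()
    dot = sum(tf_a[t] * tf_b[t] for t in shared)
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical documents, used only to exercise the sketch.
doc_a = term_frequency("data mining finds patterns in data")
doc_b = term_frequency("data mining of web data")
doc_c = term_frequency("stock market trends")
```

Documents that share many terms score close to 1 and documents with no terms in common score 0. Synonymy and polysemy are exactly where this purely lexical measure falls short.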
  16. 16. Spatial, Multimedia, Scientific Data Analysis <ul><li>Multi-dimensional analysis of spatial, multimedia and scientific data </li></ul><ul><ul><li>Geo-spatial data cube and spatial OLAP </li></ul></ul><ul><ul><li>The curse of dimensionality problem </li></ul></ul><ul><li>Association analysis </li></ul><ul><ul><li>A progressive refinement methodology </li></ul></ul><ul><ul><li>Micro-clustering can be used for preprocessing in the analysis of complex types of data </li></ul></ul><ul><li>Classification </li></ul><ul><ul><li>Association-based for handling high-dimensionality and sparse data </li></ul></ul>
  17. 17. Data Mining Industry and Applications <ul><li>From research prototypes to data mining products, languages, and standards </li></ul><ul><ul><li>IBM Intelligent Miner, SAS Enterprise Miner, SGI MineSet, Clementine, MS/SQLServer 2000, DBMiner, BlueMartini, MineIt, DigiMine, etc. </li></ul></ul><ul><ul><li>A few data mining languages and standards (esp. MS OLEDB for Data Mining). </li></ul></ul><ul><li>Application achievements in many domains </li></ul><ul><ul><li>Market analysis, trend analysis, fraud detection, outlier analysis, Web mining, etc. </li></ul></ul>
  18. 18. Is Data Mining Flying? Or Not? <ul><li>Data mining is flying </li></ul><ul><ul><li>R&D has advanced greatly </li></ul></ul><ul><ul><li>Applications have been broadened substantially </li></ul></ul><ul><li>But not as high as some may have hoped. Why not? </li></ul><ul><ul><li>Hope to see billions of dollars within years? </li></ul></ul><ul><ul><ul><li>A young, up-and-coming technology, not hype! </li></ul></ul></ul><ul><ul><li>Not bread-and-butter but a value-added service </li></ul></ul><ul><ul><ul><li>DBMS, WWW, and other information systems will still be the aircraft carrier for “data mining” </li></ul></ul></ul><ul><ul><li>Not off-the-shelf in nature </li></ul></ul><ul><ul><ul><li>Needs training, understanding, and customizing (re-development) </li></ul></ul></ul><ul><ul><li>Young technology: needs much R&D to fly high </li></ul></ul><ul><ul><ul><li>Much research, development, and real problem solving! </li></ul></ul></ul>
  19. 19. How to Fly Data Mining High? Research Directions <ul><li>Web mining </li></ul><ul><li>Towards integrated data mining environments and tools </li></ul><ul><ul><li>“Vertical” (or application-specific) data mining </li></ul></ul><ul><ul><li>Invisible data mining </li></ul></ul><ul><li>Towards intelligent, efficient, and scalable data mining methods </li></ul>
  20. 20. Web Mining: A Fast Expanding Frontier in Data Mining <ul><li>Mine what Web search engine finds </li></ul><ul><li>Automatic classification of Web documents </li></ul><ul><li>Discovery of authoritative Web pages, Web structures and Web communities </li></ul><ul><li>Meta-Web Warehousing: Web yellow page service </li></ul><ul><li>Web usage mining </li></ul>
  21. 21. Mine What Web Search Engine Finds <ul><li>Current Web search engines: A convenient source for mining </li></ul><ul><ul><li>keyword-based, return too many, often low quality answers, still missing a lot, not customized, etc. </li></ul></ul><ul><li>Data mining will help: </li></ul><ul><ul><li>coverage: “Enlarge and then shrink,” using synonyms and conceptual hierarchies </li></ul></ul><ul><ul><li>better search primitives: user preferences/hints </li></ul></ul><ul><ul><li>linkage analysis: authoritative pages and clusters </li></ul></ul><ul><ul><li>Web-based languages: XML + WebSQL + WebML </li></ul></ul><ul><ul><li>customization: home page + Weblog + user profiles </li></ul></ul>
  22. 22. Discovery of Authoritative Pages in WWW <ul><li>PageRank method (Brin and Page, 1998): </li></ul><ul><ul><li>Rank the "importance" of Web pages, based on a model of a "random browser." </li></ul></ul><ul><li>Hub/authority method (Kleinberg, 1998): </li></ul><ul><ul><li>Prominent authorities often do not endorse one another directly on the Web. </li></ul></ul><ul><ul><li>Hub pages have a large number of links to many relevant authorities. </li></ul></ul><ul><ul><li>Thus hubs and authorities exhibit a mutually reinforcing relationship. </li></ul></ul><ul><li>Both the PageRank and hub/authority methodologies have been shown to provide qualitatively good search results for broad query topics on the WWW. </li></ul>
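Both link-analysis methods on the slide above can be sketched as simple fixed-point iterations over a link graph. This is a minimal illustration, not the published algorithms in full. The damping factor of 0.85, the even redistribution of rank from pages without outgoing links, the iteration count, and all function names are assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: {page: [pages it links to]}. Returns {page: score}, summing to 1."""
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a base share from random jumps.
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            outs = links.get(p, [])
            if outs:
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
            else:
                # Page with no outgoing links: spread its rank over all pages.
                for q in pages:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank


def _normalize(scores):
    """Scale scores to unit Euclidean length (no-op if all are zero)."""
    norm = sum(v * v for v in scores.values()) ** 0.5
    return {p: v / norm for p, v in scores.items()} if norm else scores


def hits(links, iterations=50):
    """Return (hub, authority) score dictionaries for the link graph."""
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # A page's authority is the sum of the hub scores of pages linking to it.
        auth = {p: 0.0 for p in pages}
        for p, outs in links.items():
            for q in outs:
                auth[q] += hub[p]
        # A page's hub score is the sum of the authority scores it links to.
        hub = {p: sum(auth[q] for q in links.get(p, [])) for p in pages}
        auth = _normalize(auth)
        hub = _normalize(hub)
    return hub, auth


# Hypothetical graph: two hub pages that both point at two authorities.
web = {"h1": ["a", "b"], "h2": ["a", "b"], "a": [], "b": []}
ranks = pagerank(web)
hub_scores, auth_scores = hits(web)
```

On this toy graph the two linked-to pages receive the highest PageRank and all of the authority weight, while the two linking pages receive all of the hub weight, which is the mutually reinforcing relationship the slide describes.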
  23. 23. Automatic Classification of Web Documents <ul><li>Web document classification: </li></ul><ul><ul><li>Good human classification: Yahoo!, CS term hierarchies </li></ul></ul><ul><ul><li>These classifications can be used as training sets to build a learning model </li></ul></ul><ul><li>Keyword-based classification is different from multi-dimensional classification </li></ul><ul><ul><li>Association or clustering-based classification is often more effective </li></ul></ul><ul><ul><li>Multi-level classification is important </li></ul></ul>
  24. 24. A Multiple Layered Meta-Web Architecture [Figure: layered architecture with Layer 0, Layer 1, ..., Layer n; labels: Generalized Descriptions, More Generalized Descriptions]
  25. 25. Web Yellow Page Service: A Multi-Layer, Meta-Web Approach <ul><li>XML: facilitates structured and meta-information extraction </li></ul><ul><li>Automatic classification of Web documents: </li></ul><ul><ul><li>based on Yahoo!, etc. as training set + keyword-based correlation/classification analysis (IR/AI assistance) </li></ul></ul><ul><li>Automatic ranking of important Web pages </li></ul><ul><ul><li>authoritative site recognition and clustering Web pages </li></ul></ul><ul><li>Generalization-based multi-layer meta-Web construction </li></ul><ul><ul><li>With the assistance of clustering and classification analysis </li></ul></ul><ul><li>Meta-Web can be warehoused and incrementally updated </li></ul><ul><li>Querying and mining can be performed on or assisted by meta-Web </li></ul>
  26. 26. Importance of Constructing Multi-Layer Meta Web <ul><li>Benefits of Multi-Layer Meta-Web: </li></ul><ul><ul><li>Multi-dimensional Web info summary analysis </li></ul></ul><ul><ul><li>Approximate and intelligent query answering </li></ul></ul><ul><ul><li>Web high-level query answering (WebSQL, WebML) </li></ul></ul><ul><ul><li>Web content and structure mining </li></ul></ul><ul><ul><li>Observing the dynamics/evolution of the Web </li></ul></ul><ul><li>Is it realistic to construct such a meta-Web? </li></ul><ul><ul><li>It benefits even if it is partially constructed </li></ul></ul><ul><ul><li>The benefit may justify the cost of tool development, standardization, and partial restructuring </li></ul></ul>
  27. 27. Web Usage (Click-Stream) Mining <ul><li>Weblog provides rich information about Web dynamics </li></ul><ul><li>Multidimensional Weblog analysis: </li></ul><ul><ul><li>disclose potential customers, users, markets, etc. </li></ul></ul><ul><li>Plan mining (mining general Web accessing regularities): </li></ul><ul><ul><li>Web linkage adjustment, performance improvements </li></ul></ul><ul><li>Web accessing association/sequential pattern analysis: </li></ul><ul><ul><li>Web caching, prefetching, swapping </li></ul></ul><ul><li>Trend analysis: </li></ul><ul><ul><li>Dynamics of the Web: what has been changing? </li></ul></ul><ul><li>Customized to individual users </li></ul>
  28. 28. Towards Integrated Data Mining Environments and Tools <ul><li>OLAP Mining: Integration of Data Warehousing and Data Mining </li></ul><ul><li>Querying and Mining: An Integrated Information Analysis Environment </li></ul><ul><li>Basic Mining Operations and Mining Query Optimization </li></ul><ul><li>“Vertical” (or application-specific) data mining </li></ul><ul><li>Invisible data mining </li></ul>
  29. 29. OLAP Mining: An Integration of Data Mining and Data Warehousing <ul><li>Coupling of data mining systems with DBMS and data warehouse systems </li></ul><ul><ul><li>No coupling, loose coupling, semi-tight coupling, tight coupling </li></ul></ul><ul><li>On-line analytical mining of data </li></ul><ul><ul><li>Integration of mining and OLAP technologies </li></ul></ul><ul><li>Interactive mining of multi-level knowledge </li></ul><ul><ul><li>Necessity of mining knowledge and patterns at different levels of abstraction by drilling/rolling, pivoting, slicing/dicing, etc. </li></ul></ul><ul><li>Integration of multiple mining functions </li></ul><ul><ul><li>Characterized classification, first clustering and then association </li></ul></ul>
  30. 30. An OLAM Architecture [Figure: four-layer architecture. Layer 1: data repository (databases, data warehouse; data cleaning, data integration, filtering). Layer 2: MDDB (with metadata). Layer 3: OLAP/OLAM (OLAM engine, OLAP engine). Layer 4: user interface (mining query, mining result). Interfaces between layers: database API, data cube API, user GUI API.]
  31. 31. Querying and Mining: An Integrated Information Analysis Environment <ul><li>Data mining as a component of DBMS, data warehouse, or Web information system </li></ul><ul><ul><li>Integrated information processing environment </li></ul></ul><ul><ul><ul><li>MS/SQLServer-2000 (Analysis service) </li></ul></ul></ul><ul><ul><ul><li>IBM IntelligentMiner on DB2 </li></ul></ul></ul><ul><ul><ul><li>SAS EnterpriseMiner: data warehousing + mining </li></ul></ul></ul><ul><li>Query-based mining </li></ul><ul><ul><li>Querying database/DW/Web knowledge </li></ul></ul><ul><ul><li>Efficiency and flexibility: preprocessing, on-line processing, optimization, integration, etc. </li></ul></ul>
  32. 32. Basic Mining Operations and Mining Query Optimization <ul><li>Relational databases: There is a set of basic relational operations and a standard query language, SQL </li></ul><ul><ul><li>E.g., selection, projection, join, set difference, intersection, Cartesian product, etc. </li></ul></ul><ul><li>Is there a set of standard data mining operations on which optimizations can be done? </li></ul><ul><ul><li>Difficulty: operations are defined differently across systems </li></ul></ul><ul><ul><li>Importance: optimization can be performed on them systematically, and standardization facilitates information exchange and system interoperability </li></ul></ul>
  33. 33. “Vertical” Data Mining <ul><li>Generic data mining tools? Too simple to match domain-specific, sophisticated applications </li></ul><ul><ul><li>Expert knowledge and business logic represent many years of work in their own fields! </li></ul></ul><ul><ul><li>Data mining + business logic + domain experts </li></ul></ul><ul><li>A multi-dimensional view of data miners </li></ul><ul><ul><li>Complexity of data: Web, sequence, spatial, multimedia, … </li></ul></ul><ul><ul><li>Complexity of domains: DNA, astronomy, market, telecom, … </li></ul></ul><ul><li>Domain-specific data mining tools </li></ul><ul><ul><li>Provide concrete, killer solutions to specific problems </li></ul></ul><ul><ul><li>Feedback to build more powerful tools </li></ul></ul>
  34. 34. Invisible Data Mining <ul><li>Build mining functions into daily information services </li></ul><ul><ul><li>Web search engine (link analysis, authoritative pages, user profiles)—adaptive web sites, etc. </li></ul></ul><ul><ul><li>Improvement of query processing: history + data </li></ul></ul><ul><ul><li>Making service smart and efficient </li></ul></ul><ul><li>Benefits from/to data mining research </li></ul><ul><ul><li>Data mining research has produced many scalable, efficient, novel mining solutions </li></ul></ul><ul><ul><li>Applications feed new challenge problems to research </li></ul></ul>
  35. 35. Towards Intelligent Tools for Data Mining <ul><li>Integration paves the way to intelligent mining </li></ul><ul><li>Smart interface brings intelligence </li></ul><ul><ul><li>Easy to use, understand and manipulate </li></ul></ul><ul><li>One picture may be worth 1,000 words </li></ul><ul><ul><li>Visual and audio data mining </li></ul></ul><ul><li>Human-Centered Data Mining </li></ul><ul><li>Towards self-tuning, self-managing, self-triggering data mining </li></ul>
  36. 36. Integrated Mining: A Booster for Intelligent Mining <ul><li>Integration paves the way to intelligent mining </li></ul><ul><ul><li>Data mining integrates with DBMS, DW, WebDB, etc </li></ul></ul><ul><ul><li>Integration inherits the power of up-to-date information technology: querying, MD analysis, similarity search, etc. </li></ul></ul><ul><ul><li>Mining can be viewed as querying database knowledge </li></ul></ul><ul><li>Integration leads to standard interface/language, function/process standardization, utility, and reachability </li></ul><ul><li>Efficiency and scalability bring intelligent mining to reality </li></ul>
  37. 37. One Picture May Be Worth 1,000 Words! <ul><li>Visual Data Mining </li></ul><ul><ul><li>Visualization of data </li></ul></ul><ul><ul><li>Visualization of data mining results </li></ul></ul><ul><ul><li>Visualization of data mining processes </li></ul></ul><ul><ul><li>Interactive data mining: visual classification </li></ul></ul><ul><li>One melody may be worth 1,000 words too! </li></ul><ul><ul><li>Audio data mining: turn data into music and melody! </li></ul></ul><ul><ul><li>Uses audio signals to indicate the patterns of data or the features of data mining results </li></ul></ul>
  38. 38. Visualization of data mining results in SAS Enterprise Miner: scatter plots
  39. 39. Visualization of association rules in MineSet 3.0
  40. 40. Visualization of a decision tree in MineSet 3.0
  41. 41. Visualization of Data Mining Processes by Clementine
  42. 42. Interactive Visual Mining by Perception-Based Classification (PBC)
  43. 43. Human-Centered Data Mining <ul><li>Finding all the patterns autonomously in a database? Unrealistic, because there could be too many patterns, most of them uninteresting </li></ul><ul><li>Data mining should be an interactive process </li></ul><ul><ul><li>The user directs what is to be mined </li></ul></ul><ul><li>Users must be provided with a set of primitives for communicating with the data mining system, in the form of a data mining query language </li></ul><ul><li>Users should provide constraints on what is to be mined </li></ul><ul><li>The system should use such constraints to guide the mining process (constraint-based mining or mining query optimization) </li></ul>
  44. 44. Constraint-Based Mining <ul><li>What kinds of constraints can be used in mining? </li></ul><ul><ul><li>Knowledge type constraint: classification, association, etc. </li></ul></ul><ul><ul><li>Data constraint: SQL-like queries </li></ul></ul><ul><ul><ul><li>Find products sold together in Vancouver in Feb. ’01. </li></ul></ul></ul><ul><ul><li>Dimension/level constraints: </li></ul></ul><ul><ul><ul><li>In relevance to region, price, brand, customer category. </li></ul></ul></ul><ul><ul><li>Rule constraints: </li></ul></ul><ul><ul><ul><li>Small sales (price < $10) trigger big sales (sum > $200). </li></ul></ul></ul><ul><ul><li>Interestingness constraints: </li></ul></ul><ul><ul><ul><li>E.g., strong rules (min_support ≥ 3%, min_confidence ≥ 60%, min_lift > 3.0). </li></ul></ul></ul>
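The interestingness constraint on the slide above can be made concrete by computing support, confidence, and lift for a candidate rule and testing them against thresholds. This sketch assumes the slide's three figures are lower bounds (support of at least 3%, confidence of at least 60%, lift above 3.0). The function names `rule_metrics` and `is_strong` and the sample transactions are hypothetical.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    a, c = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if a <= set(t))
    n_c = sum(1 for t in transactions if c <= set(t))
    n_ac = sum(1 for t in transactions if (a | c) <= set(t))
    support = n_ac / n if n else 0.0                  # P(A and C)
    confidence = n_ac / n_a if n_a else 0.0           # P(C | A)
    lift = confidence / (n_c / n) if n_c else 0.0     # P(C | A) / P(C)
    return support, confidence, lift


def is_strong(metrics, min_support=0.03, min_confidence=0.60, min_lift=3.0):
    """Apply the assumed thresholds: support and confidence as inclusive
    lower bounds, lift as a strict lower bound."""
    support, confidence, lift = metrics
    return (support >= min_support
            and confidence >= min_confidence
            and lift > min_lift)


# Hypothetical data: 100 transactions over items x, y, z.
data = [{"x", "y"}] * 5 + [{"x"}] + [{"y"}] * 5 + [{"z"}] * 89
x_to_y = rule_metrics(data, {"x"}, {"y"})
z_to_x = rule_metrics(data, {"z"}, {"x"})
```

Here the rule x -> y has 5% support, about 83% confidence, and a lift above 8, so it passes all three thresholds, while z -> x never occurs and is rejected. In constraint-based mining such checks would be pushed into the search rather than applied after the fact.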
  45. 45. Rule Constraints: A Classification [Figure: classes of rule constraints: succinctness, anti-monotonicity, monotonicity, convertible constraints, inconvertible constraints]
  46. 46. Constraint-Based Clustering Analysis <ul><li>User-specified constraints: no cluster has fewer than 1,000 gold customers </li></ul><ul><li>Resource allocation (clustering) with obstacles </li></ul>
  47. 47. Towards Automated Data Mining? <ul><li>It is not realistic to automatically find all the knowledge in a large database </li></ul><ul><li>Thus we promote human-centered, constraint-based mining </li></ul><ul><li>However, to achieve genuinely intelligent data mining, the data mining process should be self-tuning, self-managing, and self-triggering </li></ul><ul><li>Functions should be developed to achieve such performance </li></ul>
  48. 48. Conclusions <ul><li>Data mining: a promising research frontier </li></ul><ul><li>Data mining research has advanced greatly in the last decade </li></ul><ul><li>However, data mining, as an industry, has not been flying as high as expected </li></ul><ul><li>Much research and application exploration is needed </li></ul><ul><ul><li>Web mining </li></ul></ul><ul><ul><li>Towards integrated data mining environments and tools </li></ul></ul><ul><ul><li>Towards intelligent, efficient, and scalable data mining methods </li></ul></ul>
  49. 49. http://www.cs.sfu.ca/~han http://db.cs.sfu.ca <ul><li>Thank you !!! </li></ul>
  50. 50. References <ul><li>J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, 2001. </li></ul><ul><li>J. Han, L. V. S. Lakshmanan, and R. T. Ng, "Constraint-Based, Multidimensional Data Mining", COMPUTER (special issue on Data Mining), 32(8): 46-50, 1999. </li></ul>
