DATA MINING TRENDS AND DEVELOPMENTS : The Key Data Mining ...
Upcoming SlideShare
Loading in...5
×
 

DATA MINING TRENDS AND DEVELOPMENTS : The Key Data Mining ...

on

  • 953 views

 

Statistics

Views

Total Views
953
Views on SlideShare
953
Embed Views
0

Actions

Likes
0
Downloads
17
Comments
0

0 Embeds 0

No embeds

Accessibility

Categories

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

    DATA MINING TRENDS AND DEVELOPMENTS : The Key Data Mining ... DATA MINING TRENDS AND DEVELOPMENTS : The Key Data Mining ... Document Transcript

    • DATA MINING TRENDS AND DEVELOPMENTS : The Key Data Mining Technologies and Applications for the 21st Century Jeffrey Hsu Information Systems, Fairleigh Dickinson University Madison, NJ 07940, USA jeff@fdu.edu Abstract This paper discusses a number of technologies, approaches, and research areas which have been identified as having critical and future promise in the field of data mining. There is currently an explosion in the amount of data which we now produce and have access to, and mining from these sources can uncover important information. The extensive use of handheld, wireless, and other ubiquitous devices is a developing area, since a lot of information being created and transmitted would be maintained and stored only on these kinds of devices. Among the other areas which are being developed, investigated, and applications identified for include hypertext and hypermedia data mining, phenomenal data mining, distributed/collective data mining, constraint-based data mining, and other related methods. Keywords: Data mining; distributed, ubiquitous, hypertext, multimedia, time-series, constraint-based, phenomenal 1. THE PRESENT AND THE FUTURE As mentioned previously, the field of data mining is The field of data mining and knowledge discovery in very broad, and there are many methods and databases (KDD) has been growing in leaps and bounds, technologies which have become dominant in the field. and has shown great potential for the future (Han and Not only have there been developments in the Kamber, 2001). The purpose of this paper is to survey “traditional” areas of data mining, there are other areas many of the future trends in the field of data mining, which have been identified as being especially important with a focus on those which are thought to have the as future trends in the field. These are the subject of the most promise and applicability to future data mining upcoming sections and form the major emphasis of this applications. paper. What is the future of data mining? Certainly, the field 2. MAJOR TRENDS IN TECHNOLOGIES AND has made great strides in past years, and many industry METHODS analysts and experts in the area feel that the future will be bright. There is definite growth in the area of data There are a number of data mining trends is in terms of mining. Many industry analysts and research firms have technologies and methodologies which are currently projected a bright future for the entire data mining/KDD being developed and researched. These trends include area, and its related area of CRM (customer relationship methods for analyzing more complex forms of data, as management). According to IDC, spending in the area of well as specific techniques and methods. The trends business intelligence, which encompasses data mining, identified include distributed data mining, is estimated to increase from $3.6 billion in 2000 to hypertext/hypermedia mining, ubiquitous data mining, $11.9 billion in 2005. The growth in the CRM Analytic as well as multimedia, spatial, and time series/sequential application market is expected to approach 54.1% per data mining. These are examined in detail in the year through 2003. In addition, data mining projects are upcoming sections. expected to grow by more than 300% by the year 2002. By 2003, over 90% of consumer-based industries with e-commerce orientation will utilize some kind of data mining model.
    • 3. DATA MINING TRENDS IN objectives of UDM is to mine data while minimizing the TECHNOLOGIES AND METHODS cost of ubiquitous presence. DISTRIBUTED/COLLECTIVE DATA MINING Human-computer interaction is another challenging One area of data mining which is attracting a good aspect of UDM. Visualizing patterns like classifiers, amount of attention is that of distributed and collective clusters, associations and others, in portable devices are data mining. Much of the data mining which is being usually difficult. The small display areas offer serious done currently focuses on a database or data warehouse challenges to interactive data mining environments. Data of information which is physically located in one place. management in a mobile environment is also a However, the situation arises where information may be challenging issue. Moreover, the sociological and located in different places, in different physical psychological aspects of the integration between data locations. This is known generally as distributed data mining technology and our lifestyle are yet to be mining (DDM). Therefore, the goal is to effectively explored. The key issues to consider include theories of mine distributed data which is located in heterogeneous UDM, advanced algorithms for mobile and distributed sites. Examples of this include biological information applications, data management issues, mark-up located in different databases, data which comes from languages, and other data representation techniques; the databases of two different firms, or analysis of data integration with database applications for mobile from different branches of a corporation, the combining environments, architectural issues: (architecture, control, of which would be an expensive and time-consuming security, and communication issues), specialized mobile process. devices for UDM, software agents and UDM (Agent based approaches in UDM, agent interaction--- Distributed data mining (DDM) is used to offer a cooperation, collaboration, negotiation, organizational different approach to traditional approaches analysis, by behavior), applications of UDM (Application in using a combination of localized data analysis, together business, science, engineering, medicine, and other with a “global data model. ” In more specific terms, this disciplines), location management issues in UDM and is specified as: technology for web-based applications of UDM -performing local data analysis for generating partial (Kargupta and Joshi, 2001). data models, and HYPERTEXT AND HYPERMEDIA DATA -combining the local data models from different data MINING sites in order to develop the global model. Hypertext and hypermedia data mining can be characterized as mining data which includes text, This global model combines the results of the separate hyperlinks, text markups, and various other forms of analyses. Often the global model produced, especially if hypermedia information. As such, it is closely related to the data in different locations has different features or both web mining, and multimedia mining, which are characteristics, may become incorrect or ambiguous. covered separately in this section, but in reality are quite This problem is especially critical when the data in close in terms of content and applications. While the distributed sites is heterogeneous rather than World Wide Web is substantially composed of hypertext homogeneous. These heterogeneous data sets are known and hypermedia elements, there are other kinds of as vertically partitioned datasets. hypertext/hypermedia data sources which are not found on the web. Examples of these include the information An approach proposed by Kargupta et al (2000) speaks found in online catalogues, digital libraries, online of the collective data mining (CDM) approach, which information databases, and the like. In addition to the provides a better approach to vertically partitioned traditional forms of hypertext and hypermedia, together datasets, using the notion of orthonormal basis with the associated hyperlink structures, there are also functions, and computes the basis coefficients to inter-document structures which exist on the web, such generate the global model of the data (Kargupta et. al., as the directories employed by such services as Yahoo! 2000). (www.yahoo.com) or the Open Directory project (http://dmoz.org) These taxonomies of topics and UBIQUITOUS DATA MINING (UDM) subtopics are linked together to form a large network or The advent of laptops, palmtops, cell phones, and hierarchical tree of topics and associated links and wearable computers is making ubiquitous access to large pages. quantity of data possible. Advanced analysis of data for extracting useful knowledge is the next natural step in Some of the important data mining techniques used for the world of ubiquitous computing. hypertext and hypermedia data mining include classification (supervised learning), clustering Accessing and analyzing data from a ubiquitous (unsupervised learning), semi-structured learning, and computing device offer many challenges. For example, social network analysis. UDM introduces additional cost due to communication, computation, security, and other factors. So one of the
    • In the case of classification, or supervised learning, the Multimedia information, because its nature as a large process starts off by reviewing training data in which collection of multimedia objects, must be represented items are marked as being part of a certain class or differently from conventional forms of data. One group. This data is the basis from which the algorithm is approach is to create a multimedia data cube which can trained. One application of classification is in the area be used to convert multimedia-type data into a form of web topic directories, which can group similar- which is suited to analysis using one of the main data sounding or spelled terms into appropriate categories, so mining techniques, but taking into account the unique that searches will not bring up inappropriate sites and characteristics of the data. This may include the use of pages. The use of classification can also result in measures and dimensions for texture, shape, color, and searches which are not only based on keywords, but also related attributes. In essence, it is possible to create a on category and classification attributes. Methods used multidimensional spatial database. Among the types of for classification include naive Bayes classification, analyses which can be conducted on multimedia parameter smoothing, dependence modeling, and databases include associations, clustering, classification, maximum entropy (Chakrabarti, 2000). and similarity search. Unsupervised learning, or clustering, differs from Another developing area in multimedia data mining is classification in that classification involved the use of that of audio data mining (mining music). The idea is training data, clustering is concerned with the creation of basically to use audio signals to indicate the patterns of hierarchies of documents based on similarity, and data or to represent the features of data mining results. organize the documents based on that hierarchy. The basic advantage of audio data mining is that while Intuitively, this would result in more similar documents using a technique such as visual data mining may being placed on the leaf levels of the hierarchy, with less disclose interesting patterns from observing graphical similar sets of document areas being placed higher up, displays, it does require users to concentrate on closer to the root of the tree. Techniques which have watching patterns, which can become monotonous. But been used for unsupervised learning include k-means when representing it as a stream of audio, it is possible clustering, agglomerative clustering, random to transform patterns into sound and music and listen to projections, and latent semantic indexing. pitches, rhythms, tune, and melody in order to identify anything interesting or unusual. Its is possible not only Semi-supervised learning and social network analysis to summarize melodies, based on the approximate are other methods which are important to hypermedia- patterns that repeatedly occur in the segment, but also to based data mining. Semi-supervised learning is the case summarize style, based on tone, tempo, or the major where there are both labeled and unlabeled documents, musical instruments played (Zaiane, Han, and Zhu, and there is a need to learn from both types of 2000; Han and Kamber, 2001). documents. Social network analysis is also applicable because the web is considered a social network, which SPATIAL AND GEOGRAPHIC DATA MINING examines networks formed through collaborative The data types which come to mind when the term data association, whether it be between friends, academics mining is mentioned involves data as we know it— doing research or service on committees, and between statistical, generally numerical data of varying kinds. papers through references and citations. Graph distances However, it is also important to consider information and various aspects of connectivity come into play when which is of an entirely different kind—spatial and working in the area of social networks (Larson, 1996; geographic data which could contain information about Mizruchi et. al., 1986). Other research conducted in the astronomical data, natural resources, or even orbiting area of hypertext data mining include work on satellites and spacecraft which transmit images of earth distributed hypertext resource discovery (Chakrabarti, from out in space. Much of this data is image-oriented, van Berg and Dom, 1999). and can represent a great deal of information if properly analyzed and mined (Miller and Han, 2001) . MULTIMEDIA DATA MINING Multimedia Data Mining is the mining and analysis of A definition of spatial data mining is as follows: “the various types of data, including images, video, audio, extraction of implicit knowledge, spatial relationships, and animation. The idea of mining data which contains or other patterns not explicitly stored in spatial different kinds of information is the main objective of databases.” Some of the components of spatial data multimedia data mining (Zaiane et. al., 1998). As which differentiate it from other kinds include distance multimedia data mining incorporates the areas of text and topological information, which can be indexed using mining, as well as hypertext/hypermedia mining, these multidimensional structures, and required special spatial fields are closely related. Much of the information data access methods, together with spatial knowledge describing these other areas also apply to multimedia representation and data access methods, along with the data mining. This field is also rather new, but holds ability to handle geometric calculations. much promise for the future. Analyzing spatial and geographic data include such tasks as understanding and browsing spatial data,
    • uncovering relationships between spatial data items (and Other techniques which can be used on these kinds of also between non-spatial and spatial items), and also data include similarity search, sequential pattern mining, analysis using spatial databases and spatial knowledge and periodicity analysis. Similarity search is concerned bases. The applications of these would be useful in such with the identification of a pattern sequence which is fields as remote sensing, medical imaging, navigation, close or similar to a given pattern, and this form of and related uses. analysis can be broken down into two subtypes: whole sequence matching and subsequence matching. Whole Some of the techniques and data structures which are sequence matching attempts to find all sequences which used when analyzing spatial and related types of data bear a likeness to each other, while subsequence include the use of spatial warehouses, spatial data cubes matching attempts to find those patterns which are and spatial OLAP. Spatial data warehouses can be similar to a specified, given sequence. defined as those which are subject-oriented, integrated, nonvolatile, and time-variant (Han, Kamber, and Tung, Sequential pattern mining has as its focus the 2000). Some of the challenges in constructing a spatial identification of sequences which occur frequently in a data warehouse include the difficulties of integration of time series or sequence of data. This is particularly data from heterogeneous sources, and also applying the useful in the analysis of customers, where certain buying use of on-line analytical processing which is not only patterns could be identified, such as what might be the relatively fast, but also offers some forms of flexibility. likely follow-up purchase to purchasing a certain electronics item or computer, for example. In general, spatial data cubes, which are components of spatial data warehouses, are designed with three types Periodicity analysis attempts to analyze the data from of dimensions and two types of measures. The three the perspective of identifying patterns which repeat or types of dimensions include the nonspatial dimension recur in a time series. This form of data mining analysis (data which is nonspatial in nature), the spatial to can be categorized as being full periodic, partial nonspatial dimension (primitive level is spatial but periodic, or cyclic periodic. In general, full periodic is higher level-generalization is nonspatial), and the the situation where all of the data points in time spatial-to-spatial dimension (both primitive and higher contribute to the behavior of the series. This is in levels are all spatial). In terms of measures, there are contrast to partial periodicity, where only certain points both numerical (numbers only), and spatial (pointers to in time contribute to series behavior. Finally, cyclical spatial object) measured used in spatial data cubes periodicity relates to sets of events which occur (Stefanovic, Han, and Koperski, 2000; Zhou, Truffet, periodically (Han and Kamber, 2001; Han, Pei, et al, and Han, 1999). 2000; Han, Dong, and Yin, 1999; Pei, Han, Pinto, Chen, Dayal, and Hsu, 2001; Pei, Tung, and Han, 2001; Kim, Aside from the implemention of data warehouses for Lam, and Han, 2000). spatial data, there is also the issue of analyses which can be done on the data. Some of the analyses which can be CONSTRAINT- BASED DATA MINING done include association analysis, clustering methods, Many of the data mining techniques which currently and the mining of raster databases There have been a exist are very useful but lack the benefit of any guidance number of studies conducted on spatial data mining or user control. One method of implementing some form (Bedard et al, 2001; Han, Koperski, and Stefanovic, of human involvement into data mining is in the form of 1997; Han, Stefanovic, and Koperski, 1998; Koperski constraint-based data mining. This form of data mining and Han, 1996; Koperski, Han and Marchisio, 1999; incorporates the use of constraints which guides the Koperski, Adikary, and Han, 1996; Koperski, Han, and process. Frequently this is combined with the benefits of Stefanovic, 1998; Tung, Hou, and Han, 2001). multidimensional mining to add greater power to the process (Han, Lakshamanan, and Ng, 1999). TIME SERIES/SEQUENCE DATA MINING There are several categories of constraints which can be Another important area in data mining centers on the used, each of which has its own characteristics and mining of time series and sequence-based data. Simply purpose. These are: put, this involves the mining of a sequence of data, which can either be referenced by time (time-series, Knowledge-type constraints. This type of constraint such as stock market and production process data), or is specifies the “type of knowledge” which is to be mined, simply a sequence of data which is ordered in a and is typically specified at the beginning of any data sequence. In general, one aspect of mining time series mining query. Some of the types of constraints which data focuses on the goal of identifying movements or can be used include clustering, association, and components which exist within the data (trend analysis). classification. These can include long-term or trend movements, seasonal variations, cyclical variations, and random Data constraints. This constraint identifies the data movements (Han and Kamber, 2001). which is to be used in the specific data mining query. Since constraint-based mining is ideally conducted
    • within the framework of an ad-hoc, query driven system, SUMMARY data constraints can be specified in a form similar to that In closing, it would not be overly optimistic to say that of a SQL query. data mining has a bright and promising future, and that the years to come will bring many new developments, Dimension/level constraints. Because much of the methods, and technologies. In addition, the improved information being mined is in the form of a database or integration of techniques and the application of data multidimensional data warehouse, it is possible to mining techniques can bring about the handling of new specify constraints which specify the levels or kinds of data types and applications. As the types of data dimensions to be included in the current query. and information that we have access to increases, so does the number and types of data mining which can be Interestingness constraints. It would also be useful to done. While some analysts and experts in the field determine what ranges of a particular variable or caution that data mining may go the way of artificial measure are considered to be particularly interesting and intelligence (AI), and end up not having the commercial should be included in the query. success which was previously predicted, the field of data mining is still young enough that the possibilities are Rule constraints. It is also important to specify the still limitless. By expanding applications which can use specific rules which should be applied and used for a it, integrating technologies and methods, broadening its particular data mining query or application. applicability to mainstream business applications, and making programs and interfaces easier for end-users to One application of the constraint-based approach is in use, it is quite possible and likely that data mining will the Online Analytical Mining Architecture (OLAM) become one of the key technology areas of the new developed by Han, Lakshamanan, and Ng, 1999, and is millennium. designed to support the multidimensional and constraint- based mining of databases and data warehouses. 4. REFERENCES In short, constraint-based data mining is one of the developing areas which allows for the use of guiding constraints which should make for better data mining. A Bedard, T. Merrett, and J. Han, "Fundamentals of number of studies have been conducted in this area Geospatial Data Warehousing for Geographic (Lakshaman, Ng, Han, and Pang, 1999; Cheung, Knowledge Discovery", H. Miller and J. Han (eds.), Hwang, Fu, and Han, 2000; Pei and Han, 2000; Lu, In Geographic Data Mining and Knowledge Feng, and Han, 2001; Pei, Han, and Lakshaman, 2001; Discovery, Taylor and Francis, 2001. Pei, Han, and Mao, 2000; Tung, Han, Lakshaman, and Ng, 2001; Wang, Zhou, and Han, 2000; Wang, He, and Chakrabarti, S., “Data Mining for Hypertext,” SIGKDD Han, 2000). Explorations, 1 (2), Jan 2000. PHENOMENAL DATA MINING Chakrabarti, van den Berg, and Dom. “Distributed Hypertext Resource Discovery Through Examples,” Phenomenal data mining is not a term for a data mining Proceedings of the 25th VLDB (International project that went extremely well. Instead, it focuses on Conference on Very Large Data Bases ), Edinburgh the relationships between data and the phenomena which Scotland, 1999. are inferred from the data (Mc Carthy, 2000). One example of this is that using receipts from cash Cheung, C. Hwang, A. Fu, and J. Han, "Efficient Rule- supermarket purchases, it is possible to identify various Based Attributed-Oriented Induction for Data aspects of the customers who are making these Mining", Journal of Intelligent Information Systems, purchases. Some of these phenomena could include age, 15(2): 175-200, 2000. income, ethnicity, and purchasing habits. Delmater, R. And M. Hancock, Data Mining Explained, One aspect of phenomenal data mining, and in particular Digital Press, 2001. the goal to infer phenomena from data, is the need to have access to some facts about the relations between Feldman, Ronen, Will Klosgen, and Amir Zilberstein. these data and their related phenomena. These could be “Visualization techniques to explore data mining included the program which examines data for results for document collections.”, In Proceedings of phenomena, or also could be placed in a kind of the Third Annual Conference on Knowledge knowledge base or database which can be drawn upon Discovery and Data Mining (KDD), Newport when doing the data mining. Part of the challenge in Beach, 1997 creating such a knowledge base involves the coding of common sense into a database, which has proved to be a Han, J. and M. Kamber, Data Mining: Concepts and difficult problem so far (Lyons et. al., 1998). Techniques, Morgan Kaufmann, 2001.
    • Han, J., J. Pei, and Y. Yin, "Mining Frequent Patterns Koperski, J. Han, and G. B. Marchisio, "Mining Spatial without Candidate Generation", Proc. 2000 ACM- and Image Data through Progressive Refinement SIGMOD Int. Conf. on Management of Data Methods", European Journal of GIS and Spatial (SIGMOD'00), Dallas, TX, May 2000. Analysis, 9(4):425-440, 1999. Han, J., J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, Koperski, J. Han, and N. Stefanovic, "An Efficient Two- M.-C. Hsu, "FreeSpan: Frequent Pattern-Projected Step Method for Classification of Spatial Data", Sequential Pattern Mining", Proceedings KDD'00, International Symposium on Spatial Data Handling Boston, MA, August 2000. SDH'98, Vancouver, Canada. Han, J. K. Koperski, and N. Stefanovic, "GeoMiner: A Lakshmanan, R. Ng, J. Han, and A. Pang. "Optimization System Prototype for Spatial Data Mining", of constrained frequent set queries with 2-variable Proceedings SIGMOD'97, Tucson, Arizona. constraints", Proceedings SIGMOD'99, Philadelphia, PA, June 1999. Han, J. , V. S. Lakshmanan, and R. T. Ng, "Constraint- Based, Multidimensional Data Mining", Larson, Ray, “Bibliometrics of the world wide web: An COMPUTER (special issue on Data Mining), 32(8): exploratory analysis of the intellectual structure of 46-50, 1999. cyberspace.”In ASIS '96: Proceedings of the 1996 Annual ASIS Meeting, 1996. Han, J., Stefanovic, and K. Koperski, "Selective Materialization: An Efficient Method for Spatial Lu, H., L. Feng, and J. Han, "Beyond Intra-Transaction Data Cube Construction", Proceedings Pacific-Asia Association Analysis:Mining Multi-Dimensional Conference in Knowledge Discovery and Data Inter-Transaction Association Rules", ACM Mining, PAKDD'98, Melbourne, Australia. Transactions on Information Systems, 2001. Han, J., G. Dong and Y. Yin, "Efficient Mining of Lyons, D. And G. Tseytin, “Phenomenal Data Mining Partial Periodic Patterns in Time Series Database", and link analysis”, In Jensen and Goldberg, eds. , Proceedings International Conference on Data Artificial Intelligence and Link Analysis Fall Engineering ICDE'99, Sydney, Australia, March Symposium, 1998. 1999. Maarek, and I.Z. Ben Shaul. Automatically organizing Han, J.,M. Kamber, and A. K. H. Tung, "Spatial bookmarks per content. In Proceedings of the 5th Clustering Methods in Data Mining: A Survey", H. International World Wide Web Conference, 1996. Miller and J. Han (eds.), Geographic Data Mining and Knowledge Discovery, Taylor and Francis, McCarthy, J., Phenomenal Data Mining, SIGKDD 2001. Explorations, 1 (2), 2000. Kargupta, H. et al, “Collective Data Mining,” in Miller and J. Han (eds.), Geographic Data Mining and Advances in Distributed Data Mining, Karhgupta Knowledge Discovery, Taylor and Francis, 2001. and Chan, editors, MIT Press, 2000. Mizruchi, M., Mariolis, Schwartz, and Mintz, “Techniques for disaggregating centrality scores in Kargupta, H. and A. Joshi, “Data Mining To Go: social networks,” In Tuma, editor, Sociological Ubiquitous KDD for Mobile and Distributed Methodology, Jossey-Bass, 1986 Environments,” Presentation, KDD-2001, San Francisco, August 2001. Pei and J. Han "Can We Push More Constraints into Frequent Pattern Mining?", Proceedings KDD'00, Kim, J. M.W. Lam, and J. Han, "AIM: Approximate Boston, MA, August 2000. Intelligent Matching for Time Series Data", Proceedings 2000 Int. Conferences on Data Pei, J., J. Han, H. Pinto, Q. Chen, U. Dayal, and M.-C. Warehouse and Knowledge Discovery (DaWaK'00), Hsu, "PrefixSpan: Mining Sequential Patterns Greenwich, U.K., Sept. 2000. Efficiently by Prefix-Projected Pattern Growth", Proceedings 2001 Int. Conference on Data Engineering (ICDE'01), Heidelberg, Germany, April Koperski, J. Adhikary and J. Han, "Spatial Data 2001. Mining: Progress and Challenges", SIGMOD'96 Workshop on Research Issues in Data Mining and Pei, J., J. Han, and L. V. S. Lakshmanan, "Mining Knowledge Discovery DMKD'96, Montreal, Frequent Itemsets with Convertible Constraints", Canada, 1996. Proceedings 2001 Int. Conference on Data
    • Engineering (ICDE'01), Heidelberg, Germany, April Zaiane, O., J. Han, and H. Zhu, "Mining Recurrent Items 2001. in Multimedia with Progressive Resolution Refinement", Proceedings International Conference Pei, J. J. Han, and R. Mao, "CLOSET: An Efficient on Data Engineering Algorithm of Mining Frequent Closed Itemsets for ICDE'00, San Diego, CA, March 2000. Association Rules", Proceedings DMKD'00, Dallas, TX, May 2000. Zhou, D. Truffet, and J. Han, "Efficient Polygon Amalgamation Methods for Spatial OLAP and Pei, J., A. Tung, and J. Han, “Fault-Tolerant Frequent Spatial Data Mining", 6th International Symposium Pattern Mining: Problems and Challenges,” on Spatial Databases, SSD'99, Hong Kong. Proceedings 2001 ACM-SIGMOD, Santa Barbara CA, May 2001. Stefanovic, J. Han, and K. Koperski, "Object-Based Selective Materialization for Efficient Implementation of Spatial Data Cubes," IEEE Transactions on Knowledge and Data Engineering, 12(6), 2000. Tung, K.H., J. Han, L. V. S. Lakshmanan, and R. T. Ng, "Constraint-Based Clustering in Large Databases", Proceedings 2001 Intl. Conf. on Database Theory (ICDT'01), London, U.K., Jan. 2001. Tung, K. H. J. Hou, and J. Han, "Spatial Clustering in the Presence of Obstacles", Proc. 2001 Intl. Conf. on Data Engineering (ICDE'01), Heidelberg, Germany, April 2001. Wang, S. Zhou, and J. Han, "Pushing support constraints into association mining", International Conference on Very Large Data Bases VLDB'00, Cairo, Egypt, Sept. 2000. Wang, K., Y. He and J. Han, "Mining Frequent Itemsets Using Support Constraints", Proceedings 2000 Int. Conference on Very Large Data Bases (VLDB'00), Cairo, Egypt, Sept. 2000, pp. 43-52. Wise, James A. James J. Thomas, Kelly Pennock, David Lantrip, Marc Pottier, and Anne Schur. “Visualizing the non-visual: Spatial analysis and interaction with information from text documents.” In Proceedings of the Information Visualization Symposium 95, pages 51-58. IEEE Computer Society Press, 1995 Xu, J. and W. B. Croft. “Query expansion using local and global document analysis.” In SIGIR '96: Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 4- 11, Zurich. 1996. Zaiane, Han, Li, and Hou, “Mining Multimedia Data,” Proceedings of Meeting Of Minds CASCON ’98, Toronto Canada, November 1998.