DATA MINING TRENDS AND DEVELOPMENTS : The Key Data Mining ...Document Transcript
DATA MINING TRENDS AND DEVELOPMENTS :
The Key Data Mining Technologies and Applications for
the 21st Century
Information Systems, Fairleigh Dickinson University
Madison, NJ 07940, USA
This paper discusses a number of technologies, approaches, and research areas which have been identified as having
critical and future promise in the field of data mining. There is currently an explosion in the amount of data which we
now produce and have access to, and mining from these sources can uncover important information. The extensive use
of handheld, wireless, and other ubiquitous devices is a developing area, since a lot of information being created and
transmitted would be maintained and stored only on these kinds of devices. Among the other areas which are being
developed, investigated, and applications identified for include hypertext and hypermedia data mining, phenomenal
data mining, distributed/collective data mining, constraint-based data mining, and other related methods.
Keywords: Data mining; distributed, ubiquitous, hypertext, multimedia, time-series, constraint-based, phenomenal
1. THE PRESENT AND THE FUTURE
As mentioned previously, the field of data mining is
The field of data mining and knowledge discovery in very broad, and there are many methods and
databases (KDD) has been growing in leaps and bounds, technologies which have become dominant in the field.
and has shown great potential for the future (Han and Not only have there been developments in the
Kamber, 2001). The purpose of this paper is to survey “traditional” areas of data mining, there are other areas
many of the future trends in the field of data mining, which have been identified as being especially important
with a focus on those which are thought to have the as future trends in the field. These are the subject of the
most promise and applicability to future data mining upcoming sections and form the major emphasis of this
What is the future of data mining? Certainly, the field 2. MAJOR TRENDS IN TECHNOLOGIES AND
has made great strides in past years, and many industry METHODS
analysts and experts in the area feel that the future will
be bright. There is definite growth in the area of data There are a number of data mining trends is in terms of
mining. Many industry analysts and research firms have technologies and methodologies which are currently
projected a bright future for the entire data mining/KDD being developed and researched. These trends include
area, and its related area of CRM (customer relationship methods for analyzing more complex forms of data, as
management). According to IDC, spending in the area of well as specific techniques and methods. The trends
business intelligence, which encompasses data mining, identified include distributed data mining,
is estimated to increase from $3.6 billion in 2000 to hypertext/hypermedia mining, ubiquitous data mining,
$11.9 billion in 2005. The growth in the CRM Analytic as well as multimedia, spatial, and time series/sequential
application market is expected to approach 54.1% per data mining. These are examined in detail in the
year through 2003. In addition, data mining projects are upcoming sections.
expected to grow by more than 300% by the year 2002.
By 2003, over 90% of consumer-based industries with
e-commerce orientation will utilize some kind of data
3. DATA MINING TRENDS IN objectives of UDM is to mine data while minimizing the
TECHNOLOGIES AND METHODS cost of ubiquitous presence.
DISTRIBUTED/COLLECTIVE DATA MINING Human-computer interaction is another challenging
One area of data mining which is attracting a good aspect of UDM. Visualizing patterns like classifiers,
amount of attention is that of distributed and collective clusters, associations and others, in portable devices are
data mining. Much of the data mining which is being usually difficult. The small display areas offer serious
done currently focuses on a database or data warehouse challenges to interactive data mining environments. Data
of information which is physically located in one place. management in a mobile environment is also a
However, the situation arises where information may be challenging issue. Moreover, the sociological and
located in different places, in different physical psychological aspects of the integration between data
locations. This is known generally as distributed data mining technology and our lifestyle are yet to be
mining (DDM). Therefore, the goal is to effectively explored. The key issues to consider include theories of
mine distributed data which is located in heterogeneous UDM, advanced algorithms for mobile and distributed
sites. Examples of this include biological information applications, data management issues, mark-up
located in different databases, data which comes from languages, and other data representation techniques;
the databases of two different firms, or analysis of data integration with database applications for mobile
from different branches of a corporation, the combining environments, architectural issues: (architecture, control,
of which would be an expensive and time-consuming security, and communication issues), specialized mobile
process. devices for UDM, software agents and UDM (Agent
based approaches in UDM, agent interaction---
Distributed data mining (DDM) is used to offer a cooperation, collaboration, negotiation, organizational
different approach to traditional approaches analysis, by behavior), applications of UDM (Application in
using a combination of localized data analysis, together business, science, engineering, medicine, and other
with a “global data model. ” In more specific terms, this disciplines), location management issues in UDM and
is specified as: technology for web-based applications of UDM
-performing local data analysis for generating partial (Kargupta and Joshi, 2001).
and HYPERTEXT AND HYPERMEDIA DATA
-combining the local data models from different data MINING
sites in order to develop the global model. Hypertext and hypermedia data mining can be
characterized as mining data which includes text,
This global model combines the results of the separate hyperlinks, text markups, and various other forms of
analyses. Often the global model produced, especially if hypermedia information. As such, it is closely related to
the data in different locations has different features or both web mining, and multimedia mining, which are
characteristics, may become incorrect or ambiguous. covered separately in this section, but in reality are quite
This problem is especially critical when the data in close in terms of content and applications. While the
distributed sites is heterogeneous rather than World Wide Web is substantially composed of hypertext
homogeneous. These heterogeneous data sets are known and hypermedia elements, there are other kinds of
as vertically partitioned datasets. hypertext/hypermedia data sources which are not found
on the web. Examples of these include the information
An approach proposed by Kargupta et al (2000) speaks found in online catalogues, digital libraries, online
of the collective data mining (CDM) approach, which information databases, and the like. In addition to the
provides a better approach to vertically partitioned traditional forms of hypertext and hypermedia, together
datasets, using the notion of orthonormal basis with the associated hyperlink structures, there are also
functions, and computes the basis coefficients to inter-document structures which exist on the web, such
generate the global model of the data (Kargupta et. al., as the directories employed by such services as Yahoo!
2000). (www.yahoo.com) or the Open Directory project
(http://dmoz.org) These taxonomies of topics and
UBIQUITOUS DATA MINING (UDM) subtopics are linked together to form a large network or
The advent of laptops, palmtops, cell phones, and hierarchical tree of topics and associated links and
wearable computers is making ubiquitous access to large pages.
quantity of data possible. Advanced analysis of data for
extracting useful knowledge is the next natural step in Some of the important data mining techniques used for
the world of ubiquitous computing. hypertext and hypermedia data mining include
classification (supervised learning), clustering
Accessing and analyzing data from a ubiquitous (unsupervised learning), semi-structured learning, and
computing device offer many challenges. For example, social network analysis.
UDM introduces additional cost due to communication,
computation, security, and other factors. So one of the
In the case of classification, or supervised learning, the Multimedia information, because its nature as a large
process starts off by reviewing training data in which collection of multimedia objects, must be represented
items are marked as being part of a certain class or differently from conventional forms of data. One
group. This data is the basis from which the algorithm is approach is to create a multimedia data cube which can
trained. One application of classification is in the area be used to convert multimedia-type data into a form
of web topic directories, which can group similar- which is suited to analysis using one of the main data
sounding or spelled terms into appropriate categories, so mining techniques, but taking into account the unique
that searches will not bring up inappropriate sites and characteristics of the data. This may include the use of
pages. The use of classification can also result in measures and dimensions for texture, shape, color, and
searches which are not only based on keywords, but also related attributes. In essence, it is possible to create a
on category and classification attributes. Methods used multidimensional spatial database. Among the types of
for classification include naive Bayes classification, analyses which can be conducted on multimedia
parameter smoothing, dependence modeling, and databases include associations, clustering, classification,
maximum entropy (Chakrabarti, 2000). and similarity search.
Unsupervised learning, or clustering, differs from Another developing area in multimedia data mining is
classification in that classification involved the use of that of audio data mining (mining music). The idea is
training data, clustering is concerned with the creation of basically to use audio signals to indicate the patterns of
hierarchies of documents based on similarity, and data or to represent the features of data mining results.
organize the documents based on that hierarchy. The basic advantage of audio data mining is that while
Intuitively, this would result in more similar documents using a technique such as visual data mining may
being placed on the leaf levels of the hierarchy, with less disclose interesting patterns from observing graphical
similar sets of document areas being placed higher up, displays, it does require users to concentrate on
closer to the root of the tree. Techniques which have watching patterns, which can become monotonous. But
been used for unsupervised learning include k-means when representing it as a stream of audio, it is possible
clustering, agglomerative clustering, random to transform patterns into sound and music and listen to
projections, and latent semantic indexing. pitches, rhythms, tune, and melody in order to identify
anything interesting or unusual. Its is possible not only
Semi-supervised learning and social network analysis to summarize melodies, based on the approximate
are other methods which are important to hypermedia- patterns that repeatedly occur in the segment, but also to
based data mining. Semi-supervised learning is the case summarize style, based on tone, tempo, or the major
where there are both labeled and unlabeled documents, musical instruments played (Zaiane, Han, and Zhu,
and there is a need to learn from both types of 2000; Han and Kamber, 2001).
documents. Social network analysis is also applicable
because the web is considered a social network, which SPATIAL AND GEOGRAPHIC DATA MINING
examines networks formed through collaborative The data types which come to mind when the term data
association, whether it be between friends, academics mining is mentioned involves data as we know it—
doing research or service on committees, and between statistical, generally numerical data of varying kinds.
papers through references and citations. Graph distances However, it is also important to consider information
and various aspects of connectivity come into play when which is of an entirely different kind—spatial and
working in the area of social networks (Larson, 1996; geographic data which could contain information about
Mizruchi et. al., 1986). Other research conducted in the astronomical data, natural resources, or even orbiting
area of hypertext data mining include work on satellites and spacecraft which transmit images of earth
distributed hypertext resource discovery (Chakrabarti, from out in space. Much of this data is image-oriented,
van Berg and Dom, 1999). and can represent a great deal of information if properly
analyzed and mined (Miller and Han, 2001) .
MULTIMEDIA DATA MINING
Multimedia Data Mining is the mining and analysis of A definition of spatial data mining is as follows: “the
various types of data, including images, video, audio, extraction of implicit knowledge, spatial relationships,
and animation. The idea of mining data which contains or other patterns not explicitly stored in spatial
different kinds of information is the main objective of databases.” Some of the components of spatial data
multimedia data mining (Zaiane et. al., 1998). As which differentiate it from other kinds include distance
multimedia data mining incorporates the areas of text and topological information, which can be indexed using
mining, as well as hypertext/hypermedia mining, these multidimensional structures, and required special spatial
fields are closely related. Much of the information data access methods, together with spatial knowledge
describing these other areas also apply to multimedia representation and data access methods, along with the
data mining. This field is also rather new, but holds ability to handle geometric calculations.
much promise for the future.
Analyzing spatial and geographic data include such
tasks as understanding and browsing spatial data,
uncovering relationships between spatial data items (and Other techniques which can be used on these kinds of
also between non-spatial and spatial items), and also data include similarity search, sequential pattern mining,
analysis using spatial databases and spatial knowledge and periodicity analysis. Similarity search is concerned
bases. The applications of these would be useful in such with the identification of a pattern sequence which is
fields as remote sensing, medical imaging, navigation, close or similar to a given pattern, and this form of
and related uses. analysis can be broken down into two subtypes: whole
sequence matching and subsequence matching. Whole
Some of the techniques and data structures which are sequence matching attempts to find all sequences which
used when analyzing spatial and related types of data bear a likeness to each other, while subsequence
include the use of spatial warehouses, spatial data cubes matching attempts to find those patterns which are
and spatial OLAP. Spatial data warehouses can be similar to a specified, given sequence.
defined as those which are subject-oriented, integrated,
nonvolatile, and time-variant (Han, Kamber, and Tung, Sequential pattern mining has as its focus the
2000). Some of the challenges in constructing a spatial identification of sequences which occur frequently in a
data warehouse include the difficulties of integration of time series or sequence of data. This is particularly
data from heterogeneous sources, and also applying the useful in the analysis of customers, where certain buying
use of on-line analytical processing which is not only patterns could be identified, such as what might be the
relatively fast, but also offers some forms of flexibility. likely follow-up purchase to purchasing a certain
electronics item or computer, for example.
In general, spatial data cubes, which are components of
spatial data warehouses, are designed with three types Periodicity analysis attempts to analyze the data from
of dimensions and two types of measures. The three the perspective of identifying patterns which repeat or
types of dimensions include the nonspatial dimension recur in a time series. This form of data mining analysis
(data which is nonspatial in nature), the spatial to can be categorized as being full periodic, partial
nonspatial dimension (primitive level is spatial but periodic, or cyclic periodic. In general, full periodic is
higher level-generalization is nonspatial), and the the situation where all of the data points in time
spatial-to-spatial dimension (both primitive and higher contribute to the behavior of the series. This is in
levels are all spatial). In terms of measures, there are contrast to partial periodicity, where only certain points
both numerical (numbers only), and spatial (pointers to in time contribute to series behavior. Finally, cyclical
spatial object) measured used in spatial data cubes periodicity relates to sets of events which occur
(Stefanovic, Han, and Koperski, 2000; Zhou, Truffet, periodically (Han and Kamber, 2001; Han, Pei, et al,
and Han, 1999). 2000; Han, Dong, and Yin, 1999; Pei, Han, Pinto, Chen,
Dayal, and Hsu, 2001; Pei, Tung, and Han, 2001; Kim,
Aside from the implemention of data warehouses for Lam, and Han, 2000).
spatial data, there is also the issue of analyses which can
be done on the data. Some of the analyses which can be CONSTRAINT- BASED DATA MINING
done include association analysis, clustering methods, Many of the data mining techniques which currently
and the mining of raster databases There have been a exist are very useful but lack the benefit of any guidance
number of studies conducted on spatial data mining or user control. One method of implementing some form
(Bedard et al, 2001; Han, Koperski, and Stefanovic, of human involvement into data mining is in the form of
1997; Han, Stefanovic, and Koperski, 1998; Koperski constraint-based data mining. This form of data mining
and Han, 1996; Koperski, Han and Marchisio, 1999; incorporates the use of constraints which guides the
Koperski, Adikary, and Han, 1996; Koperski, Han, and process. Frequently this is combined with the benefits of
Stefanovic, 1998; Tung, Hou, and Han, 2001). multidimensional mining to add greater power to the
process (Han, Lakshamanan, and Ng, 1999).
TIME SERIES/SEQUENCE DATA MINING There are several categories of constraints which can be
Another important area in data mining centers on the used, each of which has its own characteristics and
mining of time series and sequence-based data. Simply purpose. These are:
put, this involves the mining of a sequence of data,
which can either be referenced by time (time-series, Knowledge-type constraints. This type of constraint
such as stock market and production process data), or is specifies the “type of knowledge” which is to be mined,
simply a sequence of data which is ordered in a and is typically specified at the beginning of any data
sequence. In general, one aspect of mining time series mining query. Some of the types of constraints which
data focuses on the goal of identifying movements or can be used include clustering, association, and
components which exist within the data (trend analysis). classification.
These can include long-term or trend movements,
seasonal variations, cyclical variations, and random Data constraints. This constraint identifies the data
movements (Han and Kamber, 2001). which is to be used in the specific data mining query.
Since constraint-based mining is ideally conducted
within the framework of an ad-hoc, query driven system, SUMMARY
data constraints can be specified in a form similar to that In closing, it would not be overly optimistic to say that
of a SQL query. data mining has a bright and promising future, and that
the years to come will bring many new developments,
Dimension/level constraints. Because much of the methods, and technologies. In addition, the improved
information being mined is in the form of a database or integration of techniques and the application of data
multidimensional data warehouse, it is possible to mining techniques can bring about the handling of new
specify constraints which specify the levels or kinds of data types and applications. As the types of data
dimensions to be included in the current query. and information that we have access to increases, so
does the number and types of data mining which can be
Interestingness constraints. It would also be useful to done. While some analysts and experts in the field
determine what ranges of a particular variable or caution that data mining may go the way of artificial
measure are considered to be particularly interesting and intelligence (AI), and end up not having the commercial
should be included in the query. success which was previously predicted, the field of data
mining is still young enough that the possibilities are
Rule constraints. It is also important to specify the still limitless. By expanding applications which can use
specific rules which should be applied and used for a it, integrating technologies and methods, broadening its
particular data mining query or application. applicability to mainstream business applications, and
making programs and interfaces easier for end-users to
One application of the constraint-based approach is in use, it is quite possible and likely that data mining will
the Online Analytical Mining Architecture (OLAM) become one of the key technology areas of the new
developed by Han, Lakshamanan, and Ng, 1999, and is millennium.
designed to support the multidimensional and constraint-
based mining of databases and data warehouses.
In short, constraint-based data mining is one of the
developing areas which allows for the use of guiding
constraints which should make for better data mining. A Bedard, T. Merrett, and J. Han, "Fundamentals of
number of studies have been conducted in this area Geospatial Data Warehousing for Geographic
(Lakshaman, Ng, Han, and Pang, 1999; Cheung, Knowledge Discovery", H. Miller and J. Han (eds.),
Hwang, Fu, and Han, 2000; Pei and Han, 2000; Lu, In Geographic Data Mining and Knowledge
Feng, and Han, 2001; Pei, Han, and Lakshaman, 2001; Discovery, Taylor and Francis, 2001.
Pei, Han, and Mao, 2000; Tung, Han, Lakshaman, and
Ng, 2001; Wang, Zhou, and Han, 2000; Wang, He, and Chakrabarti, S., “Data Mining for Hypertext,” SIGKDD
Han, 2000). Explorations, 1 (2), Jan 2000.
PHENOMENAL DATA MINING Chakrabarti, van den Berg, and Dom. “Distributed
Hypertext Resource Discovery Through Examples,”
Phenomenal data mining is not a term for a data mining Proceedings of the 25th VLDB (International
project that went extremely well. Instead, it focuses on Conference on Very Large Data Bases ), Edinburgh
the relationships between data and the phenomena which Scotland, 1999.
are inferred from the data (Mc Carthy, 2000). One
example of this is that using receipts from cash Cheung, C. Hwang, A. Fu, and J. Han, "Efficient Rule-
supermarket purchases, it is possible to identify various Based Attributed-Oriented Induction for Data
aspects of the customers who are making these Mining", Journal of Intelligent Information Systems,
purchases. Some of these phenomena could include age, 15(2): 175-200, 2000.
income, ethnicity, and purchasing habits.
Delmater, R. And M. Hancock, Data Mining Explained,
One aspect of phenomenal data mining, and in particular Digital Press, 2001.
the goal to infer phenomena from data, is the need to
have access to some facts about the relations between Feldman, Ronen, Will Klosgen, and Amir Zilberstein.
these data and their related phenomena. These could be “Visualization techniques to explore data mining
included the program which examines data for results for document collections.”, In Proceedings of
phenomena, or also could be placed in a kind of the Third Annual Conference on Knowledge
knowledge base or database which can be drawn upon Discovery and Data Mining (KDD), Newport
when doing the data mining. Part of the challenge in Beach, 1997
creating such a knowledge base involves the coding of
common sense into a database, which has proved to be a Han, J. and M. Kamber, Data Mining: Concepts and
difficult problem so far (Lyons et. al., 1998). Techniques, Morgan Kaufmann, 2001.
Han, J., J. Pei, and Y. Yin, "Mining Frequent Patterns Koperski, J. Han, and G. B. Marchisio, "Mining Spatial
without Candidate Generation", Proc. 2000 ACM- and Image Data through Progressive Refinement
SIGMOD Int. Conf. on Management of Data Methods", European Journal of GIS and Spatial
(SIGMOD'00), Dallas, TX, May 2000. Analysis, 9(4):425-440, 1999.
Han, J., J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, Koperski, J. Han, and N. Stefanovic, "An Efficient Two-
M.-C. Hsu, "FreeSpan: Frequent Pattern-Projected Step Method for Classification of Spatial Data",
Sequential Pattern Mining", Proceedings KDD'00, International Symposium on Spatial Data Handling
Boston, MA, August 2000. SDH'98, Vancouver, Canada.
Han, J. K. Koperski, and N. Stefanovic, "GeoMiner: A Lakshmanan, R. Ng, J. Han, and A. Pang. "Optimization
System Prototype for Spatial Data Mining", of constrained frequent set queries with 2-variable
Proceedings SIGMOD'97, Tucson, Arizona. constraints", Proceedings SIGMOD'99,
Philadelphia, PA, June 1999.
Han, J. , V. S. Lakshmanan, and R. T. Ng, "Constraint-
Based, Multidimensional Data Mining", Larson, Ray, “Bibliometrics of the world wide web: An
COMPUTER (special issue on Data Mining), 32(8): exploratory analysis of the intellectual structure of
46-50, 1999. cyberspace.”In ASIS '96: Proceedings of the 1996
Annual ASIS Meeting, 1996.
Han, J., Stefanovic, and K. Koperski, "Selective
Materialization: An Efficient Method for Spatial Lu, H., L. Feng, and J. Han, "Beyond Intra-Transaction
Data Cube Construction", Proceedings Pacific-Asia Association Analysis:Mining Multi-Dimensional
Conference in Knowledge Discovery and Data Inter-Transaction Association Rules", ACM
Mining, PAKDD'98, Melbourne, Australia. Transactions on Information Systems, 2001.
Han, J., G. Dong and Y. Yin, "Efficient Mining of Lyons, D. And G. Tseytin, “Phenomenal Data Mining
Partial Periodic Patterns in Time Series Database", and link analysis”, In Jensen and Goldberg, eds. ,
Proceedings International Conference on Data Artificial Intelligence and Link Analysis Fall
Engineering ICDE'99, Sydney, Australia, March Symposium, 1998.
Maarek, and I.Z. Ben Shaul. Automatically organizing
Han, J.,M. Kamber, and A. K. H. Tung, "Spatial bookmarks per content. In Proceedings of the 5th
Clustering Methods in Data Mining: A Survey", H. International World Wide Web Conference, 1996.
Miller and J. Han (eds.), Geographic Data Mining
and Knowledge Discovery, Taylor and Francis, McCarthy, J., Phenomenal Data Mining, SIGKDD
2001. Explorations, 1 (2), 2000.
Kargupta, H. et al, “Collective Data Mining,” in Miller and J. Han (eds.), Geographic Data Mining and
Advances in Distributed Data Mining, Karhgupta Knowledge Discovery, Taylor and Francis, 2001.
and Chan, editors, MIT Press, 2000.
Mizruchi, M., Mariolis, Schwartz, and Mintz,
“Techniques for disaggregating centrality scores in
Kargupta, H. and A. Joshi, “Data Mining To Go:
social networks,” In Tuma, editor, Sociological
Ubiquitous KDD for Mobile and Distributed
Methodology, Jossey-Bass, 1986
Environments,” Presentation, KDD-2001, San
Francisco, August 2001.
Pei and J. Han "Can We Push More Constraints into
Frequent Pattern Mining?", Proceedings KDD'00,
Kim, J. M.W. Lam, and J. Han, "AIM: Approximate Boston, MA, August 2000.
Intelligent Matching for Time Series Data",
Proceedings 2000 Int. Conferences on Data Pei, J., J. Han, H. Pinto, Q. Chen, U. Dayal, and M.-C.
Warehouse and Knowledge Discovery (DaWaK'00), Hsu, "PrefixSpan: Mining Sequential Patterns
Greenwich, U.K., Sept. 2000. Efficiently by Prefix-Projected Pattern Growth",
Proceedings 2001 Int. Conference on Data
Engineering (ICDE'01), Heidelberg, Germany, April
Koperski, J. Adhikary and J. Han, "Spatial Data 2001.
Mining: Progress and Challenges", SIGMOD'96
Workshop on Research Issues in Data Mining and Pei, J., J. Han, and L. V. S. Lakshmanan, "Mining
Knowledge Discovery DMKD'96, Montreal, Frequent Itemsets with Convertible Constraints",
Canada, 1996. Proceedings 2001 Int. Conference on Data
Engineering (ICDE'01), Heidelberg, Germany, April Zaiane, O., J. Han, and H. Zhu, "Mining Recurrent Items
2001. in Multimedia with Progressive Resolution
Refinement", Proceedings International Conference
Pei, J. J. Han, and R. Mao, "CLOSET: An Efficient on Data Engineering
Algorithm of Mining Frequent Closed Itemsets for ICDE'00, San Diego, CA, March 2000.
Association Rules", Proceedings DMKD'00, Dallas,
TX, May 2000. Zhou, D. Truffet, and J. Han, "Efficient Polygon
Amalgamation Methods for Spatial OLAP and
Pei, J., A. Tung, and J. Han, “Fault-Tolerant Frequent Spatial Data Mining", 6th International Symposium
Pattern Mining: Problems and Challenges,” on Spatial Databases, SSD'99, Hong Kong.
Proceedings 2001 ACM-SIGMOD, Santa Barbara
CA, May 2001.
Stefanovic, J. Han, and K. Koperski, "Object-Based
Selective Materialization for Efficient
Implementation of Spatial Data Cubes," IEEE
Transactions on Knowledge and Data Engineering,
Tung, K.H., J. Han, L. V. S. Lakshmanan, and R. T. Ng,
"Constraint-Based Clustering in Large Databases",
Proceedings 2001 Intl. Conf. on Database Theory
(ICDT'01), London, U.K., Jan. 2001.
Tung, K. H. J. Hou, and J. Han, "Spatial Clustering in
the Presence of Obstacles", Proc. 2001 Intl. Conf. on
Data Engineering (ICDE'01), Heidelberg, Germany,
Wang, S. Zhou, and J. Han, "Pushing support constraints
into association mining", International Conference
on Very Large Data Bases VLDB'00, Cairo, Egypt,
Wang, K., Y. He and J. Han, "Mining Frequent Itemsets
Using Support Constraints", Proceedings 2000 Int.
Conference on Very Large Data Bases (VLDB'00),
Cairo, Egypt, Sept. 2000, pp. 43-52.
Wise, James A. James J. Thomas, Kelly Pennock, David
Lantrip, Marc Pottier, and Anne Schur. “Visualizing
the non-visual: Spatial analysis and interaction with
information from text documents.” In Proceedings
of the Information Visualization Symposium 95,
pages 51-58. IEEE Computer Society Press, 1995
Xu, J. and W. B. Croft. “Query expansion using local
and global document analysis.”
In SIGIR '96: Proceedings of the 19th Annual
International ACM SIGIR Conference on Research
and Development in Information Retrieval, pages 4-
11, Zurich. 1996.
Zaiane, Han, Li, and Hou, “Mining Multimedia Data,”
Proceedings of Meeting Of Minds CASCON ’98,
Toronto Canada, November 1998.