08. Mining Type Of Complex Data



Course materials from Mr. Yudho Giri Sucahyo (MTI UI). Uploaded by Achmad Solichin (<a>http://hotnewsarchive.info</a>)

Published in: Education, Technology

  1. 1. Mining Complex Types of Data
     Lecture 6/DMBI/IKI83403T/MTI/UI. Yudho Giri Sucahyo, Ph.D, CISA (yudho@cs.ui.ac.id), Faculty of Computer Science, University of Indonesia.

     Objectives
     Mining spatial databases; mining multimedia databases; mining time-series and sequence data; mining stream data; mining text databases; mining the World-Wide Web; summary.

     Introduction
     Previously, we focused on mining relational databases, transaction databases, and data warehouses formed by the transformation and integration of structured data.
     Complex types of data: spatial data, multimedia data, time-series data, text data, WWW.

     Mining Spatial Databases
     Spatial database: space-related data such as maps, remote sensing, medical imaging data, VLSI chip layout.
     Spatial data mining: extraction of knowledge, spatial relationships, or other interesting patterns not explicitly stored in spatial databases.
     Wide applications: GIS, geomarketing, remote sensing, image database exploration, medical imaging, navigation, traffic control, environmental studies.
     Spatial data warehouse: a subject-oriented, integrated, time-variant, and nonvolatile spatial data repository. A spatial DW that supports spatial OLAP can display weather patterns on a map by any dimensions, including drill-down and roll-up capability.
  2. 2. Mining Spatial Databases (2)
     Spatial data integration: a big issue.
     Structure-specific formats (raster- vs. vector-based, OO vs. relational models, different storage and indexing, etc.).
     Vendor-specific formats (ESRI, MapInfo, Intergraph, IDRISI, etc.).
     Spatial data cube: a multidimensional spatial database. Both dimensions and measures may contain spatial components.

     Dimensions and Measures in Spatial DW
     Dimensions:
       non-spatial: e.g. "25-30 degrees" generalizes to "hot" (both are strings)
       spatial-to-nonspatial: e.g. Seattle generalizes to the description "Pacific Northwest" (as a string)
       spatial-to-spatial: e.g. Seattle generalizes to Pacific Northwest (as a spatial region)
     Measures:
       numerical (e.g. monthly revenue of a region): distributive (e.g. count, sum), algebraic (e.g. average), holistic (e.g. median, rank)
       spatial: collection of spatial pointers (e.g. pointers to all regions with temperature of 25-30 degrees in July)

     Example: British Columbia Weather Pattern Analysis
     Input: a map with about 3,000 weather probes scattered in B.C.; daily data for temperature, precipitation, wind velocity, etc.; data warehouse using star schema.
     Output: a map that reveals patterns: merged (similar) regions.
     Goals: interactive analysis (drill-down, slice, dice, pivot, roll-up); fast response time; minimizing storage space used.
     Challenge: a merged region may contain hundreds of "primitive" regions (polygons).

     Spatial-to-Spatial Generalization
     Generalize detailed geographic points into clustered regions, such as businesses, residential, industrial, or agricultural areas, according to land usage.
     Requires the merging of a set of geographic areas by spatial operations.
     [Figure: spatial operations Dissolve, Merge, Clip, Intersect, Union]
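The distinction between distributive, algebraic, and holistic measures on the "Dimensions and Measures in Spatial DW" slide can be illustrated with a minimal sketch. The revenue figures and the two sub-regions below are invented for illustration; they are not from the lecture.

```python
from statistics import median

# Hypothetical monthly revenue figures for two sub-regions that roll up
# into one merged region (values are made up for illustration).
region_a = [10, 20, 30]
region_b = [40, 1000]

# Distributive: the measure of the merged region is computed directly
# from the same measure on each part.
total = sum(region_a) + sum(region_b)
count = len(region_a) + len(region_b)

# Algebraic: derived from a fixed number of distributive measures.
average = total / count

# Holistic: the partial medians do not determine the median of the union.
median_of_medians = median([median(region_a), median(region_b)])
true_median = median(region_a + region_b)

print(total, count, average, median_of_medians, true_median)
```

This is why holistic measures are the expensive ones to precompute in a data cube: rolling up requires going back to the underlying values rather than combining stored partial results.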
  3. 3. Star Schema of the BC Weather Warehouse
     Spatial data warehouse.
     Dimensions: region_name, time, temperature, precipitation.
     Measurements: region_map, area, count.
     [Figure: dimension tables and fact table]

     Dynamic Merging of Spatial Objects
     [Figure only]

     Spatial Association Analysis
     Spatial association rule: A ⇒ B [s%, c%]
     A and B are sets of spatial or non-spatial predicates:
       Topological relations: intersects, overlaps, disjoint, etc.
       Spatial orientations: left_of, west_of, under, etc.
       Distance information: close_to, within_distance, etc.
     s% is the support and c% is the confidence of the rule.
     Examples:
       1) is_a(x, large_town) ^ intersect(x, highway) → adjacent_to(x, water) [7%, 85%]
       2) What kinds of objects are typically located close to golf courses?

     Spatial Classification
     Analyze spatial objects to derive classification schemes, such as decision trees, in relevance to certain spatial properties (district, highway, river, etc.).
     Examples: classifying medium-size families according to income, region, and infant mortality rates; mining for volcanoes on Venus.
     Employ most of the methods in Chapter 7: decision-tree classification, Naïve-Bayesian classifier + boosting, neural network, genetic programming, etc.
     Association-based multi-dimensional classification. Example: classifying house value based on proximity to lakes, highways, mountains, etc.
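As a rough sketch of how the support s% and confidence c% of a spatial association rule A ⇒ B could be computed, the snippet below treats each spatial object as the set of predicates that hold for it. The five towns and the predicate names are hypothetical; in a real system the predicates would come from spatial joins, which are not shown here.

```python
def rule_stats(objects, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent => consequent.

    Each object is a set of predicate names that hold for it.
    """
    has_a = [o for o in objects if antecedent <= o]
    has_ab = [o for o in has_a if consequent <= o]
    support = len(has_ab) / len(objects)
    confidence = len(has_ab) / len(has_a) if has_a else 0.0
    return support, confidence


# Hypothetical objects with precomputed spatial predicates.
towns = [
    {"is_a_large_town", "intersects_highway", "adjacent_to_water"},
    {"is_a_large_town", "intersects_highway", "adjacent_to_water"},
    {"is_a_large_town", "intersects_highway"},
    {"is_a_large_town"},
    {"adjacent_to_water"},
]

support, confidence = rule_stats(
    towns,
    antecedent={"is_a_large_town", "intersects_highway"},
    consequent={"adjacent_to_water"},
)
print(f"support={support:.0%} confidence={confidence:.0%}")
```

Here support is the fraction of all objects satisfying both sides of the rule, and confidence is the fraction of objects satisfying the antecedent that also satisfy the consequent.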
  4. 4. Spatial Trend Analysis
     Function: detect changes and trends along a spatial dimension; study the trend of non-spatial or spatial data changing with space.
     Application examples: observe the trend of changes of the climate or vegetation with increasing distance from an ocean; crime rate or unemployment rate change with regard to city geo-distribution.

     Spatial Cluster Analysis
     Mining clusters: k-means, k-medoids, hierarchical, density-based, etc.
     Analysis of distinct features of the clusters.

     Constraint-Based Clustering: Planning ATM Locations
     [Figure: clusters C1 to C4 separated by a river and a mountain; panels "Spatial data with obstacles" and "Clustering without taking obstacles into consideration"]

     Mining Multimedia Databases
     A multimedia DB system stores and manages a large collection of multimedia objects, such as audio data, image data, video data, and hypertext data, which contain text, text markups, and linkages.
     Examples: NASA EOS (Earth Observing System), various kinds of image and audio-video databases, human genome databases, and Internet databases.
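One simple way to quantify a trend along a spatial dimension, such as the "increasing distance from an ocean" example, is a least-squares line fit of the attribute against distance. This is only a sketch of that idea, not the method used in the lecture; the distances and rainfall values are invented.

```python
def linear_trend(distances, values):
    """Least-squares slope and intercept of values against distances."""
    n = len(distances)
    mean_x = sum(distances) / n
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distances, values))
    sxx = sum((x - mean_x) ** 2 for x in distances)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical probes: distance from the ocean (km) and annual rainfall.
distances_km = [0, 10, 20, 30, 40]
rainfall = [100, 90, 80, 70, 60]

slope, intercept = linear_trend(distances_km, rainfall)
print(slope, intercept)
```

A negative slope indicates that the attribute decreases as one moves away from the reference object.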
  5. 5. Similarity Search in Multimedia Data
     Description-based retrieval systems: build indices and perform object retrieval based on image descriptions, such as keywords, captions, size, and time of creation. Labor-intensive if performed manually; results are typically of poor quality if automated.
     Content-based retrieval systems: support retrieval based on the image content, such as color histogram, texture, shape, objects, and wavelet transforms.

     Queries in Content-Based Retrieval Systems
     Image sample-based queries: find all of the images that are similar to the given image sample. Compare the feature vector (signature) extracted from the sample with the feature vectors of images that have already been extracted and indexed in the image database.
     Image feature specification queries: specify or sketch image features like color, texture, or shape, which are translated into a feature vector. Match the feature vector with the feature vectors of the images in the database.
     Applications: medical diagnosis, weather prediction, TV production, Web search engines for images, and e-commerce.

     C-BIRD: Content-Based Image Retrieval from Digital libraries
     Search by image colors, by color percentage, by color layout, by texture density, by texture layout, by object model, by illumination invariance, by keywords.

     Approaches Based on Image Signature
     Color histogram-based signature: the signature includes color histograms based on the color composition of an image regardless of its scale or orientation. It carries no information about shape, location, or texture. Two images with similar color composition may contain very different shapes or textures, and thus could be completely unrelated in semantics.
     Multifeature composed signature: define different distance functions for color, shape, location, and texture, and subsequently combine them to derive the overall result.
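The color histogram-based signature described above can be sketched in a few lines. This is an illustrative toy, not C-BIRD's implementation: images are assumed to be flat lists of (r, g, b) pixels, the quantization into 2 bins per channel is an arbitrary choice, and histogram intersection is one of several possible similarity measures.

```python
def color_histogram(pixels, bins_per_channel=2):
    """Normalized color histogram of a list of (r, g, b) pixels (0-255)."""
    step = 256 // bins_per_channel
    hist = [0.0] * bins_per_channel ** 3
    for r, g, b in pixels:
        idx = ((r // step) * bins_per_channel ** 2
               + (g // step) * bins_per_channel
               + (b // step))
        hist[idx] += 1
    return [h / len(pixels) for h in hist]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1 means identical color composition."""
    return sum(min(a, b) for a, b in zip(h1, h2))


blue, white, green = (0, 0, 255), (255, 255, 255), (0, 200, 0)
sky = [blue, blue, blue, white]        # hypothetical 4-pixel images
sky_rearranged = [white, blue, blue, blue]
meadow = [green] * 4

same = histogram_intersection(color_histogram(sky), color_histogram(sky_rearranged))
different = histogram_intersection(color_histogram(sky), color_histogram(meadow))
print(same, different)
```

The two "sky" images have identical signatures even though their pixels are arranged differently, which is the limitation the slide points out: the histogram says nothing about shape or location.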
  6. 6. C-BIRD: Content-Based Image Retrieval from Digital libraries (2)
     [Figures: search by colors, search by color layout, search by texture layout]

     Model-Based Search
     [Figure: search by model]

     Multi-Dimensional Analysis in Multimedia Databases
     [Figures: color layout, color histogram, texture layout]
  7. 7. Mining Multimedia Databases
     Refining or combining searches:
       Search for "blue sky" (top layout grid is blue)
       Search for "airplane in blue sky" (top layout grid is blue and keyword = "airplane")
       Search for "blue sky and green meadows" (top layout grid is blue and bottom is green)

     Mining Multimedia Databases: The Data Cube and the Sub-Space Measurements
     [Figure: data cube over Colour (RED, WHITE, BLUE), Format (JPEG, GIF), and Size, with group-bys By Colour, By Format, By Size, By Colour & Size, By Format & Size, By Format & Colour, Cross Tab, and Sum]
     Attributes listed on the slide: format of image, duration, colors, textures, keywords, size, width, height, Internet domain of image, Internet domain of parent pages, image popularity.

     Mining Multimedia Databases
     [Two slides, figures only]
  8. 8. Mining Multimedia Databases
     Automatic extraction of image content features allows search by image content like colors, textures, etc.
     [Figure labels: Window, Colors, Color Layout, Color Histogram, Texture and locales, Thumbnails, Content-Based Search, Multimedia Keywords and Descriptions, Multimedia Database, Data mining]

     Classification in MultiMediaMiner
     [Figure only]

     Mining Associations in Multimedia Data
     Special features:
       Need # of occurrences besides Boolean existence, e.g., "Two red squares and one blue circle" implies theme "air-show".
       Need spatial relationships: blue on top of a white squared object is associated with brown bottom.
       Need multi-resolution and progressive refinement mining: it is expensive to explore detailed associations among objects at high resolution, and it is crucial to ensure the completeness of search at multi-resolution space.

     From Coarse to Fine Resolution Mining
     [Figure: different resolution hierarchy]
  9. 9. Mining Time-Series and Sequence Data
     Time-series database: consists of sequences of values or events changing with time. Data is recorded at regular intervals.
     Characteristic time-series components: trend, cycle, seasonal, irregular.
     Applications: financial (stock price, inflation); biomedical (blood pressure); meteorological (precipitation).

     Mining Time-Series and Sequence Data
     Temporal Data
       Trans. time    CustId   Video
       June 10, 93    2        A, B
       June 12, 93    5        H
       June 15, 93    2        C
       June 20, 93    2        D, F, G
       June 25, 93    1        C
       June 25, 93    4        C
       June 25, 93    3        C, E, G
       :              :        :
     Time-Series Data
       Date           Stock    Price$
       June 11, 93    IBM      98.5
       June 11, 93    MSFT     78.0
       June 11, 93    INTC     76.5
       June 12, 93    IBM      99.5
       June 12, 93    MSFT     80.0
       June 12, 93    INTC     77.0
       June 13, 93    IBM      98.0
       :              :        :

     Time-Related Data Mining: Necessity
     Need for time-related mining: a large class of real data is not being mined; analyzing patterns (descriptive); forecasting events (predictive).
     Time-series analysis: description (generally mathematical); prediction of the component movements; pattern matching (transformation, abstraction, indexing, probabilistic methods); pattern discovery (sequence, episode, periodicity).

     Trend Analysis
     Categories of time-series movements:
       Long-term or trend movements (trend curve): freehand method, least-square method, moving-average method.
       Cyclic movements or cycle variations, e.g., business cycles.
       Seasonal movements or seasonal variations, i.e., almost identical patterns that a time series appears to follow during corresponding months of successive years.
       Irregular or random movements.
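The moving-average method named under Trend Analysis can be sketched as follows. The window size is an arbitrary choice made here for illustration, and the input is the three IBM prices from the Time-Series Data table; a real trend curve would use a much longer series.

```python
def moving_average(series, window):
    """Simple moving average: mean of each run of `window` consecutive values."""
    if window <= 0 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


# IBM prices for June 11-13, 93, taken from the table above.
ibm = [98.5, 99.5, 98.0]
smoothed = moving_average(ibm, window=2)
print(smoothed)
```

A larger window smooths out more of the irregular movement but also shortens the output series and delays the response to real changes in trend.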
  10. 10. Similarity Search in Time-Series Analysis
     A normal database query finds an exact match. A similarity search finds data sequences that differ only slightly from the given query sequence.
     Two categories of similarity queries:
       Whole matching: find a sequence that is similar to the query sequence.
       Subsequence matching: find all pairs of similar sequences.
     Typical applications: financial market (stock data analysis); market basket data analysis; scientific databases (power consumption analysis); medical diagnosis (cardiogram analysis).

     Similar Time Series Analysis
     [Figure only]

     Similar Time Series Analysis
     [Figure: VanEck International Fund and Fidelity Selective Precious Metal and Mineral Fund, two similar mutual funds in different fund groups]

     Sequential Pattern Mining
     Mining of frequently occurring patterns related to time or other sequences. Sequential pattern mining usually concentrates on symbolic patterns.
     Examples: renting "Star Wars", then "Empire Strikes Back", then "Return of the Jedi" in that order; a collection of ordered events within an interval.
     Applications: targeted marketing, customer retention, weather prediction.
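A minimal sketch of similarity search over time series, assuming Euclidean distance and a user-chosen tolerance `epsilon` (both are assumptions; the slides do not fix a distance measure). The subsequence search below uses the common sliding-window formulation, which locates windows of a long series that are close to a short query.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def whole_match(query, candidates, epsilon):
    """Names of stored sequences of the same length within epsilon of query."""
    return [
        name for name, seq in candidates.items()
        if len(seq) == len(query) and euclidean(query, seq) <= epsilon
    ]


def subsequence_match(query, series, epsilon):
    """Start offsets of windows in `series` within epsilon of query."""
    w = len(query)
    return [
        i for i in range(len(series) - w + 1)
        if euclidean(query, series[i:i + w]) <= epsilon
    ]


# Hypothetical data.
query = [2, 3, 4]
stored = {"a": [2, 3, 4.2], "b": [9, 9, 9]}
long_series = [1, 2, 3, 10, 2, 3, 4, 1]

print(whole_match(query, stored, epsilon=0.5))
print(subsequence_match(query, long_series, epsilon=0.5))
print(subsequence_match(query, long_series, epsilon=2.0))
```

This brute-force scan compares every window. Practical systems usually transform and index the sequences first so that most candidates can be discarded without a full comparison.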
  11. 11. Sequential Pattern Mining
     Customer-sequence
       CustId   Video sequence
       1        {(C), (H)}
       2        {(AB), (C), (DFG)}
       3        {(CEG)}
       4        {(C), (DG), (H)}
       5        {(H)}
     Map Large Itemsets
       Large Itemsets   MappedID
       (C)              1
       (D)              2
       (G)              3
       (DG)             4
       (H)              5
     Sequential patterns with support > 0.25: {(C), (H)} and {(C), (DG)}.

     Text Databases and Information Retrieval
     Text databases (document databases): large collections of documents from various sources, such as news articles, research papers, books, digital libraries, e-mail messages, Web pages, and library databases. Data stored is usually semi-structured. Traditional information retrieval techniques become inadequate for the increasingly vast amounts of text data.
     Information retrieval: a field developed in parallel with database systems. Information is organized into (a large number of) documents. The information retrieval problem is locating relevant documents based on user input, such as keywords or example documents.

     Information Retrieval
     Typical IR systems: online library catalogs; online document management systems.
     Information retrieval vs. database systems: some DB problems are not present in IR, e.g., update, transaction management, complex objects. Some IR problems are not addressed well in DBMS, e.g., unstructured documents, approximate search using keywords and relevance.

     Text Mining
     Application of data mining to nonstructured or less structured text files. It entails the generation of meaningful numerical indices from the unstructured text and then processing these indices using various data mining algorithms.
     Text mining helps organizations: find the "hidden" content of documents, including additional useful relationships; relate documents across previously unnoticed divisions; group documents by common themes.
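The support values behind the two sequential patterns on the customer-sequence slide can be checked with a short sketch. Each customer sequence is modelled as an ordered list of itemsets, and a pattern is contained in a sequence if its itemsets can be matched, in order, to supersets among the sequence's itemsets. The function names are illustrative, and this only counts support for given patterns; it does not generate candidate patterns.

```python
def contains(sequence, pattern):
    """True if each itemset of `pattern` is a subset of a distinct itemset
    of `sequence`, with the order preserved."""
    pos = 0
    for itemset in pattern:
        while pos < len(sequence) and not itemset <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True


def support(pattern, sequences):
    """Fraction of customer sequences that contain the pattern."""
    return sum(contains(s, pattern) for s in sequences) / len(sequences)


# Customer sequences from the slide.
customers = {
    1: [{"C"}, {"H"}],
    2: [{"A", "B"}, {"C"}, {"D", "F", "G"}],
    3: [{"C", "E", "G"}],
    4: [{"C"}, {"D", "G"}, {"H"}],
    5: [{"H"}],
}
sequences = list(customers.values())

print(support([{"C"}, {"H"}], sequences))
print(support([{"C"}, {"D", "G"}], sequences))
```

Both patterns are supported by two of the five customers (customers 1 and 4, and customers 2 and 4 respectively), which is above the 0.25 threshold.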
  12. 12. Applications of Text Mining
     Automatic detection of e-mail spam or phishing through analysis of the document content.
     Automatic processing of messages or e-mails to route a message to the most appropriate party to process that message.
     Analysis of warranty claims, help desk calls/reports, and so on to identify the most common problems and relevant responses.
     Analysis of related scientific publications in journals to create an automated summary view of a particular discipline.
     Creation of a "relationship view" of a document collection.
     Qualitative analysis of documents to detect deception.

     Text Mining
     How to mine text:
       1. Eliminate commonly used words (stop-words)
       2. Replace words with their stems or roots (stemming algorithms)
       3. Consider synonyms and phrases
       4. Calculate the weights of the remaining terms

     Web Mining
     The WWW is a huge, widely distributed, global information service center for:
       Information services: news, advertisements, consumer information, financial management, education, government, e-commerce, etc.
       Hyper-link information
       Access and usage information
     Web mining is the discovery and analysis of interesting and useful information from the Web, about the Web, and usually through Web-based tools.
     The WWW provides rich sources for data mining.

     Web Mining
     Growing and changing very rapidly.
     [Figure: Internet growth, number of hosts (0 to 40,000,000), Sep-69 to Sep-99]
     Broad diversity of user communities.
     Only a small portion of the information on the Web is truly relevant or useful: 99% of the Web information is useless to 99% of Web users. How can we find high-quality Web pages on a specified topic?
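The "How to mine text" steps can be sketched end to end on a toy corpus. Everything specific here is an assumption made for illustration: the stop-word list is tiny, `naive_stem` is a crude suffix stripper rather than a real stemming algorithm, step 3 (synonyms and phrases) is skipped, and TF-IDF is used as one common choice of term weight.

```python
import math
from collections import Counter

STOP_WORDS = {"the", "a", "of", "to", "and", "is", "in"}  # illustrative only


def naive_stem(word):
    """Crude suffix stripping; a stand-in for a real stemming algorithm."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def preprocess(text):
    """Steps 1 and 2: drop stop-words, then stem the remaining tokens."""
    tokens = [t for t in text.lower().split() if t not in STOP_WORDS]
    return [naive_stem(t) for t in tokens]


def tf_idf(documents):
    """Step 4: weight = term frequency * log(N / document frequency)."""
    docs = [Counter(preprocess(d)) for d in documents]
    n = len(docs)
    df = Counter(term for d in docs for term in d)
    return [{t: c * math.log(n / df[t]) for t, c in d.items()} for d in docs]


corpus = ["mining the web", "mining text databases", "the web is growing"]
weights = tf_idf(corpus)
for doc, w in zip(corpus, weights):
    print(doc, "->", w)
```

Terms that occur in fewer documents receive higher weights, so "text" outweighs "min" (the stem of "mining") in the second document.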
  13. 13. Web Mining
     Web content mining: the extraction of useful information from Web pages.
     Web structure mining: the development of useful information from the links included in the Web documents.
     Web usage mining: the extraction of useful information from the data being generated through webpage visits, transactions, etc.

     Mining the World-Wide Web
     Taxonomy: Web Mining divides into Web Content Mining (Web Page Content Mining, Search Result Mining), Web Structure Mining, and Web Usage Mining (General Access Pattern Tracking, Customized Usage Tracking).
     Web Page Content Mining: Web Page Summarization
       WebLog (Lakshmanan et al. 1996), WebOQL (Mendelzon et al. 1998), …: Web structuring query languages; can identify information within given web pages.
       Ahoy! (Etzioni et al. 1997): uses heuristics to distinguish personal home pages from other web pages.
       ShopBot (Etzioni et al. 1997): looks for product prices within web pages.

     Mining the World-Wide Web
     Search Result Mining: Search Engine Result Summarization
       Clustering Search Result (Leouski and Croft, 1996; Zamir and Etzioni, 1997): categorizes documents using phrases in titles and snippets.

     Mining the World-Wide Web
     Web Structure Mining
       Using links: PageRank (Brin et al., 1998), CLEVER (Chakrabarti et al., 1998). Use interconnections between web pages to give weight to pages.
       Using generalization: MLDB (1994), VWV (1998). Uses a multi-level database representation of the Web. Counters (popularity) and link lists are used for capturing structure.
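To make "use interconnections between web pages to give weight to pages" concrete, here is a simplified textbook-style PageRank computed by power iteration. It is a sketch, not the formulation from the cited paper: the damping factor of 0.85, the fixed iteration count, the handling of dangling pages, and the four-page link graph are all assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    `links` maps each page to the list of pages it links to; every link
    target must also appear as a key.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: C is linked to by A, B, and D.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

Page C ends up with the highest weight because it receives the most incoming links, and page D, which nothing links to, ends up with the lowest.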
  14. 14. Mining the World-Wide Web
     Web Usage Mining: General Access Pattern Tracking
       Web Log Mining (Zaïane, Xin and Han, 1998): uses KDD techniques to understand general access patterns and trends. Can shed light on better structure and grouping of resource providers.

     Mining the World-Wide Web
     Web Usage Mining: Customized Usage Tracking
       Adaptive Sites (Perkowitz and Etzioni, 1997): analyzes access patterns of each user at a time. The Web site restructures itself automatically by learning from user access patterns.

     Web Mining
     Uses for Web mining:
       Determine the lifetime value of clients
       Design cross-marketing strategies across products
       Evaluate promotional campaigns
       Target electronic ads and coupons at user groups
       Predict user behavior
       Present dynamic information to users
  15. 15. Summary
     Mining complex types of data includes object data, spatial data, multimedia data, time-series data, text data, and Web data.
     Object data can be mined by multi-dimensional generalization of complex structured data, such as plan mining for flight sequences.
     Spatial data warehousing, OLAP, and mining facilitate multidimensional spatial analysis and finding spatial associations, classifications, and trends.
     Multimedia data mining needs content-based retrieval and similarity search integrated with mining methods.

     Summary (2)
     Time-series/sequential data mining includes trend analysis, similarity search in time series, and mining sequential patterns and periodicity in time sequences.
     Text mining goes beyond keyword-based and similarity-based information retrieval and discovers knowledge from semi-structured data using methods like keyword-based association and document classification.
     Web mining includes mining Web link structures to identify authoritative Web pages, the automatic classification of Web documents, building a multilayered Web information base, and Weblog mining.

     References (1)
     R. Agrawal, C. Faloutsos, and A. Swami. Efficient similarity search in sequence databases. In Proc. 4th Int. Conf. Foundations of Data Organization and Algorithms, Chicago, Oct. 1993.
     R. Agrawal, K.-I. Lin, H.S. Sawhney, and K. Shim. Fast similarity search in the presence of noise, scaling, and translation in time-series databases. VLDB'95, Zurich, Switzerland, Sept. 1995.
     G. Arocena and A. O. Mendelzon. WebOQL: Restructuring documents, databases, and webs. ICDE'98, Orlando, FL, Feb. 1998.
     R. Agrawal, G. Psaila, E. L. Wimmers, and M. Zait. Querying shapes of histories. VLDB'95, Zurich, Switzerland, Sept. 1995.
     R. Agrawal and R. Srikant. Mining sequential patterns. ICDE'95, Taipei, Taiwan, Mar. 1995.
     S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. WWW'98, Brisbane, Australia, 1998.
     C. Bettini, X. Sean Wang, and S. Jajodia. Mining temporal relationships with multiple granularities in time sequences. Data Engineering Bulletin, 21:32-38, 1998.
     R. Baeza-Yates and B. Ribeiro-Neto. Modern Information Retrieval. Addison-Wesley, 1999.
     S. Chakrabarti, B. E. Dom, and P. Indyk. Enhanced hypertext classification using hyper-links. SIGMOD'98, Seattle, WA, June 1998.
     S. Chakrabarti, B. E. Dom, S. R. Kumar, P. Raghavan, S. Rajagopalan, A. Tomkins, D. Gibson, and J. M. Kleinberg. Mining the web's link structure. COMPUTER, 32:60-67, 1999.

     References (2)
     J. Chen, D. DeWitt, F. Tian, and Y. Wang. NiagaraCQ: A scalable continuous query system for internet databases. SIGMOD'00, Dallas, TX, May 2000.
     C. Chatfield. The Analysis of Time Series: An Introduction, 3rd ed. Chapman and Hall, 1984.
     S. Chakrabarti. Data mining for hypertext: A tutorial survey. SIGKDD Explorations, 1:1-11, 2000.
     S. Deerwester, S. Dumais, G. Furnas, T. Landauer, and R. Harshman. Indexing by latent semantic analysis. J. American Society for Information Science, 41:391-407, 1990.
     M. Ester, A. Frommelt, H.-P. Kriegel, and J. Sander. Algorithms for characterization and trend detection in spatial databases. KDD'98, New York, NY, Aug. 1998.
     M.J. Egenhofer. Spatial Query Languages. UMI Research Press, University of Maine, Portland, Maine, 1989.
     M. Ester, H.-P. Kriegel, and J. Sander. Spatial data mining: A database approach. SSD'97, Berlin, Germany, July 1997.
     C. Faloutsos. Access methods for text. ACM Comput. Surv., 17:49-74, 1985.
     U. M. Fayyad, S. G. Djorgovski, and N. Weir. Automating the analysis and cataloging of sky surveys. In U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, editors, Advances in Knowledge Discovery and Data Mining, AAAI/MIT Press, 1996.
     R. Feldman and H. Hirsh. Finding associations in collections of text. In R. S. Michalski, I. Bratko, and M. Kubat, editors, "Machine Learning and Data Mining: Methods and Applications", John Wiley & Sons, 1998.
  16. 16. References (3)
     C. Faloutsos and K.-I. Lin. FastMap: A fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasets. SIGMOD'95, San Jose, CA, May 1995.
     D. Florescu, A.Y. Levy, and A. O. Mendelzon. Database techniques for the world-wide web: A survey. SIGMOD Record, 27:59-74, 1998.
     U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy (eds.). Advances in Knowledge Discovery and Data Mining. AAAI/MIT Press, 1996.
     C. Faloutsos, M. Ranganathan, and Y. Manolopoulos. Fast subsequence matching in time-series databases. SIGMOD'94, Minneapolis, Minnesota, May 1994.
     M. Flickner, H. Sawhney, W. Niblack, J. Ashley, B. Dom, Q. Huang, M. Gorkani, J. Hafner, D. Lee, D. Petkovic, S. Steele, and P. Yanker. Query by image and video content: The QBIC system. IEEE Computer, 28:23-32, 1995.
     S. Guha, R. Rastogi, and K. Shim. Rock: A robust clustering algorithm for categorical attributes. ICDE'99, Sydney, Australia, Mar. 1999.
     R. H. Gueting. An introduction to spatial database systems. The VLDB Journal, 3:357-400, 1994.
     J. Han, G. Dong, and Y. Yin. Efficient mining of partial periodic patterns in time series database. ICDE'99, Sydney, Australia, Apr. 1999.
     J. Han, K. Koperski, and N. Stefanovic. GeoMiner: A system prototype for spatial data mining. SIGMOD'97, Tucson, Arizona, May 1997.

     References (4)
     J. Han, S. Nishio, H. Kawano, and W. Wang. Generalization-based data mining in object-oriented databases using an object-cube model. Data and Knowledge Engineering, 25:55-97, 1998.
     J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, and M.-C. Hsu. Freespan: Frequent pattern-projected sequential pattern mining. KDD'00, Boston, MA, Aug. 2000.
     J. Han, N. Stefanovic, and K. Koperski. Selective materialization: An efficient method for spatial data cube construction. PAKDD'98, Melbourne, Australia, Apr. 1998.
     J. Han, Q. Yang, and E. Kim. Plan mining by divide-and-conquer. DMKD'99, Philadelphia, PA, May 1999.
     K. Koperski and J. Han. Discovery of spatial association rules in geographic information databases. SSD'95, Portland, Maine, Aug. 1995.
     J. M. Kleinberg. Authoritative sources in a hyperlinked environment. Journal of ACM, 46:604-632, 1999.
     E. Knorr and R. Ng. Finding aggregate proximity relationships and commonalities in spatial data mining. IEEE Trans. Knowledge and Data Engineering, 8:884-897, 1996.
     J. M. Kleinberg and A. Tomkins. Application of linear algebra in information retrieval and hypertext analysis. PODS'99, Philadelphia, PA, May 1999.
     H. Lu, J. Han, and L. Feng. Stock movement and n-dimensional inter-transaction association rules. DMKD'98, Seattle, WA, June 1998.

     References (5)
     W. Lu, J. Han, and B. C. Ooi. Knowledge discovery in large spatial databases. In Proc. Far East Workshop Geographic Information Systems, Singapore, June 1993.
     D. J. Maguire, M. Goodchild, and D. W. Rhind. Geographical Information Systems: Principles and Applications. Longman, London, 1992.
     H. Miller and J. Han. Geographic Data Mining and Knowledge Discovery. Taylor and Francis, 2000.
     A. O. Mendelzon, G. A. Mihaila, and T. Milo. Querying the world-wide web. Int. Journal of Digital Libraries, 1:54-67, 1997.
     H. Mannila, H. Toivonen, and A. I. Verkamo. Discovery of frequent episodes in event sequences. Data Mining and Knowledge Discovery, 1:259-289, 1997.
     A. Natsev, R. Rastogi, and K. Shim. Walrus: A similarity retrieval algorithm for image databases. SIGMOD'99, Philadelphia, PA, June 1999.
     B. Ozden, S. Ramaswamy, and A. Silberschatz. Cyclic association rules. ICDE'98, Orlando, FL, Feb. 1998.
     M. Perkowitz and O. Etzioni. Adaptive web sites: Conceptual cluster mining. IJCAI'99, Stockholm, Sweden, 1999.
     P. Raghavan. Information retrieval algorithms: A survey. In Proc. 1997 ACM-SIAM Symp. Discrete Algorithms, New Orleans, Louisiana, 1997.

     References (6)
     D. Rafiei and A. Mendelzon. Similarity-based queries for time series data. SIGMOD'97, Tucson, Arizona, May 1997.
     G. Salton. Automatic Text Processing. Addison-Wesley, 1989.
     J. Srivastava, R. Cooley, M. Deshpande, and P. N. Tan. Web usage mining: Discovery and applications of usage patterns from web data. SIGKDD Explorations, 1:12-23, 2000.
     P. Stolorz and C. Dean. Quakefinder: A scalable data mining system for detecting earthquakes from space. KDD'96, Portland, Oregon, Aug. 1996.
     G. Salton and M. McGill. Introduction to Modern Information Retrieval. McGraw-Hill, 1983.
     V. S. Subrahmanian. Principles of Multimedia Database Systems. Morgan Kaufmann, 1998.
     C. J. van Rijsbergen. Information Retrieval. Butterworth, 1990.
     K. Wang, S. Zhou, and S. C. Liew. Building hierarchical classifiers using class proximity. VLDB'99, Edinburgh, UK, Sept. 1999.
     B.-K. Yi, H.V. Jagadish, and C. Faloutsos. Efficient retrieval of similar time sequences under time warping. ICDE'98, Orlando, FL, Feb. 1998.
     C. T. Yu and W. Meng. Principles of Database Query Processing for Advanced Applications. Morgan Kaufmann, 1997.
  17. 17. References (7)
     B.-K. Yi, N. Sidiropoulos, T. Johnson, H.V. Jagadish, C. Faloutsos, and A. Biliris. Online data mining for co-evolving time sequences. ICDE'00, San Diego, CA, Feb. 2000.
     C. Zaniolo, S. Ceri, C. Faloutsos, R. T. Snodgrass, C. S. Subrahmanian, and R. Zicari. Advanced Database Systems. Morgan Kaufmann, 1997.
     O. R. Zaïane and J. Han. Resource and knowledge discovery in global information systems: A preliminary design and experiment. KDD'95, Montreal, Canada, Aug. 1995.
     O. R. Zaïane and J. Han. WebML: Querying the world-wide web for resources and knowledge. WIDM'98, Bethesda, Maryland, Nov. 1998.
     O. R. Zaïane, J. Han, Z. N. Li, J.Y. Chiang, and S. Chee. MultiMedia-Miner: A system prototype for multimedia data mining. SIGMOD'98, Seattle, WA, June 1998.
     O. R. Zaïane, J. Han, and H. Zhu. Mining recurrent items in multimedia with progressive resolution refinement. ICDE'00, San Diego, CA, Feb. 2000.
     M. J. Zaki, N. Lesh, and M. Ogihara. PLANMINE: Sequence mining for plan failures. KDD'98, New York, NY, Aug. 1998.
     X. Zhou, D. Truffet, and J. Han. Efficient polygon amalgamation methods for spatial OLAP and spatial data mining. SSD'99, Hong Kong, July 1999.
     O. R. Zaïane, M. Xin, and J. Han. Discovering Web access patterns and trends by applying OLAP and data mining technology on Web logs. ADL'98, Santa Barbara, CA, Apr. 1998.

     Some References on Spatial Data Mining
     H. Miller and J. Han (eds.), Geographic Data Mining and Knowledge Discovery, Taylor and Francis, 2001.
     Ester M., Frommelt A., Kriegel H.-P., Sander J.: Spatial Data Mining: Database Primitives, Algorithms and Efficient DBMS Support, Data Mining and Knowledge Discovery, an International Journal. 4, 2000, pp. 193-216.
     J. Han, M. Kamber, and A. K. H. Tung, "Spatial Clustering Methods in Data Mining: A Survey", in H. Miller and J. Han (eds.), Geographic Data Mining and Knowledge Discovery, Taylor and Francis, 2000.
     Y. Bedard, T. Merrett, and J. Han, "Fundamentals of Geospatial Data Warehousing for Geographic Knowledge Discovery", in H. Miller and J. Han (eds.), Geographic Data Mining and Knowledge Discovery, Taylor and Francis, 2000.

     References on Text Mining (1)
     G. Arocena and A. O. Mendelzon. WebOQL: Restructuring documents, databases, and webs. ICDE'98, Orlando, FL, Feb. 1998.
     R. Baeza-Yates and B. Ribeiro-Neto. Modern Information Retrieval. Addison-Wesley, 1999.
     S. Deerwester, S. Dumais, G. Furnas, T. Landauer, and R. Harshman. Indexing by latent semantic analysis. J. American Society for Information Science, 41:391-407, 1990.
     C. Faloutsos. Access methods for text. ACM Comput. Surv., 17:49-74, 1985.
     R. Feldman and H. Hirsh. Finding associations in collections of text. In R. S. Michalski, I. Bratko, and M. Kubat, editors, "Machine Learning and Data Mining: Methods and Applications", John Wiley & Sons, 1998.
     J. M. Kleinberg and A. Tomkins. Application of linear algebra in information retrieval and hypertext analysis. PODS'99, Philadelphia, PA, May 1999.
     P. Raghavan. Information retrieval algorithms: A survey. In Proc. 1997 ACM-SIAM Symp. Discrete Algorithms, New Orleans, Louisiana, 1997.

     References on Text Mining (2)
     C. J. van Rijsbergen. Information Retrieval. Butterworth, 1990.
     G. Salton. Automatic Text Processing. Addison-Wesley, 1989.
     G. Salton and M. McGill. Introduction to Modern Information Retrieval. McGraw-Hill, 1983.
     F. Sebastiani. Machine learning in automated text categorization. ACM Computing Surveys. Accepted for publication, 2002.
     K. Wang, S. Zhou, and S. C. Liew. Building hierarchical classifiers using class proximity. VLDB'99, Edinburgh, UK, Sept. 1999.
     Y. Yang and X. Liu. A re-examination of text categorization methods. Proceedings of ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'99, pp. 42-49), 1999.
     Y. Yang. An evaluation of statistical approaches to text categorization. Journal of Information Retrieval, Vol. 1, No. 1/2, pp. 67-88, 1999.