The Second International Conference on Knowledge Discovery ...
Upcoming SlideShare
Loading in...5

The Second International Conference on Knowledge Discovery ...






Total Views
Views on SlideShare
Embed Views



0 Embeds 0

No embeds



Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
Post Comment
Edit your comment

The Second International Conference on Knowledge Discovery ... The Second International Conference on Knowledge Discovery ... Document Transcript

  • KDD–96 The Second International Conference on Knowledge Discovery and Data Mining—KDD-96 Product Demonstrations Ac2: Advanced Decision DataMine—An Integrated Tree-based Data Mining Geomarketing Decision Cyril Way, Hugues Marty, Thierry Marie Victoire ISoft Support System Cyril Way, Hugues Marty, Thierry Marie Ac2 is an interactive decision tree based data mining develop- Victoire, ISoft ment environment with advanced knowledge modeling capabil- ities. It includes an object oriented graphical language for struc- Geomarketing decision support system in a banking organiza- turing raw data and guiding the data mining process with tion. CAISSE D’EPARGNE, one of the top three banks in France, op- domain knowledge. Data being structured or not, the tool erates a 3,000 automated teller machines network. These ATMs builds decision trees automatically. Users may freely interact may generate profits, or losses: the better the location, the high- with information displayed in the trees, by modifying tests, cut- er the return on investment and the lower the risk. The applica- ting or expanding tree branches, grouping or ungrouping tree tion built by ISoft is a global data mining application, that aims nodes. Reporting features include graphs and text reports. Vari- at providing the bank’s managers with forecasts of new ATMs ous ID3 and CART based algorithms are available within the profitability, in order to help them decide whether the machine tool. should be set up or not. The development is made of several Unique features: OO graphical language for knowledge mod- steps: eling. No software limitations on data volumes to be processed. Knowledge modeling: Which parameters may explain an Fully interactive development tool and C++ libraries. ATM’s behavior? Data collection and preparation: gathering data Status: Commercial product. and controlling their integrity. Data mining: Extracting models from raw data, testing and Clementine Data Mining System validating them. Colin Shearer Data Mining Division, Integral Solutions Ltd. Implementing these models: making them available to every manager through a distributed decision support application. Launched in 1994, Clementine is a comprehensive end-user Unique features: Auto improvement of the prediction accura- toolkit for data mining. It integrates multiple modeling tech- cy rate. Little obsolescence risk. nologies—neural networks, rule induc- tion, and statistical re- Status: Fielded application. gression—with interactive visualization modules, database/file access and numerous data pre-processing/manipulation facili- Data Surveyor ties. Clementine is driven using a sophisticated visual program- M.Holsheimer, F.Kwakkel, D.Kwakkel, P. Boncs. Data Distilleries, Kruislaan ming interface; the user selects icons representing data sources 419, 1098 VA Amsterdam, The Netherlands. and operations, and positions and connects these to specify data flows. The modeling modules are by default automatically con- Data Surveyor is a data mining tool that has been developed by figured, based on the data presented. Expert options allow fine Data Distilleries in close co-operation with CWI in the Nether- control of the algorithms, and an intermediate level allows the lands and other members of the EC-funded KESO (knowledge user to specify high-level strategies or preferences. Models built engineering for statistical offices) research project. We demon- within a Clementine session can be exported as C source code strate the analytical, graphical and reporting capabilities of Data for embedding within other applications. Clementine is in use Surveyor by the interactive analysis of a number of datasets, in numerous areas, with applications in finance, retail, energy, constructed on the basis of our experience with clients. Example science/pharmaceuticals, marketing, manufacturing, govern- applications include the discovery of risk profiles in an insur- ment, telecommunications and defense. ance data base, and in a credit granting data base. We demon- Unique features: Integration of many algorithms/facilities. Ac- strate the possibility to mine interactively on large databases, al- cessibility to non-technologist end-users through visual pro- lowing the user to further explore interesting elements of the gramming and automatic configuration. discovered knowledge. Status: Commercial product. Unique features: Data Surveyor allows the user to mine inter- actively on large data bases, by using efficient search strategies and database optimization techniques. Profiles generated from the database are easily understood by the user, thus enhancing the interactive and creative character of the data mining exer- cise. Visualization techniques are used to further enhance the user’s insight. Status: Commercial product.
  • DBMiner: A System for Mining D-SIDE: A Probabilistic Decision where background knowledge is used to Knowledge in Large Relational Endorsement Environment specify constraints on the desired results Databases of this query process. Execution of a P.Kontkanen, P.Myllym?ki and H.Tirri: Complex Systems Computation Group Department of Com- knowledge-discovery query is structured Jiawei Han, Yongjian Fu, Wei Wang, Jenny Chiang, Wan Gong, Krzysztof Koperski, Deyi Li, Yijun Lu, puter Science University of Helsinki, Finland so that these background-knowledge con- Amynmohamed Rajan, Nebojsa Stefanovic, Betty straints can be exploited in the search for Xia, Osmar R. Zaiane.School of Computing Science D-SIDE (a probabilistic decision endors- possible results. Finally, rather than re- Simon Fraser University, British Columbia, Cana- ement environment) is a generic tool for quiring a user to specify an explicit query da V5A 1S6 probabilistic inference and model con- expression in the knowledge-discovery A data mining system, DBMiner, has been struction. The computational model is query language, FACT presents the user developed for interactive mining of mul- based on finite mixture distributions, with a simple-to-use graphical interface to tiple-level knowledge in large relational which are constructed by clustering data the query language, with the language databases. The system implements a wide into groups of similar elements. The D- providing a well-defined semantics for the SIDE software is a platform independent discovery actions performed by a user spectrum of data mining functions, in- cluding generalization, characterization, prototype with a graphical JAVA user inter- through the interface. association, classification, and prediction. face, which allows the software to be used Status: Research prototype. By incorporation of several interesting across the Internet. D-SIDE offers an ele- gant solution for both prediction and da- IBM Data Mining Tools data mining techniques, including at- tribute-oriented induction, statistical ta mining problems. For prediction pur- Julio Ortega, Kamal Ali, Stefanos Manganaris, analysis, progressive deepening for min- poses, D-SIDE provides a computationally George John, IBM Almaden Research Center ing multiple-level knowledge, and meta- efficient scheme for computing Bayes op- timal predictive distributions for any set IBM has developed several powerful algo- rule guided mining, the system provides a rithms and processing techniques that en- user-friendly, interactive data mining en- of attributes. In data mining applications, the user can explore attribute dependen- able application developers to analyze da- vironment with good performance. ta stored in databases or flat files. These Unique features: It implements a wide cies by visualizing the cluster structure found in data. In empirical tests with var- algorithms enable analyses ranging from spectrum of data mining functions includ- classification and predictive modeling, to ing generalization, characterization, associ- ious public domain classification datasets, D-SIDE consistently outperforms the re- association discovery, and database seg- ation, classification, and prediction. It per- mentation. Using predictive modeling, a forms interactive data mining at multiple sults obtained by alternative approaches, such as decision trees and neural net- retailer could forecast changes in cus- concept levels on any user-specified set of tomer buying patterns or characterize dif- data in a database using an SQL-like data works. A running demo of the D - SIDE software is available for experimentation ferent types of customers, for example, mining query language, DMQL, or a those that are likely to switch from in- graphical user interface, and visual presen- through our WWW homepage at URL http:// www.cs.Helsinki.FI/research/cosco. store buying to internet or mail-order tation of rules, charts, curves, etc. Both buying. Through association discovery, a UNIX and PC (Windows/NT) versions of the Unique features: The system combines the computationally efficient, yet expres- supermarket chain could determine which system adopt a client/server architecture. products are most frequently sold in con- Status: Research prototype. sive set of finite mixture models with a theoretically solid Bayesian approach for junction with other products. More de- Decisionhouse model construction and prediction. The tails about can be found at interface is based on the platform inde- com/stss/. Quadstone Ltd. Unique features: The IBM Data Mining pendent JAVA language, which allows the software to be used across the Internet. tools provide access to a comprehensive Decisionhouse is an integrated software Status: Research prototype. set of data manipulation functions and suite for scalable data analysis and visual- machine learning algorithms, thus allow- ization. It offers integration with the ma- jor RDBMS packages, fast segmentation FACT : Finding Associations in ing developers and consultants to quickly and profiling, geodemographic analysis, Collections of Text build customized data mining appli- Ronen Feldman, Bar-Ilan University, and Haym cations. IBM’s Data Mining tools can run pre- and post-processing and interactive Hirsh, Rutgers University. on small workstations but is they are 3-dimensional visualization. Although highly scalable since the algorithms have the functionality provided is generic, De- FACT (finding associations in collections of parallel implementations optimized to cisionhouse is currently principally tar- text) is a system for knowledge discovery handle large and parallel databases. geted as a tool for analyzing large cus- from text. It discovers associations— pat- Status: Commercial product.s. tomer databases. terns of co-occurrence—among keywords Unique features: Parallelism, speed, in- Kepler: Extensibility in Data Min- labeling the items in a collection of textual tegration. ing Systems documents. In addition, when back- Status: Commercial product. ground knowledge is available about the Stefan Wrobel, Dietrich Wettschereck, Edgar Som- keywords labeling the documents FACT is mer, able to use this information in its discov- Werner Emde GMD, FIT.KI, Schloss Birlinghoven, 53754 ery process FACT takes a query-centered Sankt Augustin, Germany view of knowledge discovery, in which a discovery request is viewed as a query over Given the variety of tasks and goals of da- the implicit set of possible results support- ta mining applications, there can never be ed by a collection of documents, and a fixed arsenal of analysis methods suffi- 2 KDD DEMO DESCRIPTIONS
  • cient for all applications. Many applica- code a few quick applications and run MineSet tions even require custom methods to be them as part of the demo. Steven Reiss and Mario Schkolnick, Data Mining used. Systems with a fixed selection of Unique features: It allows an ad-hoc and Visualization Group, Silicon Graphics Com- methods, or maybe just one single meth- query based approach to mining, where puter Systems od, are thus likely to be outgrown quickly, the query can be used both for rule selec- forcing the analyst to switch to a different tion, and rule generation. It demonstrates MineSet is a suite of tools that allows the system, incurring time-consuming train- some higher level applications that can be user to access data from data warehouses, ing and data conversion overhead. In our developed when we treat rules as first transform the data, apply data mining al- demonstration, we show Kepler, an inte- class objects. It provides a C++ API, and a gorithms to the data, and then use visual- grated data mining system extensible programming environment, which allows ization to perform visual data mining. It through a “plug-in” facility. The plug-in users/developers to build complex kdd- supports direct access to commercial facility allows algorithms for existing or applications. RDBMS databases. MineSet provides the new analysis tasks to be rapidly plugged Status: Research prototype. user with ability to transform data into the system without redeveloping the through the use of binning, aggregations, system core. Management Discovery Tool and groupings. MineSet supports the Already available are plug-in algo- Ken O’Flaherty, NCR Corporation , Parallel Sys- generation of both classification (deci- rithms for decision tree induction, back- tems Division sion tree and naive Bayes) and associa- prop neural networks, subgroup discov- tion rule models directly from the data. ery, clustering, nearest neighbor learning, The NCR Management Discovery Tool Four visualization tools are provided and multiple adaptive regression splines. (MDT) is a new class of business report- with MineSet. These visual data mining We illustrate the system in the context of ing tool which applies intelligence not tools enable analysts to interactively ex- our applications on retail data analysis present in today’s tools to provide action- plore data and quickly discover meaning- and ecological modeling. able information for managers and other ful new patterns, trends, and relation- Unique features: Kepler is easily extensi- users of the data warehouse. MDT com- ships. The tools support both spatial and ble with new methods, new analysis tasks, bines three advanced technologies to cre- non-spatial trend analysis through ani- new preprocessing operators through its ate management reports: business knowl- mation, discovery of clustering and pro- plug-in facility and already offers a wide edge representation, dynamic user filing, analysis of hierarchical data using variety of tasks and methods. interfaces and database query generation. 3D flythrough, and visualization and Status: Research prototype. These reports summarize situations, analysis of data mining results. The Mine- trends and events occurring in your busi- Set Tool Manager serves as the command Knowledge Discovery ness. Working from customized business console for the MineSet user. Using the Environment rules defined by your company’s knowl- Tool Manager, one can specify what data edge workers, MDT highlights and ex- to access, how to transform the data, how Aashu Virmani, Amin Abdulghani and Tomasz Imielinski).Department of Computer Science, Rut- plains changes and trends for your key to mine the data, and how to visualize the gers University business indicators. MDT also monitors results of the data transformation and selected business indicators for deviations mining process. We will demonstrate the process of query- from plan, and can alert you to exceptions Unique features: MineSet is unique in ing a rulebase, (rule-mining) and then us- as they occur, with automatic analysis of its integration of data mining and visual- ing the same query to generate that set of the causes driving the exceptions. Infor- ization. Visualization serves two purpos- rules from the database (data-mining). mation is generated and presented in the es. It provides the user with the ability to The process of “mining around a rule,” form of InfoFrames. These presentation- do visual data mining, allowing discovery where the answer from a query can be re- quality compound documents consist of of trends, patterns, and anomalies con- submitted as a query to explore deeper in- natural language reports and explanatory tained in vast amounts of data. MineSet to the rule, will also be demonstrated. graphics that are dynamically generated also allows for better understanding, anal- Next, we will show some sample applica- by MDT. ysis, and exploration of classification and tions like “typicality,” “distinguishing Unique features: MDT is unique in its association models resulting from the da- characteristics of ” and “best classifiers ability to automatically retrieve relevant ta mining process. for” which are built on top of rules, prov- information of importance to one’s busi- Status: Commercial product. ing that rule-generation is not the final ness, and incorporate it dynamically into step in data-mining. We will then show concise high-quality management re- MM — Mining with Maps the “exploratory mining scenario,” in ports. This information is in three cate- Raymond Ng Computer Science, University of which the user can query the rulebase as gories: drill-across—to surrounding seg- British Columbia it is being generated, narrow his prefer- ments of interest, drill-down—to selected ences about what he wants, and prune the segments that make up the target segment This package features three spatial data overall rule-space the system must gener- and causal analysis—indicating probable mining algorithms. The first algorithm ate. Finally, we shall try to unify all the causes of the business situation. computes aggregate proximity relation- above concepts in a programming envi- Status: MDT is currently in develop- ships. The second algorithm discovers ronment, which makes use of a C++ API, ment, with a planned release in the fourth characteristic and discriminating proper- and allows KDD-developers and users quarter of 1996. NCR will demonstrate a ties of spatial clusters from thematic alike, to embed rule generation in larger prototype version. maps. The third algorithm performs applications, and also provides support boundary shape matching in spatial data. for defining new attributes and incorpo- The package also includes visualization rate them in the mining queries. We will tools. More details of these algorithms KDD DEMO DESCRIPTIONS 3
  • can be found in “Finding Aggregate Prox- ties demanding new and more efficient queries Inspec to find the institutional af- imity Relationships and Commonalties in data processing methods. This calls for filiation of the principal author of the pa- Spatial Data Mining” to appear in: IEEE improved tools which, with the aid of da- per. Then WebFind uses Netfind to pro- Transactions on Knowledge and Data En- ta mining methods, are capable of identi- vide the Internet address of a host gineering, Special Issue on Data Mining; fying novel, potentially useful, and ulti- computer with the same institutional af- “Extraction of Spatial Proximity Patterns mately understandable patterns in filiation. WebFind then uses a search al- by Concept Generalization” to appear in existing data. Such a knowledge discovery gorithm to discover a worldwide web the KDD-96 Proceedings; and “Spatial Da- system has now been developed under the server on this host, then an author’s home ta Mining: Discovering Knowledge of name of STARC (statistics, analysis, recog- page, and finally the location of the want- Clusters from Maps” to appear in: Pro- nition, clustering) by the Russian compa- ed paper. Since institutions are designated ceedings, 1996 SIGMOD Workshop on Re- ny DATA-CENTER headed by D. Gainanow. very differently in Inspec and Netfind, it search Issues on Data Mining and Knowl- STARC offers the user a wide variety of is nontrivial to decide when an Inspec in- edge Discovery. modules which can be combined in nu- stitution corresponds to a Netfind institu- Unique features: All three spatial data merous ways to create a powerful data tion. WebFind uses a recursive field mining algorithms are unique. The pack- processing tool. Different modules pro- matching algorithm to do this. age is, therefore, the first implementation vide different methods of statistics, super- Unique features: Automatic discovery of of the three algorithms. vised and unsupervised machine learning, information on the worldwide web is Status: Research prototype. machine discovery, data compression and achieved in real-time. Unlike typical data visualization. search engines, WebFind does not use a Optimization Related Data Min- Unique features: Most existing decision central index of the worldwide web. It is ing Using the PATTERN System tree methods are based on the minimiza- able to do this through mining of infor- R. L. Grossman, H. Bodek, D. Northcutt Magnify, tion of the number of misclassified vec- mation from other sources. At the heart Inc.and H. V. Poor, tors in the feature space. In this context, of this mining is the field matching algo- Princeton University there is the problem of getting stuck in rithm, which identifies semantically only local minima. To overcome this equivalent information in multiple We will demonstrate a system which inte- shortcoming an outstanding novel alge- sources. grates an object warehouse with data braic approach is applied in the system Status: Research Prototype accessible mining and data analysis tools. The sys- STARC. through the WWW at: http://dino.ucsd. tem is designed to work with large vol- Status: Commercial product. edu:8000/WebFind.html umes of data. We have developed an ob- ject warehouse which is specifically WebFind: Mining External WEBSOM—Interactive designed for the types of queries and Sources to Exploration of Document analyses arising in data mining. Because Guide WWW Discovery Collections of the amount of data analyzed, rather than search for any patterns, the system Alvaro E. Monge and Charles P. Elkan Department Krista Lagus, Timo Honkela, Samuel of Kaski and Teuvo Kohonen, Neural Networks Re- narrows the search to patterns of interest Computer Science and Engineering University of search Centre of Helsinki University of Technology in optimization problems. We will dem- California, onstrate the system on several applica- San Diego La Jolla, CA 92093-0114 WEBSOM is a new approach to exploring amonge, tions, including anomaly detection, min- collections of full-text documents. A map ing textural data, and credit card account WebFind is an application that discovers of a document collection where similar management. scientific papers made available by their documents are found close to each other Unique features: We integrated an ob- authors on the worldwide web. WebFind can be explored with an easy interface. ject warehouse with data mining tools. mines external sources for information The demonstration is also accessible to Using an object warehouse and special- that aids in the discovery process. Cur- anyone in the address http://websom. ized algorithms we can mine large data rently there are two information sources sets. We have developed and implemented being mined: Melvyl and Netfind. Melvyl Unique features: Visualization of the distributed and parallel versions of tree- is a University of California library service document space with a map where related based data mining algorithms. We have that includes a comprehensive database of documents appear close to each other. developed specialized algorithms for find- bibliographic records. Netfind is a white The method by which such a meaningful ing patterns related optimization. pages service that gives internet host ad- map can be produced. Status: Commercial product, currently dresses and email address. Separately Status: Research prototype. available in beta with several customers these services do not provide enough in- STARC formation to locate papers on the world- wide web. WebFind integrates the infor- Damir Gainanow, Andre Matweew, Michael Thess mation provided by each in order to DATA-Center Ltd., Ekaterinburg, RUSSIA Scholz & Thess Software GbR, Chemnitz, GERMANY discovery on the worldwide web the in- formation actually wanted by a user. A Rapid developments in the field of science WebFind search starts with the user pro- and technology in recent years have creat- viding keywords identify the paper, exact- ed an enormous amount of electronic da- ly as he or she would in searching Inspec ta. The accompanying improvements in directly. After the user confirms that the the generation, transport and storage of right paper has been identified, WebFind this data have opened up vast opportuni- 4 KDD DEMO DESCRIPTIONS