Handouts - Information Technology Laboratory at University of ...
 


The Information Technology Laboratory, or ITLab (part of the CSE Department at UTA), is actively involved in research and development on all aspects of information technology: modeling, retrieval, management, optimization, scalability, and performance evaluation. The group consists of Profs. Sharma Chakravarthy, Ramez Elmasri, and Leonidas Fegaras, whose complementary expertise allows them to work independently as well as jointly on a broad set of topics ranging from modeling to systems development to performance. Some of the topics currently being worked on are: Data Warehousing/Information Integration, Data Mining/Knowledge Discovery, Web Databases, E-commerce, Active/Push Technology for large network-centric Information Management, and Object-Oriented, Temporal, and Heterogeneous Databases.

The vision of the group is to carry out both fundamental and practically applicable research and development to enable the use of information technology for diverse needs. The group views interaction and collaboration with industry as critical for identifying fundamental problems in the areas of databases and information technology usage and functionality. Our aim is to identify problems based on current and future needs, to develop formal solutions, and to implement proof-of-principle systems. This approach provides a viable migration path for new techniques and solutions to be integrated into real-life systems and applications.

We are seeking industry collaborators and project partners to pursue solutions to problems that are of interest to the IT industry. We are also interested in transferring technology currently available at ITLab to real-life applications. If you would like to know more about ITLab and collaboration with us, please contact me at sharma@cse.uta.edu. Please look around and enjoy!

Sincerely,

Sharma Chakravarthy, Professor
Information Technology Laboratory and Computer Science and Engineering Department
The University of Texas at Arlington
Research and Development Synopsis

The research and development contributions of Sharma Chakravarthy span more than two decades, starting with his work in image processing and graphics at the Tata Institute of Fundamental Research (TIFR), Computer Science Group, Bombay, India, continuing with his work in advanced databases (active/real-time, distributed, and heterogeneous) at the University of Florida (UFL), Gainesville, and currently at the University of Texas at Arlington (UTA). A number of prototypes have been developed over the years: Sentinel (consisting of C++ and Java versions of local and global event detectors), ECA agents for Sybase, Oracle, and DB2, and a layered mining optimizer and visualization tools, to name a few. A chronological summary of my research activities over the past two decades is outlined below:

• Currently, at UTA, my research activity is concentrated in the areas of distributed active capability, data mining (both graph-based data mining and database mining), data warehousing, and push technology for large network-centric environments. In addition, ECA agents for various relational databases are being designed and developed.

• I have established an IT (Information Technology) Laboratory to broaden the scope of research activities to include the web, data warehousing, and related topics.

• At UFL (University of Florida at Gainesville), I not only consolidated my work on active databases (the Sentinel project, which started as HiPAC at CCA) but also broadened the scope of my research to include temporal, real-time, and distributed databases, as well as other topics such as an expert system for neuro-oncology problems and a virtual "clean room".

• At CCA (Computer Corporation of America, Cambridge, MA), which became XAIT (Xerox Advanced Information Technology, Cambridge, MA), I worked on various federally funded projects, such as PROBE (an object-oriented DBMS with support for abstract data types, spatial queries, and transitive closure computations), HiPAC (High Performance Active Database, the precursor to Sentinel), VS (Visualization of Software), and VSDS (Verifiable and Secure Distributed database System).

• At UMD (University of Maryland, College Park), I worked on PRISM (a parallel logic programming system), DAVID (a distributed heterogeneous database system), and, for my thesis, semantic query optimization.

• Before joining UMD, I worked at TIFR (Tata Institute of Fundamental Research, Bombay, India) in the area of image processing and on SIGN, a graphics package that is operating system-, language-, and environment-independent.

Recently, Diane Cook and I organized the NSF/IDM 2001 workshop at Fort Worth. I have served as a member of the program committee for a number of international conferences and workshops, and as co-program chair for workshops and conferences. I review papers for a number of journals in the general area of databases. Finally, I have given a number of keynote addresses and tutorials on active, real-time, federated, and distributed databases in Europe, North America, and Asia.
Active Databases

Current R&D in the area of active databases/technology is along the following directions:

1. Design and implementation of a subscription-based global event detector in Java that supports persistence, recovery, dynamic insertion and deletion of ECA rules, visualization, etc.
2. Design and implementation of ECA agents for various DBMSs (e.g., Sybase, Oracle, and DB2). This provides "value-added" active capability, without modifications to the underlying systems, using JDBC and an ECA agent.
3. Development of a monitor for multiple relational DBMSs along with file-based systems.
4. Stand-alone C++ local and global event detectors that can be used for monitoring in the C++ application environment.
5. A distributed alert server based on the subscribe/publish paradigm.

Our work on active databases started with the HiPAC project at CCA/XAIT, continued with the Sentinel project at UFL, and now continues as WebVigil and other extensions at UTA. Currently, we have an expressive event specification language (Snoop), a seamless design for incorporating event-condition-action (ECA) rules into an object-oriented framework, and the implementation of Sentinel -- an active object-oriented DBMS built using Open OODB from Texas Instruments, Dallas. Sentinel is perhaps the first effort to address system and implementation issues in incorporating active capability into an object-oriented system. Complex event detection and nested transactions for executing rules concurrently (as two-level transaction management -- Exodus for top-level transactions and Sentinel for nested transactions, without recovery) have been implemented and integrated into Open OODB. We have also investigated optimization of ECA rules using incremental updates.

The use of active capability depends heavily on tools and methodology for formulating and testing large sets of ECA rules. For this, both static analysis tools and run-time visualization tools are critical. A high-level rule visualization/explanation tool has been developed for understanding ECA rule interaction with transactions. We have also investigated the use of active capability in a broader scope, showing how it can be exploited at the system level to support advanced transaction models.
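To make the ECA model concrete, the following is a minimal sketch in Java. It is an illustration only: the names Event, EcaRule, and EventDetector are hypothetical and do not reflect the actual Sentinel/Snoop interfaces, and a real detector additionally handles composite (Snoop) events, persistence, recovery, and dynamic rule removal as described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Minimal ECA sketch with hypothetical names (not the Sentinel/Snoop API).
public class EcaSketch {
    // A primitive event occurrence carrying a name and a payload.
    record Event(String name, Object payload) {}

    // An ECA rule: when the named event occurs and the condition holds
    // over the occurrence, the action is executed.
    record EcaRule(String eventName, Predicate<Event> condition, Consumer<Event> action) {}

    // A toy local event detector; rules are registered dynamically.
    static class EventDetector {
        private final List<EcaRule> rules = new ArrayList<>();
        void register(EcaRule r) { rules.add(r); }
        void raise(Event e) {
            for (EcaRule r : rules)
                if (r.eventName().equals(e.name()) && r.condition().test(e))
                    r.action().accept(e);
        }
    }

    public static void main(String[] args) {
        EventDetector detector = new EventDetector();
        // Rule: when a "deposit" event with amount > 10000 occurs, raise an alert.
        detector.register(new EcaRule(
            "deposit",
            e -> ((Integer) e.payload()) > 10_000,
            e -> System.out.println("ALERT: large deposit of " + e.payload())));
        detector.raise(new Event("deposit", 500));     // condition false: no action
        detector.raise(new Event("deposit", 25_000));  // condition true: action fires
    }
}
```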
Database Mining and Visualization

This research explores the architecture, algorithms, optimization, and scalability issues involved in enabling mining directly on data stored in (multiple) databases. We have already developed a layered architecture and have translated several mining algorithms (for association rules) to the relational context using IBM's UDB/DB2 and ORACLE. We have successfully implemented the K-way join, Query/Sub-query, and 2-GroupBy approaches in SQL92 and tested them for performance and scalability. This has enabled us to perform data mining directly on existing data without having to extract and/or convert it into a different format. However, we have only studied some of the optimization features of DB2 and ORACLE, and how to exploit them through our layered architecture. Even with this limited study, we have identified some limitations of the underlying optimizers (for example, self-joins are not currently optimized very well). We wish to continue this work on a broader scope, including the exploration of user-defined functions (UDFs), table functions, and stored procedures for mining. Specifically, this project will investigate and develop techniques to move from file mining to database mining in the context of: i) association rules using SQL92 and SQL-OR; ii) the SUBDUE knowledge discovery system, which mines graphs -- the focus here being to develop new database techniques and extend/leverage current database techniques to support database mining in an efficient, scalable, and flexible manner; and iii) visualization of the results of data mining.

Motivation

Although most applications today use relational databases, it is not currently easy (or even possible) to directly mine data stored in databases: the data has to be siphoned out into files to mine it effectively. Coupling data mining tools with a growing base of accessible enterprise data -- often in the form of a data warehouse -- provides a capability with immense implications. The majority of warehouse stores -- the systems used for storing warehouse data -- are relational databases or their variants. There is a critical need for combining OLAP (On-Line Analytical Processing) and mining over data stored in a data warehouse in a seamless manner. As a first step towards this goal, mining has to be supported over a database in an efficient manner. Most research on data mining is aimed at defining new mining problems, and a majority of the algorithms were developed for data stored in file systems, each with its own specialized data structures and buffer management strategies. In cases where the data were stored in a DBMS, data access was provided through an ODBC or SQL cursor interface, making database mining very inefficient compared to file mining. We are more interested in applying current mining algorithms efficiently and in a scalable manner on databases than in developing new algorithms.
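As a concrete illustration of the K-way join approach (here with k = 2), the sketch below issues a plain SQL92 self-join through JDBC, the same route the ECA agents use. The table T(tid, item), holding one row per (transaction, item) pair, the JDBC URL, and the minimum-support count of 100 are all illustrative assumptions, not the project's actual schema or configuration.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// K-way join sketch for k = 2: count frequent 2-itemsets with a self-join.
// Assumes an input table T(tid, item); names and URL are illustrative.
public class KwayJoinSketch {
    public static void main(String[] args) throws Exception {
        String sql =
            "SELECT t1.item, t2.item, COUNT(*) AS supp " +
            "FROM T t1, T t2 " +
            "WHERE t1.tid = t2.tid AND t1.item < t2.item " +  // join copies on tid
            "GROUP BY t1.item, t2.item " +
            "HAVING COUNT(*) >= ?";                           // minimum support
        try (Connection con = DriverManager.getConnection("jdbc:db2://localhost/mine");
             PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setInt(1, 100);  // illustrative minimum-support count
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next())
                    System.out.printf("{%s, %s}: %d%n",
                        rs.getString(1), rs.getString(2), rs.getInt(3));
            }
        }
    }
}
```

For general k, k copies of T are joined on tid with the items kept in increasing order; this repeated self-join is precisely the pattern that exposed the optimizer limitation noted above.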
Data Warehousing

In this project, we investigate a comprehensive, cost-based framework for evaluating data propagation policies against data warehouse requirements and source database characteristics. We formalize the data warehouse specification along the dimensions of staleness (or freshness), response time, storage, and computation cost, and classify source databases according to their data propagation capabilities. A detailed cost model has been developed for a representative set of policies. A prototype implementation has allowed an exploration of the various trade-offs, and a test-bed implementation has provided evidence of its validity. Another aspect of the data warehouse problem being addressed in this project is the automated (or semi-automated) generation of mappings between source and data warehouse schemas.

Motivation

The research community is addressing a number of issues in response to the increased reliance of organizations on data warehousing. Most work addresses individual aspects related to incremental view maintenance, propagation algorithms, consistency requirements, performance of OLAP queries, etc. There remains a need to consolidate the relevant results into a cohesive framework for data warehouse maintenance. Although data propagation policies, source database characteristics, and user requirements have been addressed individually, their co-dependencies and relationships have not been explored.

WebVigil: Just-In-Time Information Propagation

The objective of this project is to investigate the specification, management, and propagation of changes (on structured documents) as requested by the user, in a timely manner and meeting quality-of-service requirements. Our approach allows users to specify (through the browser or some other mechanism) the kinds of changes they are interested in, to (web) documents, at different levels of granularity. They can also specify how they wish to be notified when the requested information becomes available. Quality of Service (QoS) information, such as timing constraints and aggregated vs. individual changes, will also be part of the user specification. Based on the user requirements, the techniques developed in this project will determine how these changes are monitored, collected, and propagated. User specifications will be translated into a set of event-condition-action rules (so that we can use the active capability developed so far) that are used for monitoring changes and propagating relevant information. As scalability of triggers is an important aspect of the solution to the problem addressed here, we will pay attention to it during design and implementation.

Motivation

Until now, active capability has mostly been investigated in the context of databases, and some work has been done on distributed event detection and rule execution. Research on active databases hastened the introduction of triggers in most of the commercial database management systems available today. At the same time, the trigger capability supported in commercial systems falls short in both functionality and scalability. Research has shown that active capability can be effectively used for a variety of applications, including information filtering, workflow management, and self-monitoring, through which a system can adapt to various kinds of changes. There is also a good deal of research that deals with push/pull of information in a general sense; this body of work has looked at propagation of information using a variety of techniques, primarily to reduce delays and latency, increase throughput, etc. There is also some work on multiple query processing that exploits commonalities in a web setting using XML. The approaches investigated earlier are not targeted towards selective information propagation based on recognizing changes to documents, web pages, etc.

In this project, we apply active capability to selective information propagation in a large network-centric (distributed, heterogeneous) environment. The web is a good example of such an environment, where the pull paradigm is still used extensively. This project will investigate the research issues in making the push paradigm applicable to large network-centric environments such as the web and distributed heterogeneous information systems. We will draw extensively upon our previous research on Sentinel/Snoop and our experience in building prototypes for centralized, agent-based, and distributed active capability. The ability to selectively monitor and be informed of changes will augment the current strategy of pulling information periodically and checking for interesting changes.
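For contrast, the pull paradigm that WebVigil seeks to improve upon can be sketched in a few lines of Java: poll a page on a fixed schedule, hash its contents, and raise a notification when the hash changes. The URL, the one-minute interval, and the notification step are illustrative assumptions; WebVigil's contribution is to replace this blind periodic polling with user-specified change subscriptions compiled into ECA rules.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.security.MessageDigest;
import java.util.Arrays;

// A deliberately naive pull-based monitor: poll, hash, compare, notify.
// URL and interval are illustrative; this is the baseline, not WebVigil.
public class PollingMonitorSketch {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest req = HttpRequest.newBuilder(URI.create("http://example.com/")).build();
        byte[] lastHash = null;
        for (int round = 0; round < 3; round++) {  // a few illustrative polling rounds
            String body = client.send(req, HttpResponse.BodyHandlers.ofString()).body();
            byte[] hash = MessageDigest.getInstance("SHA-256").digest(body.getBytes());
            if (lastHash != null && !Arrays.equals(hash, lastHash))
                System.out.println("change detected: notify subscriber");  // the "action"
            lastHash = hash;
            Thread.sleep(60_000);  // illustrative one-minute poll interval
        }
    }
}
```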