The Information Technology Laboratory or ITLab (part of the CSE
Department at UTA) is actively involved in research and development on all
aspects of information technology: modeling, retrieval, management, optimization,
scalability and performance evaluation.
The group consists of Profs. Sharma Chakravarthy, Ramez Elmasri, and Leonidas
Fegaras whose complementary expertise allows them to work independently as
well as jointly on a broad set of topics ranging from modeling to systems
development to performance aspects. Some of the topics currently being worked
on are: Data Warehousing/information integration, Data Mining/Knowledge
Discovery, Web Databases, E-commerce, Active/push technology for large
network-centric Information Management, and Object-Oriented, Temporal &
The vision of the group is to carry out both fundamental and practically applicable
research and development to enable the use of information technology for diverse
needs. The group views interaction and collaboration with industry as critical for
identifying fundamental problems in the areas of databases and information
technology usage and functionality. Our aim is to identify problems based on the
current and future needs and develop formal solutions as well as implement proof-
of-principle systems. This approach provides a viable migration path for new
techniques/solutions to be integrated into real-life systems/applications.
We are seeking industry collaborators and project partners to pursue solutions to
problems that are of interest to the IT industry. We are also interested in transferring
technology currently available at ITLab to real-life applications.
If you would like to know more about the ITLab and collaboration with us, please
contact me at firstname.lastname@example.org.
Please look around and enjoy!
Sharma Chakravarthy, Professor
Information Technology Laboratory and
Computer Science and Engineering Department
The University of Texas at Arlington
Information Technology Laboratory @ UT Arlington Page 1 of 5
Sept 7, 2001
Research and Development Synopsis:
The research and development contributions of Sharma Chakravarthy have spanned more than two decades,
starting with his work in image processing and graphics at the Tata Institute of Fundamental
Research (TIFR), Computer Science Group, Bombay, India, and continuing with his work in advanced databases
(active/real-time, distributed, and heterogeneous) at the University of Florida (UFL), Gainesville,
and currently at the University of Texas @ Arlington (UTA). A number of prototypes have been
developed over the years (Sentinel, consisting of C++ and Java versions of local and global event
detectors; ECA agents for Sybase, Oracle, and DB2; and a layered mining optimizer and visualization
tools, to name a few). A reverse-chronological summary of my research activities over the past two decades
is outlined below:
• Currently, at UTA (University of Texas @ Arlington), my research activity is
concentrated in the areas of distributed active capability, data mining (both graph-based
data mining and database mining), data warehousing, and push technology for large
network-centric environments. In addition, ECA agents for various relational databases
are being designed and developed.
• I have established an IT (Information Technology) Laboratory to broaden the scope of
research activities to include web, data warehousing, and related topics.
• At UFL (University of Florida at Gainesville), I not only consolidated my work on active
databases (the Sentinel project, started as HiPAC at CCA), but also broadened the scope
of my research to include temporal, real-time, and distributed databases, and other topics such
as an expert system for neuro-oncology problems and a virtual “clean-room”.
• At CCA (Computer Corporation of America, Cambridge, MA), which became XAIT
(Xerox Advanced Information Technology, Cambridge, MA), I worked on various
federally funded projects, such as PROBE (object-oriented DBMS with support for
abstract data types, spatial queries, and transitive closure computations), HiPAC (High
Performance Active Database, precursor to Sentinel), VS (Visualization of Software),
and VSDS (Verifiable and Secure Distributed database System).
• At UMD (University of Maryland, College Park), I worked on PRISM (Parallel Logic
Programming System), DAVID (Distributed Heterogeneous Database System), and
semantic query optimization for my thesis.
• Before joining UMD, I worked at TIFR (Tata Institute of Fundamental Research,
Bombay, India) on image processing and on SIGN, a graphics package that is
operating system-, language-, and environment-independent.
Recently, Diane Cook and I organized the NSF/IDM 2001 workshop at Fort Worth. I have
served as a program committee member for a number of international conferences and
workshops. I have also served as Co-Program Chair for workshops and conferences. I review
papers for a number of journals in the general area of databases. Finally, I have given a number of
keynote addresses and tutorials on active, real-time, federated, and distributed databases in
Europe, North America, and Asia.
Currently, R & D in the area of active databases/technology is along the following directions:
1. Design and implementation of a subscription-based global event detector in Java that has
persistence, recovery, dynamic insertion and deletion of ECA rules, visualization, etc.
2. Design and implementation of ECA agents for various DBMSs (e.g., Sybase, Oracle, and
DB2). This provides “value-added” active capability without modifications to the
underlying systems using JDBC and an ECA agent.
3. Development of a monitor for multiple relational DBMSs along with file-based systems.
4. Stand-alone C++ local and global event detectors that can be used for monitoring in the
C++ application environment.
5. A distributed alert server based on the subscribe/publish paradigm.
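As an illustration of the first item above, the following is a minimal sketch of a subscription-based event detector in which ECA rules can be inserted and deleted at run time. It is written in Python for brevity and is not the ITLab implementation (which is in Java and C++); the class names, the event shape, and the "low_stock" rule are all hypothetical, and persistence, recovery, and visualization are left out.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Hypothetical event shape: a dict with a "type" key plus payload fields.
Event = Dict[str, Any]


@dataclass
class EcaRule:
    name: str
    event_type: str                     # E: event the rule subscribes to
    condition: Callable[[Event], bool]  # C: predicate evaluated on the event
    action: Callable[[Event], None]     # A: executed when the condition holds


class EventDetector:
    """Routes raised events to the ECA rules subscribed to their type."""

    def __init__(self) -> None:
        self._rules: Dict[str, EcaRule] = {}

    def add_rule(self, rule: EcaRule) -> None:
        # Rules can be inserted while the detector is running.
        self._rules[rule.name] = rule

    def remove_rule(self, name: str) -> None:
        # Rules can also be deleted while the detector is running.
        self._rules.pop(name, None)

    def raise_event(self, event: Event) -> List[str]:
        """Evaluate subscribed rules and return the names of those that fired."""
        fired = []
        for rule in list(self._rules.values()):
            if rule.event_type == event["type"] and rule.condition(event):
                rule.action(event)
                fired.append(rule.name)
        return fired


alerts: List[str] = []
detector = EventDetector()
detector.add_rule(EcaRule(
    name="low_stock",
    event_type="update",
    condition=lambda e: e["quantity"] < 10,
    action=lambda e: alerts.append(f"reorder {e['item']}"),
))
fired_first = detector.raise_event({"type": "update", "item": "disk", "quantity": 3})
detector.remove_rule("low_stock")
fired_second = detector.raise_event({"type": "update", "item": "disk", "quantity": 2})
```

Only the first update triggers the rule; once the rule is deleted, the same kind of event passes through without any action.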
Our work on active databases started with the HiPAC project at CCA/XAIT and continued with
the Sentinel project at UFL and now as WebVigil and other extensions at UTA. Currently, we
have an expressive event specification language (Snoop), a seamless design for incorporating
event-condition-action (ECA) rules into an object-oriented framework, and the implementation of
Sentinel, an active object-oriented DBMS using Open OODB from Texas Instruments, Dallas.
Sentinel is perhaps the first effort that has addressed system and implementation issues for
incorporating active capability into an object-oriented system. Complex event detection and
nested transactions (as two-level transaction management: by Exodus for top-level transactions
and by Sentinel for nested transactions, without recovery) for executing rules concurrently have
been implemented and integrated into Open OODB. We have also investigated optimization of
ECA rules using incremental updates.
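To make the idea of complex event detection concrete, here is a minimal sketch of a detector for one kind of composite event: an occurrence of one primitive event followed later by an occurrence of another. It does not reproduce Snoop's syntax or semantics. The class name, the integer timestamps, and the policy of keeping only the most recent initiating event are assumptions made for illustration.

```python
from typing import Callable, List, Optional, Tuple

Occurrence = Tuple[str, int]  # (event type, timestamp); hypothetical representation


class SequenceDetector:
    """Signals when an occurrence of `first` is later followed by `second`."""

    def __init__(self, first: str, second: str,
                 on_detect: Callable[[Occurrence, Occurrence], None]) -> None:
        self.first = first
        self.second = second
        self.on_detect = on_detect
        self._pending: Optional[Occurrence] = None  # most recent initiator only

    def notify(self, event_type: str, timestamp: int) -> None:
        if event_type == self.first:
            # A newer initiator replaces any older one (an assumed policy).
            self._pending = (event_type, timestamp)
        elif event_type == self.second and self._pending is not None:
            if timestamp > self._pending[1]:
                self.on_detect(self._pending, (event_type, timestamp))
                self._pending = None  # the initiator is consumed on detection


detected: List[Tuple[Occurrence, Occurrence]] = []
seq = SequenceDetector("deposit", "withdraw",
                       lambda a, b: detected.append((a, b)))
stream = [("withdraw", 1), ("deposit", 2), ("deposit", 3),
          ("withdraw", 4), ("withdraw", 5)]
for etype, ts in stream:
    seq.notify(etype, ts)
```

In this stream the composite event is detected once, pairing the deposit at time 3 with the withdrawal at time 4. Other policies for pairing and consuming occurrences are possible and would detect different pairs.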
The use of active capability depends heavily on tools and methodology for formulating and
testing large sets of ECA rules. For this, both static analysis tools and run-time visualization tools are
critical. A high-level rule visualization/explanation tool has been developed for understanding
ECA rule interaction with transactions.
We have also investigated the use of active capability in a broader scope. We have shown how
active capability can be exploited at the system level to support advanced transaction models.
Database Mining and Visualization
This research will explore the architecture, algorithms, optimization and scalability issues to
enable mining directly on data stored in (multiple) databases. We have already developed a
layered architecture and have translated several mining algorithms (for association rules) to the
relational context using IBM’s UDB/DB2 and Oracle. We have successfully implemented the K-way
join, Query/Sub-query, and 2-GroupBy approaches in SQL92, and tested them for
performance and scalability. This has enabled us to perform data mining directly on existing data
without having to extract and/or convert them into a different format. However, we have only
studied some of the optimization features of DB2 and Oracle, and how to exploit them
through our layered architecture. Even with this limited study, we have identified some
limitations of the underlying optimizer (for example, self-joins are not currently optimized very
well). We wish to continue this work on a broader scope, including the exploration of user-
defined functions (UDFs), table functions, and stored procedures for mining. Specifically, this
project will investigate and develop techniques to move forward from file mining to database
mining in the context of: i) association rules using SQL92 and SQL-OR, ii) the SUBDUE
knowledge discovery system, which mines graphs, and iii) visualization of the results of data
mining. The focus of this project will be to develop new database techniques and
extend/leverage current database techniques to support database mining in an efficient,
scalable, and flexible manner.
Although most of the applications today use relational databases, it is not currently easy (or
possible) to directly mine data stored in databases. The data has to be siphoned out into files to
be mined effectively. Coupling the data mining tools with a growing base of accessible
enterprise data -- often in the form of a data warehouse -- provides a capability with immense
implications. The majority of the warehouse stores -- systems used for storing warehouse data --
are relational databases or their variants. There is a critical need for combining OLAP (On Line
Analytical Processing) and mining over data stored in a data warehouse in a seamless manner.
As a first step towards this goal, mining has to be supported over a database in an efficient
Most of the research on data mining is aimed at defining new mining problems and a majority of
the algorithms for them were developed for data stored in file systems. Each had its own
specialized data structures and buffer management strategies. In cases where the data were stored
in a DBMS, data access was provided through an ODBC or SQL cursor interface making
database mining very inefficient as compared to file mining. We are more interested in applying
current mining algorithms efficiently and in a scalable manner on databases rather than
developing new algorithms.
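As a small illustration of mining directly in SQL rather than on extracted files, the sketch below counts the support of item pairs with a self-join over a transaction table and keeps the pairs that meet a minimum support. It is in the spirit of the join-based approaches mentioned earlier but is not the lab's K-way join formulation. It uses Python's built-in sqlite3 module in place of DB2 or Oracle, and the table name T, its columns, and the sample baskets are hypothetical.

```python
import sqlite3

# In-memory database standing in for an existing relational store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (tid INTEGER, item TEXT)")

# Hypothetical market-basket data: transaction id -> items bought together.
baskets = {
    1: ["bread", "milk"],
    2: ["bread", "milk", "eggs"],
    3: ["milk", "eggs"],
    4: ["bread", "milk"],
}
conn.executemany(
    "INSERT INTO T VALUES (?, ?)",
    [(tid, item) for tid, items in baskets.items() for item in items],
)

MIN_SUPPORT = 2  # minimum number of transactions containing the pair

# Join two copies of T on the transaction id; item ordering avoids
# counting (a, b) and (b, a) separately or pairing an item with itself.
frequent_pairs = conn.execute(
    """
    SELECT t1.item, t2.item, COUNT(*) AS support
    FROM T AS t1
    JOIN T AS t2
      ON t1.tid = t2.tid AND t1.item < t2.item
    GROUP BY t1.item, t2.item
    HAVING COUNT(*) >= ?
    ORDER BY t1.item, t2.item
    """,
    (MIN_SUPPORT,),
).fetchall()
conn.close()
```

With this data the query returns the pairs (bread, milk) with support 3 and (eggs, milk) with support 2. Larger itemsets need one more copy of T per item, which is why the optimizer's handling of self-joins matters for this style of mining.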
In this project, we investigate a comprehensive, cost-based framework for evaluating data
propagation policies against data warehouse requirements and source database characteristics. We
formalize data warehouse specification along the dimensions of staleness (or freshness), response
time, storage, and computation cost, and classify source databases according to their data
propagation capabilities. A detailed cost-model has been developed for a representative set of
policies. A prototype implementation has allowed an exploration of the various trade-offs, and a
test-bed implementation has provided evidence of its validity.
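The kind of comparison such a framework supports can be sketched as follows: filter candidate propagation policies by the warehouse requirements, then pick the cheapest remaining one. This is not the cost model developed in the project. The policy names, the numeric estimates, and the rule of minimizing computation cost among feasible policies are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PolicyEstimate:
    name: str
    staleness_s: float      # expected age of warehouse data, in seconds
    response_time_s: float  # expected query response time, in seconds
    storage_mb: float       # storage needed at the warehouse
    compute_cost: float     # abstract units of computation


@dataclass(frozen=True)
class Requirements:
    max_staleness_s: float
    max_response_time_s: float
    max_storage_mb: float


def choose_policy(estimates: List[PolicyEstimate],
                  req: Requirements) -> Optional[PolicyEstimate]:
    """Return the feasible policy with the lowest computation cost, if any."""
    feasible = [
        p for p in estimates
        if p.staleness_s <= req.max_staleness_s
        and p.response_time_s <= req.max_response_time_s
        and p.storage_mb <= req.max_storage_mb
    ]
    return min(feasible, key=lambda p: p.compute_cost, default=None)


# Made-up estimates, for illustration only.
estimates = [
    PolicyEstimate("immediate_incremental", 1, 0.5, 200, 90),
    PolicyEstimate("periodic_incremental", 3600, 0.5, 200, 30),
    PolicyEstimate("on_demand", 0, 20, 50, 10),
]
relaxed = choose_policy(estimates, Requirements(7200, 2, 500))
strict = choose_policy(estimates, Requirements(60, 2, 500))
impossible = choose_policy(estimates, Requirements(0.5, 0.1, 10))
```

Tightening the staleness bound from two hours to one minute changes the choice from the periodic policy to the immediate one, which is the sort of trade-off a cost-based evaluation is meant to expose.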
Another aspect of the data warehouse problem that is being addressed in this project is the
automated (or semi-automated) generation of mapping between source and data warehouse
The research community is addressing a number of issues in response to organizations' increased
reliance on data warehousing. Most work addresses individual aspects such as
incremental view maintenance, propagation algorithms, consistency requirements, and performance
of OLAP queries. There remains a need to consolidate relevant results into a cohesive
framework for data warehouse maintenance. Although data propagation policies, source database
characteristics, and user requirements have been addressed individually, their co-dependencies
and relationships have not been explored.
WebVigil: Just-In-Time Information Propagation
The objectives of this project are to investigate the specification, management, and
propagation of changes (on structured documents) as requested by the user in a timely manner
and meeting the quality of service requirements. Our approach allows users to specify
(through the browser or using some other mechanism) the kinds of changes to (web) documents
they are interested in, at different levels of granularity. They can also specify how they need
to be notified when the requested information becomes available. Quality of Service (QoS)
information, such as timing constraints and aggregated vs. individual changes, will also be part of
the user specification. Based on the user requirements, the techniques developed in this
project will determine how these changes are monitored, collected and propagated. User
specifications will be translated into a set of rules (event-condition-action rules so that we can
use active capability developed so far) that are used for monitoring changes and propagation
of relevant information. As scalability of triggers is an important aspect of the solution to the
problem addressed in this proposal, we will pay attention to it during our design and
Until now, active capability has been mostly investigated in the context of databases and some
work has been done for distributed event detection and rule execution. Research on active
databases hastened the introduction of triggers in most of the commercial database management
systems available today. At the same time, the trigger capability supported in commercial systems
falls short in both functionality and scalability. Research has shown that active
capability can be effectively used for a variety of applications, including information filtering,
workflow management, and self-monitoring, through which a system can adapt to various kinds of
changes. There is also considerable research that deals with push/pull of information in a general sense.
This body of work has looked at propagation of information using a variety of techniques,
primarily to reduce delays and latency and to increase throughput. There is also some work in multiple
query processing using commonalities in a web setting using XML. The approaches investigated
earlier are not targeted towards selective information propagation based on recognizing changes
to documents, web pages etc.
In this project, we apply active capability effectively for selective information propagation in a
large network-centric (distributed, heterogeneous) environment. The Web is a good example of
such an environment, where the pull paradigm is still being used extensively. This project will
investigate research issues in making the push paradigm applicable to large network-centric
environments such as the Web and distributed heterogeneous information systems.
We will extensively draw upon our previous research on Sentinel/Snoop and our experience in
building prototypes for centralized, agent-based, and distributed active capability. The ability to
selectively monitor and be informed of changes will augment the current strategy of pulling
information periodically and checking for interesting changes.
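The idea of selective monitoring can be sketched as follows, under simplifying assumptions: a document is represented as a mapping from section names to text, a subscription names one section of one document, and a change is recognized by comparing content fingerprints between successive versions. This is not the WebVigil design. The class names, the example URL, and the notification callback are hypothetical, and fetching, QoS constraints, and scalability are not addressed.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


def fingerprint(text: str) -> str:
    """Content hash used to recognize that a section has changed."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass
class Subscription:
    url: str
    section: str                        # granularity: one named section
    notify: Callable[[str, str], None]  # action taken on a detected change
    last_seen: Optional[str] = None     # fingerprint of the previous version


class ChangeMonitor:
    """Applies each subscription as a rule: on new version, if changed, notify."""

    def __init__(self) -> None:
        self._subs: List[Subscription] = []

    def subscribe(self, url: str, section: str,
                  notify: Callable[[str, str], None]) -> None:
        self._subs.append(Subscription(url, section, notify))

    def observe(self, url: str, sections: Dict[str, str]) -> None:
        """Event: a freshly obtained version of the document at `url`."""
        for sub in self._subs:
            if sub.url != url or sub.section not in sections:
                continue
            current = fingerprint(sections[sub.section])
            # Condition: the watched section differs from the last version.
            if sub.last_seen is not None and current != sub.last_seen:
                sub.notify(url, sub.section)
            sub.last_seen = current


notifications: List[Tuple[str, str]] = []
monitor = ChangeMonitor()
page = "http://example.org/catalog"  # placeholder URL
monitor.subscribe(page, "prices", lambda u, s: notifications.append((u, s)))

monitor.observe(page, {"prices": "10", "news": "a"})  # baseline version
monitor.observe(page, {"prices": "10", "news": "b"})  # unwatched section changed
monitor.observe(page, {"prices": "12", "news": "b"})  # watched section changed
```

Only the third version produces a notification, because the subscriber asked about the "prices" section and the second version changed something else.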