Naveen Ashish
Amit P. Sheth
Department of Computer Science and Large Scale Distributed Information Systems Lab
University of Georgia, Athens
Information Mediation: Integrating Information from Multiple Information Sources
Why not simply materialize all the data in all the Web sources being integrated and have a really fast mediator?
Will not scale; the amount of space needed may be too large
Web sources can get updated
Cost of keeping data consistent can get prohibitive
We are building a mediator, not a data warehouse!
Approach then is to selectively materialize data
How do we automatically identify the portion of data most useful to materialize?
Selecting Data to Materialize
Distribution of User Queries (Identify frequently accessed classes)
Structure of Sources (Prefetch data to speed up expensive queries)
Updates (Have to consider maintenance cost)
Classes of Data to Materialize
SELECTING CLASSES
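One way to picture how the three factors above could be combined is a simple greedy selection. This is only an illustrative sketch, not the method from the talk. The names `ClassStats`, `net_benefit`, and `select_classes`, the scoring formula, and the space budget are all assumptions made for the example. Each candidate class is scored by the query time saved (query frequency times remote access cost) minus the maintenance cost (update rate times refresh cost), and classes are picked in order of benefit per unit of space until the budget is used up.

```python
from dataclasses import dataclass


@dataclass
class ClassStats:
    """Hypothetical per-class statistics; every field is an assumed estimate."""
    name: str
    query_rate: float    # queries per unit time that touch this class
    remote_cost: float   # cost of answering one such query from the Web sources
    update_rate: float   # source updates per unit time affecting this class
    refresh_cost: float  # cost of refreshing the local copy after one update
    size: float          # space needed to materialize the class


def net_benefit(c: ClassStats) -> float:
    """Query cost saved per unit time minus maintenance cost per unit time."""
    return c.query_rate * c.remote_cost - c.update_rate * c.refresh_cost


def select_classes(candidates: list[ClassStats], space_budget: float) -> list[str]:
    """Greedy heuristic: best benefit per unit of space first, within the budget."""
    # Classes that cost more to maintain than they save are never materialized.
    useful = [c for c in candidates if net_benefit(c) > 0 and c.size > 0]
    ranked = sorted(useful, key=lambda c: net_benefit(c) / c.size, reverse=True)

    chosen, used = [], 0.0
    for c in ranked:
        if used + c.size <= space_budget:
            chosen.append(c.name)
            used += c.size
    return chosen


if __name__ == "__main__":
    # Made-up numbers, purely for illustration.
    candidates = [
        ClassStats("flights", 0.6, 10.0, 5.0, 2.0, 50.0),    # popular but volatile
        ClassStats("airports", 0.3, 8.0, 0.01, 1.0, 10.0),   # popular and stable
        ClassStats("countries", 0.1, 5.0, 0.0, 1.0, 5.0),    # small and static
        ClassStats("hotels", 0.2, 20.0, 0.1, 2.0, 100.0),    # useful but too large
    ]
    print(select_classes(candidates, space_budget=20.0))
```

In this made-up example, the frequently updated class is rejected because its maintenance cost exceeds the query savings, and the large class is rejected because it does not fit in the budget. A greedy ranking like this is not guaranteed to be optimal; it is only meant to show how query distribution, source access cost, and update cost can pull in different directions.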