5. Tough to monitor data stream
Time consuming: a single server does all the processing
Entire document set has to be scanned
Duplicate documents may be retrieved
Main memory is not sufficient to hold a large number of documents
7. MapReduce technique
Server – Master Node
Worker (Slave) Nodes
Number of Worker Nodes is query dependent
Each Worker Node uses the Incremental Threshold Algorithm to compute the k most relevant documents
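
The master/worker split above can be sketched roughly as follows. This is only an illustration under stated assumptions: the slides do not say how documents are partitioned or scored, so the round-robin partitioning, the term-count `score` function, and all function names here are hypothetical. In particular, each worker uses a plain heap-based top-k in place of the Incremental Threshold Algorithm, and the workers run one after another rather than on separate nodes.

```python
import heapq
from collections import Counter


def score(query_terms, text):
    """Hypothetical relevance score: total occurrences of query terms in the text."""
    counts = Counter(text.lower().split())
    return sum(counts[term] for term in query_terms)


def worker_top_k(query_terms, docs, k):
    """Map step: one worker scores its partition and keeps its local top k.

    Stand-in for the Incremental Threshold Algorithm named on the slide.
    """
    scored = ((score(query_terms, text), doc_id) for doc_id, text in docs)
    return heapq.nlargest(k, (item for item in scored if item[0] > 0))


def master_top_k(query, docs, k, num_workers):
    """Master: partition the documents, collect local results, merge them."""
    query_terms = query.lower().split()
    # Assumed round-robin partitioning; num_workers would be chosen per query.
    partitions = [docs[i::num_workers] for i in range(num_workers)]
    # Workers are simulated sequentially in this sketch.
    local_results = [worker_top_k(query_terms, part, k) for part in partitions]
    # Reduce step: the global top k is contained in the union of local top k lists.
    merged = heapq.nlargest(k, (item for res in local_results for item in res))
    return [(doc_id, s) for s, doc_id in merged]


docs = [
    (1, "stock news today"),
    (2, "market market rally continues"),
    (3, "football results"),
    (4, "news about the stock market and market trends"),
]
top = master_top_k("stock market", docs, k=2, num_workers=2)
print(top)
```

Because every worker returns at most k candidates, the master merges at most k × (number of workers) entries instead of scanning the entire document set on one server.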
15. Processing Continuous Text Queries over Document Streams
Continual monitoring of a list of recent documents
First attempt to address email and news monitoring applications
Currently limited to text documents
Future work: extending to hyperlink structure
17. References
Kyriakos Mouratidis, Spiridon Bakiras, and Dimitris Papadias, "Continuous Monitoring of Top-k Queries over Sliding Windows".
B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom, 2002, "Models and Issues in Data Stream Systems", PODS '02, 1-16.
V. N. Anh and A. Moffat, 2002, "Impact Transformation: Effective and Efficient Web Retrieval", Int'l ACM SIGIR Conf. Research and Development in Information Retrieval.