Relevant Updated Data Retrieval Architectural Model for Continuous Text Extraction

 Introduction
 Existing System
 Proposed System
 Algorithm
 Conclusion
 References

 Sliding Window
    Count-based Window
    Time-based Window
 Incremental Threshold
 MapReduce
    Map
    Reduce
 Unsupervised Duplicate Detection
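
The two sliding-window variants listed above can be illustrated with a minimal Python sketch. The class names (CountWindow, TimeWindow), the method names, and the use of plain numeric timestamps are assumptions made for illustration; the slides do not specify an implementation.

```python
from collections import deque


class CountWindow:
    """Count-based window: keeps only the `size` most recent documents."""

    def __init__(self, size):
        # A deque with maxlen silently drops the oldest entry when full.
        self.docs = deque(maxlen=size)

    def add(self, doc_id):
        self.docs.append(doc_id)

    def contents(self):
        return list(self.docs)


class TimeWindow:
    """Time-based window: keeps documents newer than `span` time units."""

    def __init__(self, span):
        self.span = span
        self.docs = deque()  # (timestamp, doc_id) pairs, oldest first

    def add(self, timestamp, doc_id):
        # Assumes documents arrive in non-decreasing timestamp order.
        self.docs.append((timestamp, doc_id))
        self._expire(timestamp)

    def _expire(self, now):
        # Drop every document whose age has reached `span`.
        while self.docs and self.docs[0][0] <= now - self.span:
            self.docs.popleft()

    def contents(self):
        return [doc_id for _, doc_id in self.docs]
```

In both variants the window holds the only documents a continuous query needs to consider, so expired documents never have to be rescanned.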

 Difficult to monitor the data stream
 Time-consuming: a single server does all the processing
 The entire document set has to be scanned
 Duplicate documents may be retrieved
 Main memory is not sufficient to hold a large number of documents

 MapReduce technique
 Server – Master Node
 Worker (Slave) Nodes
 Number of Worker Nodes is query dependent
 Each Worker Node uses the Incremental Threshold Algorithm for computing k relevant documents
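
The master/worker split above can be sketched roughly as follows. This is not the Incremental Threshold Algorithm itself: as a stand-in, each worker ranks its partition with a simple term-frequency score and heapq.nlargest. The function names, the scoring rule, and the sequential loop over partitions (real workers would run in parallel) are all assumptions for illustration.

```python
import heapq


def score(doc_text, query_terms):
    """Hypothetical relevance score: total occurrences of the query terms."""
    words = doc_text.lower().split()
    return sum(words.count(term) for term in query_terms)


def map_worker(partition, query_terms, k):
    """Map step: a worker returns the top-k (score, doc_id) pairs
    from its own partition of (doc_id, text) documents."""
    scored = ((score(text, query_terms), doc_id) for doc_id, text in partition)
    return heapq.nlargest(k, scored)


def reduce_master(local_results, k):
    """Reduce step: the master merges the local top-k lists into a global top-k."""
    merged = [pair for result in local_results for pair in result]
    return heapq.nlargest(k, merged)


def top_k(partitions, query_terms, k):
    # Workers are simulated one after another here for simplicity.
    local_results = [map_worker(p, query_terms, k) for p in partitions]
    return reduce_master(local_results, k)
```

Because every worker returns at most k candidates, the master only merges a small number of pairs instead of scanning the whole document set.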

 Processing Continuous Text Queries over Document Streams
 Continual monitoring of a list of recent documents
 First attempt to address email and news monitoring applications
 Currently supports text documents only
 Future work: extending to hyperlink structure

 Kyriakos Mouratidis, Spiridon Bakiras, and Dimitris Papadias, "Continuous Monitoring of Top-k Queries over Sliding Windows".
 B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom, 2002, "Models and Issues in Data Stream Systems", PODS '02, 1-16.
 V. N. Anh and A. Moffat, 2002, "Impact Transformation: Effective and Efficient Web Retrieval", Int'l ACM SIGIR Conf. Research and Development in Information Retrieval.
