Relevant Updated Data Retrieval
Architectural Model for Continuous Text Extraction
• Introduction
• Existing System
• Proposed System
• Algorithm
• Conclusion
• References
• Sliding Window
  • Count-based Window
  • Time-based Window
• Incremental Threshold
• MapReduce
  • Map
  • Reduce
• Unsupervised Duplicate Detection
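The two sliding-window variants listed above can be illustrated with a minimal Python sketch. It assumes a count-based window keeps the N most recent documents and a time-based window keeps documents that arrived within the last `span` time units; the class and method names are hypothetical and do not come from the slides.

```python
from collections import deque


class CountWindow:
    """Count-based window: keeps only the `size` most recent documents."""

    def __init__(self, size):
        # A bounded deque discards the oldest entry on overflow.
        self.docs = deque(maxlen=size)

    def add(self, doc):
        self.docs.append(doc)

    def contents(self):
        return list(self.docs)


class TimeWindow:
    """Time-based window: keeps documents newer than `now - span`."""

    def __init__(self, span):
        self.span = span
        self.docs = deque()  # (timestamp, doc) pairs in arrival order

    def add(self, timestamp, doc):
        self.docs.append((timestamp, doc))
        self._expire(timestamp)

    def _expire(self, now):
        # Drop documents from the front until the oldest is inside the window.
        while self.docs and self.docs[0][0] <= now - self.span:
            self.docs.popleft()

    def contents(self):
        return [doc for _, doc in self.docs]
```

In both cases, a continuous query would be evaluated only against `contents()`, so expired documents never reach the ranking step.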
Existing System
• Data streams are difficult to monitor
• Time-consuming: a single server does all the processing
• The entire document set has to be scanned
• Duplicate documents may be retrieved
• Main memory is not sufficient to hold a large number of documents
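The duplicate-retrieval drawback is what an unsupervised duplicate detection step is meant to remove. The slides do not say which technique is used, so the following is only a sketch under an assumption: it compares documents by Jaccard similarity over word shingles, needs no labelled training data, and treats anything at or above a hypothetical `threshold` as a duplicate.

```python
def shingles(text, size=3):
    """Return the set of overlapping word n-grams (shingles) in a text."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)}
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |intersection| / |union|."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def drop_near_duplicates(texts, threshold=0.8):
    """Keep the first copy of each document; skip later near-duplicates."""
    kept, kept_shingles = [], []
    for text in texts:
        current = shingles(text)
        # Keep the text only if it is dissimilar to everything kept so far.
        if all(jaccard(current, seen) < threshold for seen in kept_shingles):
            kept.append(text)
            kept_shingles.append(current)
    return kept
```

The pairwise comparison is quadratic in the number of kept documents, which is acceptable for a bounded window but would need an index (for example, hashing of shingles) on larger sets.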
Proposed System
• MapReduce technique
• Server: Master Node
• Worker (Slave) Nodes
• The number of Worker Nodes is query dependent
• Each Worker Node uses the Incremental Threshold Algorithm to compute its k most relevant documents
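The master/worker split described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the deck's implementation: the workers run sequentially in one process, `score` is a hypothetical term-frequency stand-in rather than the Incremental Threshold Algorithm itself, and `num_workers` is passed in directly because the slides do not say how it is derived from the query.

```python
import heapq
from collections import Counter


def score(query_terms, text):
    """Hypothetical relevance score: occurrences of query terms in the text."""
    counts = Counter(text.lower().split())
    return sum(counts[term] for term in query_terms)


def worker_top_k(query_terms, partition, k):
    """Map step: a worker scores its partition and returns its local top k."""
    scored = [(score(query_terms, text), doc_id) for doc_id, text in partition]
    # Ignore documents that do not match the query at all.
    return heapq.nlargest(k, (item for item in scored if item[0] > 0))


def master_top_k(query, documents, k, num_workers):
    """Master node: partition the documents, then merge the local results."""
    query_terms = query.lower().split()
    partitions = [documents[i::num_workers] for i in range(num_workers)]
    local_results = [worker_top_k(query_terms, p, k) for p in partitions]
    # Reduce step: the global top k is the top k of the local top-k lists.
    merged = heapq.nlargest(
        k, (item for result in local_results for item in result)
    )
    return [doc_id for _, doc_id in merged]
```

Because every document in the global top k must also be in the top k of its own partition, the master only has to merge at most `k * num_workers` candidates instead of rescanning the whole document set.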
Conclusion
• Processes continuous text queries over document streams
• Continually monitors a list of recent documents
• A first attempt to address email and news monitoring applications
• Currently applies to text documents
• Future work: extending to hyperlink structure
References
• K. Mouratidis, S. Bakiras, and D. Papadias, "Continuous Monitoring of Top-k Queries over Sliding Windows".
• B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom, 2002, "Models and Issues in Data Stream Systems", PODS '02, 1-16.
• V. N. Anh and A. Moffat, 2002, "Impact Transformation: Effective and Efficient Web Retrieval", Int'l ACM SIGIR Conf. Research and Development in Information Retrieval.