The document proposes an architecture for continuously retrieving relevant, up-to-date documents from text streams using MapReduce techniques. It targets four shortcomings of existing systems: they struggle to monitor data streams, they are slow because only servers can do the processing, they must scan the entire document set, and they may retrieve duplicate documents. The proposed system spreads the work across multiple worker nodes, each running an incremental threshold algorithm, to compute the k most relevant documents for a query in a distributed manner.
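The distributed top-k idea can be illustrated with a small sketch. This is not the presentation's actual algorithm, whose details the summary does not give; it is a simplified illustration under stated assumptions. The names `score`, `route`, `Worker`, and `merge_top_k` are hypothetical, relevance is assumed to be plain query-term frequency, and documents are assumed to be routed to workers by a hash of their id. Each worker keeps a local top-k heap whose smallest score acts as a threshold, so an arriving document is compared against that threshold instead of triggering a rescan, and repeated document ids are skipped.

```python
import heapq
import zlib
from collections import Counter


def score(query_terms, text):
    """Assumed relevance measure: summed frequency of the query terms."""
    counts = Counter(text.lower().split())
    return sum(counts[t] for t in query_terms)


def route(doc_id, n_workers):
    """Assumed partitioning: a stable hash of the id picks the worker."""
    return zlib.crc32(doc_id.encode("utf-8")) % n_workers


class Worker:
    """One worker node holding a local top-k for a fixed query (map side)."""

    def __init__(self, query, k):
        self.query_terms = query.lower().split()
        self.k = k
        self.heap = []     # min-heap of (score, doc_id); root is the k-th best
        self.seen = set()  # ids already processed, used to skip duplicates

    def threshold(self):
        # Until k documents are held, any document with a positive score qualifies.
        return self.heap[0][0] if len(self.heap) == self.k else 0

    def on_document(self, doc_id, text):
        """Handle one arriving document; return True if the local top-k changed."""
        if doc_id in self.seen:
            return False
        self.seen.add(doc_id)
        s = score(self.query_terms, text)
        if s <= self.threshold():
            return False  # cannot enter the top-k, so no further work is needed
        if len(self.heap) < self.k:
            heapq.heappush(self.heap, (s, doc_id))
        else:
            heapq.heapreplace(self.heap, (s, doc_id))
        return True


def merge_top_k(workers, k):
    """Reduce side: merge the local top-k lists into a global top-k."""
    return heapq.nlargest(k, (entry for w in workers for entry in w.heap))


workers = [Worker("mapreduce stream", k=2) for _ in range(3)]
stream = [
    ("d1", "mapreduce stream processing with mapreduce"),
    ("d2", "stream"),
    ("d3", "cooking recipes"),
    ("d4", "mapreduce mapreduce stream stream stream"),
    ("d1", "mapreduce stream processing with mapreduce"),  # duplicate arrival
]
for doc_id, text in stream:
    workers[route(doc_id, len(workers))].on_document(doc_id, text)

print(merge_top_k(workers, 2))  # [(5, 'd4'), (3, 'd1')]
```

Merging local top-k lists gives the correct global top-k because any document in the global top-k must also be in the top-k of the worker that holds it. A real deployment would also have to handle document expiry and score changes over time, which this sketch leaves out.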

















