Near-duplicate detection method based on random projection.
This presentation gives an overview of the existing categories of near-duplicate detection (NDD) methods and introduces WSH (Weighted SimHash). It also presents results comparing the original SimHash with WSH and with a cosine-similarity-based method.
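As background for the comparison, here is a minimal Python sketch of the standard SimHash fingerprint next to a bag-of-words cosine similarity baseline. In this sketch, each hash bit of a token acts as a +1/-1 random-projection coordinate, and term frequency serves as the feature weight. It is not the WSH method introduced in the presentation. The helper names (feature_hash, simhash, hamming, cosine), the whitespace tokenization, and the MD5-based 64-bit token hash are all assumptions made for illustration.

```python
import hashlib
import math
from collections import Counter


def feature_hash(token: str, bits: int = 64) -> int:
    # Assumption: a 64-bit hash taken from the first 8 bytes of MD5;
    # any well-mixed, stable hash function would serve the same role.
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") & ((1 << bits) - 1)


def simhash(text: str, bits: int = 64) -> int:
    # Assumption: whitespace tokens, with term frequency as the weight.
    weights = Counter(text.lower().split())
    acc = [0] * bits
    for token, w in weights.items():
        h = feature_hash(token, bits)
        for i in range(bits):
            # Bit i of the token hash picks the sign (+w or -w) of this
            # token's contribution to coordinate i.
            acc[i] += w if (h >> i) & 1 else -w
    # Set fingerprint bit i when the weighted sum in coordinate i is positive.
    fingerprint = 0
    for i in range(bits):
        if acc[i] > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a: int, b: int) -> int:
    # Number of differing bits between two fingerprints.
    return bin(a ^ b).count("1")


def cosine(a_text: str, b_text: str) -> float:
    # Cosine similarity of term-frequency vectors (the baseline method).
    a = Counter(a_text.lower().split())
    b = Counter(b_text.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    d1 = "the quick brown fox jumps over the lazy dog"
    d2 = "the quick brown fox jumped over the lazy dog"
    print(hamming(simhash(d1), simhash(d2)), cosine(d1, d2))
```

Under this scheme, two documents count as near-duplicates when the Hamming distance between their fingerprints is below a chosen threshold. The threshold is application-specific and is not taken from the presentation. The cosine baseline must compare full term vectors for every pair of documents, whereas SimHash reduces each document to a single 64-bit integer that is cheap to store and compare.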