This document discusses techniques for recognizing malware files. Each file is represented as a feature vector that captures its static and dynamic attributes. A distance function measures the similarity between vectors and identifies a file's nearest neighbors, which an instance-based classifier then uses to label the file. The nearest-neighbor search is accelerated with techniques such as a VP-tree (vantage-point tree) and distance-bounded search. The classifier is deployed in a system that collects file fingerprints from users and shares threat information and updates between components. A rule generator additionally aims to detect malware variants by learning rules from conditions observed in files.
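The nearest-neighbor classification described above can be illustrated with a minimal sketch. It is not the document's implementation: the Manhattan distance, the majority vote, the toy three-element feature vectors, and every name (`distance`, `build_vp_tree`, `knn_search`, `classify`, `max_dist`) are assumptions chosen for illustration. The only requirement the VP-tree places on the distance function is that it obeys the triangle inequality.

```python
import heapq
import random
from collections import Counter


def distance(a, b):
    """Assumed metric: Manhattan (L1) distance between numeric vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


class VPNode:
    def __init__(self, point, label, radius, inside, outside):
        self.point, self.label, self.radius = point, label, radius
        self.inside, self.outside = inside, outside


def build_vp_tree(items, rng):
    """Build a vantage-point tree from a list of (vector, label) pairs."""
    if not items:
        return None
    items = list(items)
    point, label = items.pop(rng.randrange(len(items)))  # pick a vantage point
    if not items:
        return VPNode(point, label, 0.0, None, None)
    dists = [distance(point, v) for v, _ in items]
    radius = sorted(dists)[len(dists) // 2]  # median distance splits the rest
    inside = [it for it, d in zip(items, dists) if d < radius]
    outside = [it for it, d in zip(items, dists) if d >= radius]
    return VPNode(point, label, radius,
                  build_vp_tree(inside, rng), build_vp_tree(outside, rng))


def knn_search(root, query, k, max_dist=float("inf")):
    """Return up to k (distance, label) pairs no farther than max_dist."""
    heap = []        # max-heap on distance, stored as (-d, tie_breaker, label)
    tau = max_dist   # current search bound; shrinks once k neighbors are held
    counter = 0

    def visit(node):
        nonlocal tau, counter
        if node is None:
            return
        d = distance(query, node.point)
        if d <= tau:
            heapq.heappush(heap, (-d, counter, node.label))
            counter += 1
            if len(heap) > k:
                heapq.heappop(heap)           # drop the farthest candidate
            if len(heap) == k:
                tau = min(tau, -heap[0][0])   # tighten the bound
        # The triangle inequality lets us skip a subtree that cannot hold
        # any point within tau of the query.
        if d < node.radius:
            visit(node.inside)
            if d + tau >= node.radius:
                visit(node.outside)
        else:
            visit(node.outside)
            if d - tau <= node.radius:
                visit(node.inside)

    visit(root)
    return sorted((-neg_d, label) for neg_d, _, label in heap)


def classify(root, query, k=3, max_dist=float("inf")):
    """Majority vote over the neighbors found; None if none are in range."""
    neighbors = knn_search(root, query, k, max_dist)
    if not neighbors:
        return None
    return Counter(label for _, label in neighbors).most_common(1)[0][0]


# Toy labeled feature vectors (made up for this sketch).
TRAINING = [
    ((0, 0, 1), "benign"), ((0, 1, 1), "benign"), ((1, 0, 1), "benign"),
    ((9, 9, 0), "malware"), ((8, 9, 0), "malware"), ((9, 8, 1), "malware"),
]
tree = build_vp_tree(TRAINING, random.Random(0))
print(classify(tree, (8, 8, 0)))                  # query near the malware cluster
print(classify(tree, (50, 50, 50), max_dist=5))   # nothing in range
```

In this sketch the `max_dist` argument plays the role of the distance bound: a query with no labeled neighbor inside the bound is left unclassified (`None`) rather than forced into a class, and the same bound lets the search prune whole subtrees.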