1. Practical Issues for Automated Categorization of Web Sites John M. Pierre [email_address] Metacode Technologies, Inc. 139 Townsend Street San Francisco, CA 94107 (Collaborators: B. Wohler, R. Daniel, M. Butler, R. Avedon)
7. Experimental Setup: Targeted Spidering [Flowchart; node labels: Domain name, HTTP Get, live?, Try www., Frames?, Metatags?, Use <body>, <a href=?, link keywords (prod, service, about, info, press, news), 'Query' Pages, Send Query; branches labeled Yes / No]
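The link-following step of the targeted spidering slide can be illustrated with a minimal sketch. The keyword list (prod, service, about, info, press, news) comes from the slide. Everything else is an assumption made for illustration: the names `LinkCollector` and `select_target_links` are hypothetical, and matching keywords as case-insensitive substrings of the `href` value is a guess at how the original spider picked pages, not a description of it.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Keywords listed on the slide. Matching them against the href text is an
# assumption; the original system may have matched anchor text instead.
TARGET_KEYWORDS = ("prod", "service", "about", "info", "press", "news")


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.hrefs.append(value)


def select_target_links(html, base_url):
    """Return absolute URLs of links whose href contains a target keyword.

    Order of first appearance is preserved and duplicates are dropped.
    """
    parser = LinkCollector()
    parser.feed(html)
    selected = []
    for href in parser.hrefs:
        if any(keyword in href.lower() for keyword in TARGET_KEYWORDS):
            url = urljoin(base_url, href)
            if url not in selected:
                selected.append(url)
    return selected
```

A caller would fetch the home page for a domain, pass its HTML and URL to `select_target_links`, and then fetch only the returned pages, which keeps the crawl small and focused on descriptive content.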
9. Experimental Setup: System Architecture [Diagram; labels: The Web, Domain Names, Spider, Web pages, Text, Query, IR Engine, Matching documents, Decision, SEC-NAICS; example labels: Foo.com; 11, 21, 23]
10. Results: P = Precision = # correctly assigned / # assigned; R = Recall = # correctly assigned / # total correct; F1 = 2PR / (P + R); micro-averaged = computed over all categories; macro-averaged = computed per category, then averaged
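The definitions on this slide can be made concrete with a short sketch. It assumes that per-category counts are available as `(correct, assigned, total)` tuples. The function names `prf` and `micro_macro` are hypothetical. It also assumes that a zero denominator yields a score of 0.0, and that macro-averaged F1 is the mean of the per-category F1 values, which is one reading of "per category, then averaged".

```python
def prf(correct, assigned, total):
    """Precision, recall, and F1 from raw counts.

    correct  = number of correctly assigned labels
    assigned = number of labels the system assigned
    total    = number of labels that are actually correct
    Zero denominators yield 0.0 (an assumption, not stated on the slide).
    """
    p = correct / assigned if assigned else 0.0
    r = correct / total if total else 0.0
    f1 = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f1


def micro_macro(counts):
    """Micro- and macro-averaged (P, R, F1).

    counts maps a category label to a (correct, assigned, total) tuple.
    """
    if not counts:
        raise ValueError("counts must contain at least one category")

    # Micro-average: pool the counts over all categories, then score once.
    micro = prf(
        sum(c for c, _, _ in counts.values()),
        sum(a for _, a, _ in counts.values()),
        sum(t for _, _, t in counts.values()),
    )

    # Macro-average: score each category separately, then take the mean.
    per_category = [prf(*v) for v in counts.values()]
    n = len(per_category)
    macro = tuple(sum(scores[i] for scores in per_category) / n for i in range(3))
    return micro, macro
```

Micro-averaging weights every assignment equally, so large categories dominate the score. Macro-averaging weights every category equally, so it is more sensitive to performance on small categories.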