kantorNSF-NIJ-ISI-03-06-04.ppt

Libraries and Intelligence
NSF/NIJ Symposium on Intelligence and Security Informatics, Tucson, AZ.
Paul Kantor, June 2, 2003
Research supported in part by the National Science Foundation under Grant EIA-0087022 and by the Advanced Research Development Activity under Contract 2002-H790400-000. The views expressed in this presentation are those of the author, and do not necessarily represent the views of the sponsoring agency.

Relation to General Intelligence and Security Informatics

Relation to Librarianship

Librarianship

Librarianship

Two Projects

OBJECTIVE: Monitor streams of textualized communication to detect pattern changes and "significant" events.

MMS Team
Statisticians, computer scientists, experts in information retrieval and library science, etc.
Dr. Rafail Ostrovsky, Telcordia Technologies: algorithms
Prof. Endre Boros: Boolean optimization
Dr. Vladimir Menkov: programming
Dr. Alex Genkin: programming
Mr. Andrei Anghelescu: graduate assistant
Mr. Dmitriy Fradkin: graduate assistant

TECHNICAL PROBLEM:

COMPONENTS OF AUTOMATIC MESSAGE PROCESSING
(1) Compression of Text: to meet storage and processing limitations;
(2) Representation of Text: put in a form amenable to computation and statistical analysis;
(3) Matching Scheme: computing similarity between documents;
(4) Learning Method: build on judged examples to determine characteristics of a document cluster (“event”);
(5) Fusion Scheme: combine methods (scores) to yield improved detection/clustering.
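The representation and matching steps listed above can be illustrated with a minimal sketch. It assumes a bag-of-words representation with tf-idf weights and cosine similarity for matching, followed by a 1-nearest-neighbor lookup. The tokenizer, the smoothed idf formula, and the three example messages are illustrative assumptions, not details taken from the MMS project.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately crude tokenizer."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse tf-idf vector (dict: term -> weight) per document."""
    bags = [Counter(tokenize(d)) for d in docs]
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for bag in bags for term in bag)
    # Smoothed idf (an assumption): log((1 + N) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in bag.items()} for bag in bags]


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest(query_index, vectors):
    """Index of the document most similar to vectors[query_index]."""
    scores = [
        (cosine(vectors[query_index], v), i)
        for i, v in enumerate(vectors)
        if i != query_index
    ]
    return max(scores)[1]


# Made-up example messages, for illustration only.
docs = [
    "fire in rail tunnel disrupts train service",
    "train service halted after tunnel fire",
    "library opens new reading room",
]
vectors = tfidf_vectors(docs)
print(nearest(0, vectors))  # the second message shares four terms with the first
```

In this toy run the first message matches the second, since the third shares no terms with it. A real stream monitor would need a better tokenizer and an incremental way to update document frequencies.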

Project Components: Rutgers DIMACS MMS
Stages: Compression, Representation, Matching, Learning, Fusion
Techniques shown on the diagram: Random Projections, Boolean Random Projections, Robust Feature Selection, Bag of Words, Bag of Bits, tf-idf, kNN, Boolean r-NN, Rocchio separator, Combinatorial Clustering, Naïve Bayes, Sparse Bayes, Discriminant Analysis, Support Vector Machines, Non-linear Classifiers
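Random projection, one of the compression techniques named on this slide, can be sketched as follows. The slide does not say which construction the project used, so this sketch assumes a dense matrix of random +1/-1 entries scaled by 1/sqrt(k), a common textbook choice. The function names are hypothetical.

```python
import math
import random


def random_projection_matrix(n_features, k, seed=0):
    """Build a k x n_features matrix of +/- 1/sqrt(k) entries (assumed construction)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [
        [rng.choice((-1.0, 1.0)) * scale for _ in range(n_features)]
        for _ in range(k)
    ]


def project(matrix, vector):
    """Map an n_features-dimensional vector down to k dimensions."""
    return [sum(r * x for r, x in zip(row, vector)) for row in matrix]


# A toy 5-term document vector compressed to 3 dimensions.
matrix = random_projection_matrix(n_features=5, k=3, seed=1)
compressed = project(matrix, [1.0, 0.0, 2.0, 0.0, 1.0])
print(len(compressed))
```

With this scaling the squared length of a vector is preserved in expectation, which is why distances between compressed documents can stand in for distances between the originals when k is large enough.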

Proposed Advances

Proposed Advances II

MORE SOPHISTICATED STATISTICAL APPROACHES:

THE APPROACH

Mercer Kernels
Mercer’s Theorem gives necessary and sufficient conditions for a continuous symmetric function K to admit this representation: [equation not preserved in this transcript]
K must be positive semi-definite; such kernels are called “Mercer kernels.”
This kernel defines a set of functions H_K, elements of which have an expansion as: [equation not preserved in this transcript]
This set of functions is a “reproducing kernel Hilbert space.”
Prepared by David L. Madigan
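The two formulas on the Mercer Kernels slide did not survive extraction. The block below gives the standard textbook statements as a hedged reconstruction, not the slide's exact notation. The symbols \gamma_i (eigenvalues), \phi_i (eigenfunctions), and c_i (expansion coefficients) are assumptions.

```latex
% Mercer representation of a positive semi-definite kernel (standard form)
K(x, z) = \sum_{i=1}^{\infty} \gamma_i \, \phi_i(x) \, \phi_i(z),
\qquad \gamma_i \ge 0

% Elements of the function space H_K and the condition for membership
f(x) = \sum_{i=1}^{\infty} c_i \, \phi_i(x),
\qquad \|f\|_{H_K}^{2} = \sum_{i=1}^{\infty} \frac{c_i^{2}}{\gamma_i} < \infty
```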

Support Vector Machine
Two-class classifier with the form: [equation not preserved in this transcript]
Parameters chosen to minimize: [equation not preserved in this transcript] (annotated on the slide: tuning constant, Gram matrix, complexity penalty)
Many of the fitted α’s are usually zero; the x’s corresponding to the non-zero α’s are the “support vectors.”
Prepared by David L. Madigan
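The classifier and objective on the Support Vector Machine slide were images and are missing here. A common textbook form that matches the slide's annotations (tuning constant, Gram matrix, complexity penalty) is sketched below. It is an assumption about what the slide showed, with \lambda as the tuning constant, \mathbf{K} as the Gram matrix with entries K(x_i, x_j), and \alpha^{\top} \mathbf{K} \alpha as the complexity penalty.

```latex
% Two-class classifier as a kernel expansion over the N training points
f(x) = \alpha_0 + \sum_{i=1}^{N} \alpha_i \, K(x, x_i)

% Hinge loss plus a complexity penalty weighted by the tuning constant
\min_{\alpha_0, \alpha} \;
\sum_{i=1}^{N} \bigl[\, 1 - y_i f(x_i) \,\bigr]_{+}
\; + \; \lambda \, \alpha^{\top} \mathbf{K} \, \alpha
```

Training points with \alpha_i = 0 drop out of the sum for f(x), which is why only the support vectors are needed at prediction time.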

Regularized Linear Feature Space Model
Choose a model of the form: [equation not preserved in this transcript]
to minimize: [equation not preserved in this transcript]
Solution is finite dimensional: just need to know K, not φ!
Prediction is sign(f(x)).
A kernel is a function K such that for all x, z ∈ X, K(x, z) = ⟨φ(x), φ(z)⟩, where φ is a mapping from X to an inner product feature space F.
Prepared by David L. Madigan
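The remark that one only needs K, not φ, can be checked numerically. The sketch below assumes the homogeneous quadratic kernel on 2-D inputs, for which an explicit feature map is known. The kernel choice, the function names, and the coefficients passed to `predict` are illustrative assumptions, not values from the talk.

```python
import math


def phi(x):
    """Explicit feature map for the quadratic kernel on 2-D inputs."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def dot(u, v):
    """Ordinary inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def quadratic_kernel(x, z):
    """K(x, z) = (x . z)^2, computed without ever forming phi."""
    return dot(x, z) ** 2


def predict(alphas, points, x, kernel=quadratic_kernel):
    """Return sign(f(x)) with f(x) = sum_i alpha_i * K(x, x_i); alphas are assumed given."""
    f = sum(a * kernel(x, p) for a, p in zip(alphas, points))
    return 1 if f >= 0 else -1


x, z = (1.0, 2.0), (3.0, -1.0)
# The two numbers agree up to floating-point rounding.
print(quadratic_kernel(x, z), dot(phi(x), phi(z)))
```

The kernel evaluation works in the original 2-D space, while the explicit route needs the 3-D feature vectors. For higher-degree or infinite-dimensional feature spaces only the kernel route stays practical.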

Mixture Models

Example Results on Fusion

Learning takes place in two spaces:
For matching and filtering, we learn rules in the primary space of document features.
For fusion processes, we learn rules in a secondary space of “pseudo-features” which are assigned by entire systems to incoming documents.
Figure labels: Feature space, Random Subspace, Score space, Relevant
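Learning a fusion rule in score space can be sketched as follows. The sketch assumes two hypothetical retrieval systems that each assign a score to every document, a linear mixing rule, and a grid search that picks the mixing weight giving the best ordering of judged examples. The scores and relevance labels are synthetic, and the talk's own fusion rules may differ.

```python
def pairwise_accuracy(scores, labels):
    """Fraction of (relevant, non-relevant) pairs ranked in the right order."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return 0.0
    correct = sum(1 for p in pos for n in neg if p > n)
    return correct / (len(pos) * len(neg))


def fuse(score_pairs, w):
    """Linear fusion rule in score space: w * s1 + (1 - w) * s2."""
    return [w * s1 + (1.0 - w) * s2 for s1, s2 in score_pairs]


def learn_weight(score_pairs, labels, steps=10):
    """Pick the grid weight that best orders the judged examples."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda w: pairwise_accuracy(fuse(score_pairs, w), labels))


# Synthetic pseudo-features: (score from system A, score from system B).
train_pairs = [(0.9, 0.2), (0.4, 0.8), (0.5, 0.3), (0.3, 0.4)]
train_labels = [1, 1, 0, 0]  # 1 = judged relevant, 0 = judged non-relevant

w = learn_weight(train_pairs, train_labels)
print(pairwise_accuracy(fuse(train_pairs, 1.0), train_labels))  # system A alone
print(pairwise_accuracy(fuse(train_pairs, 0.0), train_labels))  # system B alone
print(pairwise_accuracy(fuse(train_pairs, w), train_labels))    # fused
```

On this toy data each system misorders at least one pair on its own, while the learned mixture orders all pairs correctly. The input to the learner is a pair of system scores, not the document's word features.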

REFERENCE ASPECT: Effective Communication with the Analyst User

HITIQA: High-Quality Interactive Question Answering
University at Albany, SUNY
Rutgers University

HITIQA Team

HITIQA Concept
Question: What recent disasters occurred in tunnels used for transportation?
Possible Category Axes Seen: vehicle type (auto, train, other), losses/cost, location
Clarification Dialogue:
S: Are you interested in train accidents, automobile accidents or others?
U: Any that involved lost life or a major disruption in communication. Must identify losses.
Diagram labels: USER PROFILE; TASK CONTEXT; QUESTION NL PROCESSING; SEMANTIC PROC; TEMPLATE SELECTION; SEARCH & CATEGORIZE; KB; FUSE & SUMMARIZE; QUALITY ASSESSMENT; ANSWER GENER.; Focused Information Need; Answer & Justification

Key Research Issues

Completed Work
Diagram labels: question; Question Processor; Document Retrieval; Segment/Filter; Cluster Segments; Build Frames; Process Frames; Dialogue Manager; Query Refinement; Wordnet; Current Focus; DB; Gate; Answer Generator; answer; Visualization

Data-Driven NL Semantics
Diagram labels: User Semantics, System Semantics

Answer Space Topology
Diagram labels: KERNEL QUESTION MATCH; NEAR MISSES, ALTERNATIVE INTERPRETATIONS; ALL RETRIEVED FRAMES

Quality Judgments

Factor Analysis of 9 Quality Features
Figure labels: Appearance, Content

Modeling Quality of Text

Performance of models
Quality Prediction by Linear Combination of Textual Features (from 5 to 17 variables). Split Half for Training and Testing.

Quality Factor            Prediction Rate
Depth                     67%
Author Credential         55%
Accuracy                  69%
Source                    57%
Objectivity               64%
Grammar                   79%
One Side vs Multi View    70%
Verbosity                 63%
Readability               76%
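The split-half evaluation of a linear quality predictor can be sketched as follows. The slide does not say how the linear combination was fitted, so this sketch swaps in a simple perceptron as a stand-in. The two features and all data values are synthetic, and the resulting rate is not comparable to the figures in the table.

```python
def train_perceptron(rows, labels, epochs=20):
    """Fit weights for the linear rule sign(w . x + b); the last weight is the bias."""
    w = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            xb = list(x) + [1.0]
            pred = 1 if sum(wi * xi for wi, xi in zip(w, xb)) > 0 else -1
            if pred != y:
                # Standard perceptron update on a misclassified example.
                w = [wi + y * xi for wi, xi in zip(w, xb)]
    return w


def classify(w, x):
    """Apply the learned linear rule to one feature vector."""
    xb = list(x) + [1.0]
    return 1 if sum(wi * xi for wi, xi in zip(w, xb)) > 0 else -1


def split_half_rate(rows, labels):
    """Train on even-indexed items, test on odd-indexed items, return accuracy."""
    pairs = list(zip(rows, labels))
    train, test = pairs[0::2], pairs[1::2]
    w = train_perceptron([x for x, _ in train], [y for _, y in train])
    hits = sum(1 for x, y in test if classify(w, x) == y)
    return hits / len(test)


# Synthetic documents: two made-up textual features each, label +1 = high quality.
rows = [(2.0, 0.0), (3.0, 1.0), (0.0, 2.0), (1.0, 3.0),
        (2.0, 1.0), (4.0, 0.0), (1.0, 2.0), (0.0, 4.0)]
labels = [1, 1, -1, -1, 1, 1, -1, -1]
print(split_half_rate(rows, labels))
```

The toy data is cleanly separable, so the held-out rate is perfect. Real quality judgments are noisier, as the rates of 55% to 79% in the table suggest.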

In Summary

Two Roles for Learning

Appendix: The following slides were not presented at the conference.

Communicating Credibility

Data Fusion

Background on Fusion Problem

Non-linear “iso-relevance”

Local Fusion Rule

Local Fusion Results are Good

Summary of Local Fusion
PROBLEM CASE: We ran 5 split-half runs on the odd case (318) and the results persist.

Is Local Sensible?

Our Approach to Retrieval Fusion
Diagram labels: Request; SMART; InQuery; DOCUMENTS SETS; Result Set; FUSION PROCESS; Delivered SET; Monitor Fusion Set and Receive Feedback; ADOPT: Fusion System; USE: Better System; Adaptive “Local” Fusion
