Libraries and Intelligence. NSF/NIJ Symposium on Intelligence and Security Informatics, Tucson, AZ. Paul Kantor, June 2, 2003. Research supported in part by the National Science Foundation under Grant EIA-0087022 and by the Advanced Research and Development Activity under Contract 2002-H790400-000. The views expressed in this presentation are those of the author, and do not necessarily represent the views of the sponsoring agency.
Relation to General Intelligence and Security Informatics
Relation to Librarianship
Librarianship
Librarianship
Two Projects
OBJECTIVE: Monitor streams of textualized communication to detect pattern changes and "significant" events
 
MMS Team: statisticians, computer scientists, and experts in information retrieval, library science, etc.
Dr. Rafail Ostrovsky, Telcordia Technologies: algorithms
Prof. Endre Boros: Boolean optimization
Dr. Vladimir Menkov: programming
Dr. Alex Genkin: programming
Mr. Andrei Anghelescu: graduate assistant
Mr. Dmitriy Fradkin: graduate assistant
TECHNICAL PROBLEM:
COMPONENTS OF AUTOMATIC MESSAGE PROCESSING
(1) Compression of Text: to meet storage and processing limitations;
(2) Representation of Text: put in a form amenable to computation and statistical analysis;
(3) Matching Scheme: computing similarity between documents;
(4) Learning Method: build on judged examples to determine the characteristics of a document cluster (“event”);
(5) Fusion Scheme: combine methods (scores) to yield improved detection/clustering.
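To make the representation and matching steps concrete, here is a minimal sketch, assuming a bag-of-words representation with tf-idf weights and cosine similarity as the matching scheme. The tokenizer, the smoothed idf variant, and the three sample messages are illustrative assumptions, not the project's implementation.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()

def tfidf_vectors(docs):
    """Return one sparse tf-idf vector (dict: term -> weight) per document."""
    bags = [Counter(tokenize(d)) for d in docs]           # bag of words
    n = len(docs)
    df = Counter(term for bag in bags for term in bag)    # document frequency
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}      # assumed idf variant
    return [{t: tf * idf[t] for t, tf in bag.items()} for bag in bags]

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical messages, invented for illustration only.
docs = [
    "tunnel fire closes rail line",
    "rail tunnel fire investigated",
    "library budget approved",
]
vecs = tfidf_vectors(docs)
sim_related = cosine(vecs[0], vecs[1])     # messages sharing several terms
sim_unrelated = cosine(vecs[0], vecs[2])   # messages sharing no terms
```

In this sketch the two tunnel messages score higher than the unrelated pair, which is the property a matching scheme needs before any learning or fusion is layered on top.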
Project Components: Rutgers DIMACS MMS
Stages: Compression; Representation; Matching; Learning; Fusion
Methods shown: Random Projections; Boolean Random Projections; Robust Feature Selection; Bag of Words; Bag of Bits; tf-idf; kNN; Boolean r-NN; Rocchio separator; Combinatorial Clustering; Naïve Bayes; Sparse Bayes; Discriminant Analysis; Support Vector Machines; Non-linear Classifiers
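As an illustration of the compression stage, the following is a minimal sketch of a Gaussian random projection, one common way to realize "Random Projections". The dimensions, seeds, and random test vectors are assumptions chosen for the example; the project's own variant (including the Boolean one) is not shown in the slides.

```python
import math
import random

def random_projection_matrix(d, k, seed=0):
    """k x d matrix with independent N(0, 1/k) entries."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]

def project(R, x):
    """Map a d-dimensional vector x down to k dimensions."""
    return [sum(r * v for r, v in zip(row, x)) for row in R]

def distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

d, k = 1000, 200                                   # assumed sizes
rng = random.Random(42)
doc_a = [rng.random() for _ in range(d)]           # stand-ins for document vectors
doc_b = [rng.random() for _ in range(d)]
R = random_projection_matrix(d, k)

# Ratio of the distance after projection to the distance before it.
ratio = distance(project(R, doc_a), project(R, doc_b)) / distance(doc_a, doc_b)
```

The ratio stays close to 1, so distance-based matching can run on the compressed vectors at a fraction of the storage cost.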
Proposed Advances
Proposed Advances II
MORE SOPHISTICATED STATISTICAL APPROACHES:
THE APPROACH
Mercer Kernels
Mercer’s Theorem gives necessary and sufficient conditions (K must be positive semi-definite) for a continuous symmetric function K to admit this representation: [equation not preserved]. Such functions are called “Mercer kernels”. This kernel defines a set of functions H_K, elements of which have an expansion as: [equation not preserved]. This set of functions is a “reproducing kernel Hilbert space”. Prepared by David L. Madigan
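On a finite sample, positive semi-definiteness means that the Gram matrix K with entries K(x_i, x_j) satisfies c^T K c >= 0 for every coefficient vector c. The sketch below checks this numerically for the Gaussian (RBF) kernel, a standard example of a Mercer kernel; the kernel choice, the sample points, and the tolerance are assumptions made for illustration.

```python
import math
import random

def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def gram_matrix(points, kernel):
    """Matrix of kernel values between every pair of points."""
    return [[kernel(p, q) for q in points] for p in points]

def quadratic_form(K, c):
    """Compute c^T K c."""
    n = len(c)
    return sum(c[i] * K[i][j] * c[j] for i in range(n) for j in range(n))

rng = random.Random(0)
points = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(8)]
K = gram_matrix(points, rbf_kernel)

# Evaluate the quadratic form for many random coefficient vectors.
forms = [quadratic_form(K, [rng.gauss(0, 1) for _ in points]) for _ in range(200)]
min_form = min(forms)   # should not be negative for a Mercer kernel
```

A random check like this cannot prove the property, but a clearly negative value would show that a candidate function is not a valid kernel.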
Support Vector Machine
Two-class classifier with the form: [equation not preserved]; parameters chosen to minimize: [equation not preserved], with terms labeled “complexity penalty”, “Gram matrix”, and “tuning constant”. Many of the fitted α’s are usually zero; the x’s corresponding to the non-zero α’s are the “support vectors.” Prepared by David L. Madigan
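For readers without the original slide, a standard textbook form of the kernel support vector machine that is consistent with the labels above is sketched here. The symbols \beta_0, \alpha_i, \lambda and the hinge-loss term are assumptions taken from the usual presentation, not a transcription of the slide.

```latex
f(x) = \beta_0 + \sum_{i=1}^{N} \alpha_i \, K(x, x_i),
\qquad \hat{y}(x) = \operatorname{sign} f(x)

\min_{\beta_0,\, \alpha} \;
\sum_{i=1}^{N} \bigl[\, 1 - y_i f(x_i) \,\bigr]_{+}
\; + \; \lambda \, \alpha^{\top} \mathbf{K} \, \alpha,
\qquad \mathbf{K}_{ij} = K(x_i, x_j)
```

In this form \alpha^{\top} \mathbf{K} \alpha plays the role of the complexity penalty, \mathbf{K} is the Gram matrix, and \lambda is the tuning constant; the training points x_i with \alpha_i \neq 0 are the support vectors.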
Regularized Linear Feature Space Model
A kernel is a function K such that, for all x, z ∈ X, K(x, z) = ⟨Φ(x), Φ(z)⟩, where Φ is a mapping from X to an inner product feature space F. Choose a model of the form: [equation not preserved] to minimize: [equation not preserved]. The solution is finite dimensional: we just need to know K, not Φ! The prediction is sign(f(x)). Prepared by David L. Madigan
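The point that only K is needed can be sketched with a small example. The code below assumes squared-error loss (kernel ridge regression), which is a swapped-in choice because the slide's loss function was not preserved; the RBF kernel, the value of `lam`, and the toy data are likewise assumptions. The fitted function is f(x) = sum_i alpha_i K(x_i, x), and the mapping Φ is never computed.

```python
import math

def rbf(x, z, gamma=1.0):
    """Gaussian kernel; stands in for the inner product <Phi(x), Phi(z)>."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit(X, y, lam=0.1):
    """Return alpha solving (K + lam * I) alpha = y, with K the Gram matrix."""
    n = len(X)
    A = [[rbf(X[i], X[j]) + (lam if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    return solve(A, y)

def predict(X, alpha, x):
    """Prediction is sign(f(x)) with f(x) = sum_i alpha_i K(x_i, x)."""
    f = sum(a * rbf(x_i, x) for a, x_i in zip(alpha, X))
    return 1 if f >= 0 else -1

# Hypothetical two-class training data.
X = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [2.0, 2.0], [2.2, 1.9], [1.9, 2.1]]
y = [-1.0, -1.0, -1.0, 1.0, 1.0, 1.0]
alpha = fit(X, y)
```

The solution has one coefficient per training example, which is what "finite dimensional" means here even when the feature space F is infinite dimensional.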
Mixture Models
Example Results on Fusion
Learning takes place in two spaces. For matching and filtering, we learn rules in the primary space of document features. For fusion processes, we learn rules in a secondary space of “pseudo-features”, which are assigned by entire systems to incoming documents.
[Figure labels: Feature space; Random Subspace; Score space; Relevant]
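Learning in score space can be sketched as follows. Each document is described only by the scores that two retrieval systems assign to it, and a simple linear rule is learned over those pseudo-features. The perceptron learner and the score values are assumptions chosen for illustration; the slides do not say which learner was used.

```python
def train_perceptron(samples, labels, max_epochs=1000):
    """Learn w, b so that the sign of (w . s + b) matches labels in {+1, -1}."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for s, y in zip(samples, labels):
            activation = sum(wi * si for wi, si in zip(w, s)) + b
            if y * activation <= 0:                    # misclassified: update
                w = [wi + y * si for wi, si in zip(w, s)]
                b += y
                mistakes += 1
        if mistakes == 0:                              # converged
            break
    return w, b

def fuse(w, b, s):
    """Fused relevance decision for one document's score vector s."""
    return 1 if sum(wi * si for wi, si in zip(w, s)) + b > 0 else -1

def best_single_system_accuracy(samples, labels, system):
    """Best accuracy achievable by thresholding one system's score alone."""
    best = 0.0
    for t in sorted({s[system] for s in samples}):
        correct = sum((1 if s[system] >= t else -1) == y
                      for s, y in zip(samples, labels))
        best = max(best, correct / len(samples))
    return best

# Hypothetical (system 1 score, system 2 score) pairs; +1 means relevant.
scores = [(0.9, 0.3), (0.3, 0.9), (0.7, 0.6), (0.8, 0.8),
          (0.5, 0.2), (0.2, 0.5), (0.4, 0.4), (0.1, 0.1)]
labels = [1, 1, 1, 1, -1, -1, -1, -1]

w, b = train_perceptron(scores, labels)
fused_accuracy = sum(fuse(w, b, s) == y for s, y in zip(scores, labels)) / len(scores)
single_accuracies = [best_single_system_accuracy(scores, labels, i) for i in (0, 1)]
```

With these invented scores, neither system separates relevant from non-relevant documents with a single threshold, while the learned combination does, which is the motivation for fusing in score space.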
REFERENCE ASPECT Effective Communication with the Analyst User
HITIQA: High-Quality Interactive Question Answering. University at Albany, SUNY; Rutgers University
HITIQA Team
HITIQA Concept
Question: What recent disasters occurred in tunnels used for transportation?
Possible Category Axes Seen: Vehicle type; Losses/Cost; location (values shown: other, auto, train)
Clarification Dialogue:
S: Are you interested in train accidents, automobile accidents or others?
U: Any that involved lost life or a major disruption in communication. Must identify losses.
[Diagram labels: USER PROFILE; TASK CONTEXT; QUESTION NL PROCESSING; SEMANTIC PROC; SEARCH & CATEGORIZE; KB; TEMPLATE SELECTION; QUALITY ASSESSMENT; FUSE & SUMMARIZE; ANSWER GENER.; Focused Information Need; Answer & Justification]
Key Research Issues
[Architecture diagram labels: question; Question Processor; Document Retrieval; Segment/Filter; Cluster Segments; Build Frames; Process Frames; Dialogue Manager; Query Refinement; Answer Generator; answer; Visualization; Wordnet; DB; Gate. Legend: Completed Work; Current Focus]
Data-Driven NL Semantics: User Semantics; System Semantics
Answer Space Topology: KERNEL QUESTION MATCH; NEAR MISSES, ALTERNATIVE INTERPRETATIONS; ALL RETRIEVED FRAMES
Quality Judgments
Factor Analysis of 9 Quality Features: Appearance; Content
Modeling Quality of Text
Performance of models
Quality Prediction by Linear Combination of Textual Features (from 5 to 17 variables). Split Half for Training and Testing.
Quality Factor            Prediction Rate
Depth                     67%
Author Credential         55%
Accuracy                  69%
Source                    57%
Objectivity               64%
Grammar                   79%
One Side vs Multi View    70%
Verbosity                 63%
Readability               76%
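The evaluation protocol (fit a linear combination of textual features on one half of the data, measure the prediction rate on the other half) can be sketched as below. Everything in the example is an assumption: the data are synthetic, the two features are unnamed stand-ins, and ordinary least squares is used because the slides do not specify the fitting method. It does not reproduce the figures in the table.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit_linear(X, y):
    """Least-squares weights for y ~ w0 + w . x via the normal equations."""
    rows = [[1.0] + list(x) for x in X]
    p = len(rows[0])
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    Aty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return solve(AtA, Aty)

def predict_label(w, x):
    """Predict high quality (1) when the linear score reaches 0.5."""
    score = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return 1 if score >= 0.5 else 0

# Synthetic documents: two made-up textual features and a noisy quality label.
rng = random.Random(1)
X, y = [], []
for _ in range(200):
    x = [rng.random(), rng.random()]
    noisy_score = 0.7 * x[0] + 0.3 * x[1] + rng.gauss(0.0, 0.05)
    X.append(x)
    y.append(1 if noisy_score > 0.5 else 0)

# Split half: first half for training, second half for testing.
half = len(X) // 2
w = fit_linear(X[:half], y[:half])
hits = sum(predict_label(w, x) == t for x, t in zip(X[half:], y[half:]))
prediction_rate = hits / (len(X) - half)
```

Reporting the rate on the held-out half, rather than on the training half, is what keeps the figure from being inflated by overfitting.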
 
In Summary
Two Roles for Learning
Appendix: The following slides were not presented at the conference.
Communicating Credibility
Data Fusion
Background on Fusion Problem
Non-linear “iso-relevance”
Local Fusion Rule
Local Fusion Results are Good
Summary of Local Fusion. PROBLEM CASE: We ran 5 split-half runs on the odd case (318) and the results persist.
Is Local Sensible?
 
 
Our Approach to Retrieval Fusion: Adaptive “Local” Fusion
[Diagram labels: Request; SMART; InQuery; Result Set; Result Set; FUSION PROCESS; DOCUMENTS SETS; Delivered SET; Monitor Fusion Set and Receive Feedback; ADOPT: Fusion System; USE: Better System]

kantorNSF-NIJ-ISI-03-06-04.ppt

  • 1. Libraries and Intelligence NSF/NIJ Symposium on Intelligence and Security Informatics. Tucson, AR. Paul Kantor June 2, 2003 Research supported in part by the National Science Foundation under Grant EIA-0087022and by the Advanced Research Development Activity under Contract 2002-H790400-000. The views expressed in this presentation are those of the author, and do not necessarily represent the views of the sponsoring agency.
  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.  
  • 9.
  • 10.
  • 11.
  • 12. COMPONENTS OF AUTOMATIC MESSAGE PROCESSING (1). Compression of Text -- to meet storage and processing limitations; (2). Representation of Text -- put in form amenable to computation and statistical analysis; (3). Matching Scheme -- computing similarity between documents; (4). Learning Method -- build on judged examples to determine characteristics of document cluster (“event”) (5). Fusion Scheme -- combine methods (scores) to yield improved detection/clustering.
  • 13. Random Projections Boolean Random Projections Robust Feature Selection Compression Representation Bag of Words Bag of Bits Matching Learning Fusion tf-idf kNN Boolean r-NN Rocchio separator Combinatorial Clustering Naïve Bayes Sparse Bayes Discriminant Analysis Support Vector Machines Non-linear Classifiers Project Components: Rutgers DIMACS MMS
  • 14.
  • 15.
  • 16.
  • 17.
  • 18. Mercer Kernels. Mercer’s Theorem gives necessary and sufficient conditions for a continuous symmetric function K to admit this representation: [equation] (“Mercer Kernels”; K “pos. semi-definite”). This kernel defines a set of functions H_K, elements of which have an expansion as: [equation]. This set of functions is a “reproducing kernel Hilbert space”. Prepared by David L. Madigan
  • 19. Support Vector Machine. Two-class classifier with the form: [equation], with parameters chosen to minimize: [equation] (annotations: complexity penalty, Gram matrix, tuning constant). Many of the fitted α’s are usually zero; the x’s corresponding to the non-zero α’s are the “support vectors.” Prepared by David L. Madigan
  • 20. Regularized Linear Feature Space Model. Choose a model of the form: [equation] to minimize: [equation]. Solution is finite dimensional: just need to know K, not Φ! Prediction is sign(f(x)). A kernel is a function K such that for all x, z ∈ X, K(x, z) = ⟨Φ(x), Φ(z)⟩, where Φ is a mapping from X to an inner product feature space F. Prepared by David L. Madigan
  • 21.
  • 22.
  • 23. Learning takes place in two spaces: For matching and filtering, we learn rules in the primary space of document features. For fusion processes we learn rules in a secondary space of “pseudo-features” which are assigned by entire systems to incoming documents. (Diagram labels: Feature space; Random Subspace; Score space; Relevant.)
  • 24. REFERENCE ASPECT Effective Communication with the Analyst User
  • 25. HITIQA: High-Quality Interactive Question Answering. University at Albany, SUNY; Rutgers University
  • 26.
  • 27.
  • 28.
  • 29. System diagram (labels as captured): question; Question Processor; Document Retrieval; Segment/Filter; Cluster Segments; Build Frames; Process Frames; Dialogue Manager; Query Refinement; Answer Generator; answer; Visualization; Wordnet; DB; Gate; Completed Work; Current Focus.
  • 30.
  • 31. Answer Space Topology: KERNEL QUESTION MATCH; NEAR MISSES, ALTERNATIVE INTERPRETATIONS; ALL RETRIEVED FRAMES
  • 32.
  • 33. Factor Analysis of 9 Quality Features: Appearance; Content
  • 34.
  • 35. Performance of models. Quality Prediction by Linear Combination of Textual Features (from 5 to 17 variables). Split Half for Training and Testing.
       Quality Factor            Prediction Rate
       Depth                     67%
       Author Credential         55%
       Accuracy                  69%
       Source                    57%
       Objectivity               64%
       Grammar                   79%
       One Side vs Multi View    70%
       Verbosity                 63%
       Readability               76%
  • 36.  
  • 37.
  • 38.
  • 39. Appendix: The following slides were not presented at the conference.
  • 40.
  • 41.
  • 42.
  • 44.
  • 45.
  • 46. Summary of Local Fusion. PROBLEM CASE: We ran 5 split-half runs on the odd case (318) and the results persist.
  • 47.
  • 48.  
  • 49.  
  • 50. Our Approach to Retrieval Fusion. Diagram labels as captured: SMART; InQuery; FUSION PROCESS; Request; DOCUMENTS SETS; Result Set; Delivered SET; Result Set; ADOPT: Fusion System; Monitor Fusion Set and Receive Feedback; USE: Better System; Adaptive “Local” Fusion
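
Slides 18 to 20 describe a kernel as an inner product in a feature space and note that K is positive semi-definite. The following is a minimal sketch of those two properties, assuming the standard definition K(x, z) = ⟨Φ(x), Φ(z)⟩. The homogeneous quadratic kernel, the function names, and the sample points are chosen purely for illustration and do not come from the slides.

```python
import math


def dot(u, v):
    """Ordinary inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def poly2_kernel(x, z):
    """Homogeneous quadratic kernel K(x, z) = (x . z)^2 (illustrative choice)."""
    return dot(x, z) ** 2


def phi(x):
    """Explicit feature map for poly2_kernel on 2-D inputs."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def gram_matrix(points, kernel):
    """Matrix of kernel values K(p, q) over all pairs of points."""
    return [[kernel(p, q) for q in points] for p in points]


def quadratic_form(gram, c):
    """Compute c^T G c; non-negative for every c when G is pos. semi-definite."""
    n = len(c)
    return sum(c[i] * gram[i][j] * c[j] for i in range(n) for j in range(n))


# Hypothetical sample points, not data from the talk.
points = [(1.0, 2.0), (-0.5, 3.0), (2.0, -1.0)]
x, z = points[0], points[1]

k_direct = poly2_kernel(x, z)      # computed without ever forming phi
k_feature = dot(phi(x), phi(z))    # same value via the explicit feature space
gram = gram_matrix(points, poly2_kernel)
```

The point of the comparison is the remark on slide 20: a learner that only needs K can work in the feature space without computing Φ explicitly.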
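
Slides 23 and 50 describe fusion as learning a rule in a secondary "score space", where each document is described by the scores that whole retrieval systems assign to it. A minimal sketch of that idea follows, assuming a simple weighted sum of two system scores with the weight picked by grid search on judged examples. This is a stand-in technique, not the adaptive "local" fusion method of the talk, and the scores and labels are invented toy values.

```python
def fused(score_pair, w):
    """Linear fusion of two system scores with weight w on the first system."""
    s1, s2 = score_pair
    return w * s1 + (1.0 - w) * s2


def pairwise_accuracy(scores, labels):
    """Fraction of (relevant, non-relevant) pairs ranked in the right order."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    correct = sum(1 for p in pos for n in neg if p > n)
    return correct / (len(pos) * len(neg))


def learn_weight(score_pairs, labels, steps=10):
    """Grid-search the fusion weight that best orders the judged documents."""
    best_w, best_acc = 0.0, -1.0
    for k in range(steps + 1):
        w = k / steps
        acc = pairwise_accuracy([fused(sp, w) for sp in score_pairs], labels)
        if acc > best_acc:
            best_w, best_acc = w, acc
    return best_w, best_acc


# Invented toy data: (system A score, system B score) per judged document.
train_scores = [(0.9, 0.2), (0.8, 0.7), (0.3, 0.9),
                (0.4, 0.1), (0.2, 0.3), (0.6, 0.2)]
train_labels = [1, 1, 1, 0, 0, 0]  # 1 = judged relevant, 0 = not relevant

acc_a_only = pairwise_accuracy([fused(sp, 1.0) for sp in train_scores], train_labels)
acc_b_only = pairwise_accuracy([fused(sp, 0.0) for sp in train_scores], train_labels)
best_w, best_acc = learn_weight(train_scores, train_labels)
```

On this toy set each system alone misorders some pairs, while a mixed weight orders all of them correctly. Real judged data would need a held-out split, as in the split-half runs mentioned on slide 46.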

Editor's Notes

  1. Librarians have long been concerned to organize the materials that have been selected as worthy of inclusion in a library. It has therefore been a substantial cultural change, over the past 10 years, for librarians to realize that they have inherited responsibility for organizing the exploding cultural resource represented by the World Wide Web. This was made possible by techniques for the indexing and retrieval of arbitrary texts, which had been developed on a theoretical basis 30 and 40 years earlier. Since an enormous amount of communication now takes place in electronic form, it has become possible to ask whether an expansion of these techniques for organization and retrieval can facilitate the scanning of streams of communication, in order to detect (either after the fact or in advance) communications among those intent on doing harm. Since the attacks of September 11, 2001 by Al Qaeda on the mainland of the United States, this agenda has been moved forward with remarkable speed. We review a number of projects underway at Rutgers University which bear on both the technical aspects and the interactive or "user oriented" aspects of this problem. Research to be described in this talk is supported in part by the National Science Foundation and by the Advanced Research Development Activity of the Intelligence Community.
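
The note above refers to techniques for the indexing and retrieval of arbitrary texts, and slide 13 lists bag of words, tf-idf, and kNN among the project's representation and matching methods. As a rough sketch of how those pieces fit together, the Python below builds tf-idf vectors and returns the single nearest document by cosine similarity. The whitespace tokenizer, the log(N/df) weighting, the function names, and the three sample documents are assumptions made for illustration, not details taken from the talk.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberately simple assumption)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return sparse tf-idf vectors (dicts) and the idf table for docs."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log(n / df[t]) for t in df}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: tf[t] * idf[t] for t in tf})
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    num = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return num / (norm_u * norm_v)


def nearest(query, docs):
    """Index of the document most similar to the query (1-nearest neighbour)."""
    vectors, idf = tfidf_vectors(docs)
    q_tf = Counter(tokenize(query))
    # Terms unseen in the collection carry no weight.
    q_vec = {t: q_tf[t] * idf[t] for t in q_tf if t in idf}
    sims = [cosine(q_vec, v) for v in vectors]
    return max(range(len(docs)), key=lambda i: sims[i])


# Invented sample collection, for illustration only.
docs = [
    "library catalog indexing of books",
    "message stream monitoring for event detection",
    "kernel methods for text classification",
]
best = nearest("detect events in a message stream", docs)
```

Here the query shares only "message" and "stream" with the collection, so the second document is the nearest one. A kNN classifier in the sense of slide 13 would take the k most similar judged documents and vote on their labels.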