
Information Extraction and Integration of Hard and Soft Information for D2D via Controlled Natural Language


"Information Extraction and Integration of Hard and Soft Information for D2D via
Controlled Natural Language," Dr. Tien Pham, US Army Research Laboratory


  1. International Technology Alliance in Network & Information Sciences
     Information Extraction and Integration of Hard & Soft Information for D2D via Controlled Natural Language
     3rd Socio-Cultural Data Summit, National Defense University, November 27-28, 2012
     Tien Pham, Ph.D., U.S. Army Research Laboratory, tien.pham1.civ@mail.mil
  2. Background
     Presentation based on recent papers and ARL D2D
     • P. Xue, D. Mott, D. Braines, S. Poteet, A. Kao, C. Giammanco, T. Pham, “Information Extraction using Controlled English to support Knowledge-Sharing and Decision-Making,” ICCRTS 2012, Fairfax, VA, June 2012.
     • A. Preece, D. Pizzocaro, D. Braines, D. Mott, G. de Mel, T. Pham, “Integrating Hard and Soft Information Sources for D2D Using Controlled Natural Language,” FUSION 2012, Singapore, July 2012.
     Acknowledgement
     • International Technology Alliance (ITA) Collaborators
       - Ping Xue, Ann Kao & Steve Poteet, Boeing
       - Prof. Alun Preece & Diego Pizzocaro, Cardiff University
       - Dr. David Mott & Dave Braines, IBM UK
       - Geeth de Mel, IBM US (University of Aberdeen)
     • Army Research Laboratory (ARL) Collaborators
       - Mike Kolodny, ARL Sensors & Electron Devices Directorate (SEDD)
       - Dr. Cheryl Giammanco, ARL Human Research & Engineering Directorate (HRED)
  3. OUTLINE
     Introduction
     • Data-to-Decisions (D2D)
     • US UK International Technology Alliance (ITA)
     D2D Enablers
     • End-to-end frameworks
     • Technology enablers
     Controlled Natural Language (CNL)
     • Controlled English (CE)
     • Information Extraction and Integration via CE
  4. Changing Landscape
     Not so long ago … | But now …
     Missions supported by intelligence | Missions driven by intelligence
     Counter-insurgency: secondary mission | Counter-insurgency: primary mission
     Analysis in days | Analysis in minutes & seconds
     Strategic focus on information | Tactical focus on information
     Intelligence analysis at division level | Analysis at company level or below
     Intelligence consumed by commanders | Intelligence consumed by every Soldier
     Enemy operates at same optempo | Enemy at a faster optempo
     Unilateral action | Coalition warfare
  5. Our Vocabulary Has To Change …
     Not so long ago, our vocabulary centered on … | But now we must broaden our thinking …
     Targeting | Situational Awareness
     Data | Meta-data, Information
     Sensor Fusion | All source Information Fusion
     Sensor | Data source
     Friendlies | Partners
     Mean | Standard Deviation
     Real Time | Forensics
     Getting More Data | Effective decisions
  6. Data-to-Decisions (D2D)
     We need to be able to synthesize all available data into information suitable for the “decision-makers” to rapidly & effectively make critical decisions!
     • The ratio of available data to relevant data is growing rapidly
     • Focus on the tactical edge (dynamic, distributed, lack of infrastructure)
  7. D2D Process
     [Figure: D2D process cycle. Stage labels: Question or Hypothesize; Task; Collect (Query, Discover, Filter); Process (Correlate, Fuse, Ensure Relevance); Exploit; Disseminate; Decide. Also labeled “Forensics: Learned Data”. Data source labels: HUMINT, National Assets, Sensors, Databases, Soft, Experts, Cultural Understanding, Networks, Policies, Learned Data.]
  8. International Technology Alliance (ITA)
     • Joint US-UK research alliance with industrial, academic, government members
     • Conduct research to develop underpinning technology applicable to NCW and to enhance US & UK capability to conduct coalition warfare
     TECHNICAL FOCUS
     • MANET & MANET Security
     • Hybrid Networking & Security for Distributed Information Services
     • End-to-End Coalition Information Flows
     • Coalition Data-to-Decisions
  9. OUTLINE
     Background
     • Data-to-Decisions (D2D)
     • US UK International Technology Alliance (ITA)
     D2D Enablers
     • End-to-end frameworks
     • Technology enablers
     Controlled Natural Language (CNL)
     • Controlled English (CE)
     • Information Extraction and Integration via CE
  10. ITA Framework: Sensor Fabric
      • Middleware infrastructure designed for ISR/ISTAR networks
      • Integrate and federate disparate assets and provide easy access to sensors, data & information across the network
      • Incorporate Policy Framework that implements and enforces Authorization and Obligation policies, protecting shared assets among coalition partners
      J. Wright, C. Gibson, F. Bergamaschi, K. Marcus, R. Pressley, G. Verma, G. Whipps, “A Dynamic Infrastructure for Interconnecting Disparate ISR/ISTAR Assets (The ITA Sensor Fabric),” IEEE/ISIF Fusion 2009 Conference, Seattle, US, Jul 2009.
  11. Persistent Wide Area Surveillance (PWAS)
      US acoustic sensors interoperating & sharing data with UK PWAS EO sensors (Pershore, UK, 16 March 2012, 16h34)
  12. ITA Framework: Distributed Dynamic Federated Database (DDFD)
      Distributed Dynamic Federated Database (DDFD) or Gaian Database is based on the “store locally and query from anywhere” principle
      • Small footprint, minimal administration cost (~4MB)
      • Distribution governed by a dynamic establishment of connections and application of configuration updates on each node
      • Network establishment using autonomic discovery of neighboring nodes; configuration only required for exposed data sources
      • Federation of heterogeneous data sources (RDBMS, files, in-memory tables, text indexes, data pre-processing, sensor data …)
      • Add semantics to distributed tables
      Distributed formal policy based techniques are used to control access to data and the flow of data through the network
      [Figure: network of nodes N0 to N11 with queries entering at two points]
      G. Bent, P. Dantressangle, D. Vyvyan, A. Mowshowitz and V. Mitsou, “Dynamic Distributed Federated Database,” ACITA 2008, London, UK, September 2008.
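The “store locally and query from anywhere” principle on slide 12 can be illustrated with a small sketch. This is not the Gaian Database API: the `Node` class, the `link` method and `federated_query` are hypothetical names, and neighbour discovery is assumed to have already happened. Each node keeps its own rows, and a query entered at any node is propagated across the links and answered from every reachable node.

```python
from collections import deque

class Node:
    """Hypothetical federation node: keeps its own rows and a list of neighbours."""
    def __init__(self, name):
        self.name = name
        self.rows = []          # data stays on the node that produced it
        self.neighbours = []    # links assumed to come from neighbour discovery

    def link(self, other):
        """Create a two-way connection between this node and another."""
        if other not in self.neighbours:
            self.neighbours.append(other)
            other.neighbours.append(self)

def federated_query(start, predicate):
    """Breadth-first walk from `start`, collecting matching rows from every reachable node."""
    seen = {start.name}
    queue = deque([start])
    results = []
    while queue:
        node = queue.popleft()
        results.extend((node.name, row) for row in node.rows if predicate(row))
        for nb in node.neighbours:
            if nb.name not in seen:
                seen.add(nb.name)
                queue.append(nb)
    return results

# Three nodes in a chain; the query is entered at the middle node, which holds no data.
n0, n1, n2 = Node("N0"), Node("N1"), Node("N2")
n0.link(n1)
n1.link(n2)
n0.rows.append({"sensor": "acoustic", "reading": 41})
n2.rows.append({"sensor": "acoustic", "reading": 57})
n2.rows.append({"sensor": "seismic", "reading": 3})

hits = federated_query(n1, lambda row: row["sensor"] == "acoustic")
```

The sketch leaves out the policy control mentioned on the slide; a real deployment would filter both the propagation and the returned rows.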
  13. Policy Controlled DDFD for NATO Intelligence Fusion Center
  14. ITA Policy Management Toolkit
      • Policy-based Management tools
        - Allow one to specify and manage policies at increasing levels of ease of specification, from constrained natural language to computer readable codes
        - Put constraints on how ISR assets can be used and shared to meet mission requirements
      • Policy examples
        - Local Command & Control (C2), Platform Control, Sensor & System Control, Sensor Information Access Control, Data Flow Protection, Information Extraction, etc.
      [Figure: policy refinement chain, each step linked by Transformation / Refinement. Policy Specification (in Constrained Natural Language; in a Formal Language); Abstract Policy Models (goals and high level policies; in system context); Concrete Policy Sets (data/user models; risk model; choices and consent; info control flow); Executable Policies (databases; rule engines; XML stores ...). Management Environment labels: Policy Authoring, Policy Creation, Analysis, Negotiation, Policy Repository, Policy Deployment & Activation, Policy Management. Operational Environment labels: Access control/Audit, Policy Decision Point, Policy Enforcement Point, Decision Requests, Matched Policy Decision w/Request.]
      T. Pham, G. Pearson, F. Bergamaschi, and S. Calo, “The ITA Sensor Fabric and Policy Management Toolkit,” 8th NATO MSS, Friedrichshafen, GER, May 2011.
  15. ITA Sensor Assignment to Missions (SAM)
      SAM is a software tool for agile sensor-task assignment. SAM employs an extensible knowledge base of theoretical sensor-task suitability, based on known utility models and emerging ontology-based standards (e.g. SensorML).
      • Linking tasks derived from missions to assets derived from capabilities
      • Assigning specific assets optimally given state of sensors, ongoing missions, and mission priorities
      • Accommodating energy constraints and time dynamics, such as missions starting and ending, and deployment delays
      [Figure: Mission-and-Means Framework (MMF). Mission side: Missions, Operation, Task. Means side: Capability (capability requirements to perform tasks to standard under given conditions); Platform (UAV, Aerostat, UGV (Packbot)); System (Multi-modal UGS, Day/Night E/O); Component (Acoustic, Seismic, Video, Low-power Radar).]
      Alun Preece, et al, “An Ontology-Centric Approach to Sensor-Mission Assignment,” EKAW 2008, Catania, Italy, September 2008.
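As a rough illustration of the sensor-task assignment idea on slide 15, the sketch below matches tasks to free sensors using a suitability table. It is not SAM's algorithm: a simple greedy heuristic stands in for the optimal assignment the slide mentions, and the function name, the utility values and the task and sensor names are all made up for the example.

```python
def assign_sensors(tasks, sensors, utility):
    """Greedy heuristic: serve tasks in descending priority and give each one
    the free sensor with the highest utility for the required capability."""
    free = set(sensors)
    assignment = {}
    for task in sorted(tasks, key=lambda t: -t["priority"]):
        capable = [s for s in free if utility.get((s, task["capability"]), 0) > 0]
        if not capable:
            assignment[task["name"]] = None   # no suitable asset left
            continue
        # Ties on utility are broken by sensor name so the result is deterministic.
        best = max(capable, key=lambda s: (utility[(s, task["capability"])], s))
        assignment[task["name"]] = best
        free.remove(best)
    return assignment

# Hypothetical suitability knowledge base: (sensor, capability) -> utility in [0, 1].
utility = {
    ("UAV", "localize"): 0.9,
    ("Aerostat", "localize"): 0.6,
    ("UGS-acoustic", "detect"): 0.8,
    ("UAV", "detect"): 0.5,
}
tasks = [
    {"name": "t1", "capability": "detect", "priority": 1},
    {"name": "t2", "capability": "localize", "priority": 3},
    {"name": "t3", "capability": "localize", "priority": 2},
]
plan = assign_sensors(tasks, ["UAV", "UGS-acoustic", "Aerostat"], utility)
```

The highest-priority task takes the UAV, so the second localize task falls back to the aerostat. Energy constraints and time dynamics from the slide are not modelled here.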
  16. OUTLINE
      Background
      • Data-to-Decisions (D2D)
      • US UK International Technology Alliance (ITA)
      D2D Enablers
      • End-to-end frameworks
      • Technology enablers
      Controlled Natural Language (CNL)
      • Controlled English (CE)
      • Information Extraction and Integration via CE
  17. Controlled Natural Language & Controlled English
      Controlled English
      • Controlled English (CE) is a type of controlled natural language (CNL)
      • CNL is a subset of a natural language using a restricted set of grammar rules and a restricted vocabulary
      • Focus can be either for human readability or for machine readability
      CE addresses two critical information needs
      • Need for normalization and organization of free-text description
      • Need for domain expertise for specifying and extending the domain model (ontology)
      Challenge to balance naturalness and lack of ambiguity
      • ITA CE is consistent with First Order Predicate Logic
        - Based on Common Logic Controlled English (Sowa 2007)
      • Compatible syntax with existing ontology modeling languages
      P. Xue, D. Mott, D. Braines, S. Poteet, A. Kao, C. Giammanco, T. Pham, “Information Extraction using Controlled English to support Knowledge-Sharing and Decision-Making,” ICCRTS 2012, Fairfax, VA, June 2012.
  18. ITA CE Architecture
      CE sentences take the forms (1) Concepts, (2) Relationships, or (3) Logic Inference Rules & Definitions
      Concepts are defined usually as specializations of other concepts:
        conceptualize a ~ platform type ~ P that is an asset type.
        conceptualize a ~ UAV ~ U that is a platform type.
      Relationships may be defined between concepts:
        conceptualize an asset type A ~ is rated as ~ the NIIRS rating R and ~ provides ~ the capability C.
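To show why sentences like those on slide 18 are machine processable, here is a minimal sketch that reads the two concept definitions and recovers the specialization chain. It covers only the single sentence pattern shown on the slide and is not the ITA CE grammar or parser; the regular expression and the function names are assumptions made for the example.

```python
import re

# Matches only the pattern "conceptualize a ~ <name> ~ <VAR> that is a(n) <parent>."
CONCEPT_RE = re.compile(
    r"conceptualize an? ~ (?P<name>.+?) ~ (?P<var>\w+) that is an? (?P<parent>.+?)\."
)

def parse_concepts(text):
    """Return a mapping from each defined concept to its parent concept."""
    return {m.group("name"): m.group("parent") for m in CONCEPT_RE.finditer(text)}

def ancestors(concept, parents):
    """Follow parent links upward, stopping if a concept repeats."""
    chain = []
    seen = {concept}
    while concept in parents and parents[concept] not in seen:
        concept = parents[concept]
        seen.add(concept)
        chain.append(concept)
    return chain

model = """
conceptualize a ~ platform type ~ P that is an asset type.
conceptualize a ~ UAV ~ U that is a platform type.
"""
parents = parse_concepts(model)
uav_chain = ancestors("UAV", parents)
```

With the two sentences from the slide, a UAV is found to be a platform type and, through it, an asset type. Relationship sentences such as the `is rated as` example would need a second pattern.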
  19. Human-Machine Interaction via CNL/CE
      • Unambiguous & machine processable
      • Human friendly
        - Easy to read, harder to write
      • Universal syntax:
        - Model and rules
        - Facts
        - Queries
      • Extensible and flexible
      • Rich expressivity
        - Rationale and assumptions
        - Truth values & uncertainty
        - Layers and extensions
        - Inference capability
      Controlled English: a human friendly, machine readable language
  20. Structured & Unstructured Information Processing
      • Information expressed in Controlled English
      • Conforms to shared domain model
        - Supports different user/community opinions & terminology
      • Information extraction from un/semi structured sources
        - Automated use of domain model concepts and terms
      • Information exchange between human users / teams
        - Supported through terminology matching in domain model
      [Figure: unstructured / semi-structured reports pass through Info Extraction into Controlled English and a knowledge base; information & metrics; field support (collection and query). Caption: Extracted information expressed in Controlled English.]
  21. Integrating Hard/Soft Information Sources
      Use CNL/CE based approach for D2D activities (e.g., fusion and asset allocation)
      • Assets sourcing both Hard/Soft information are marked by triangles.
      • Passage of 2 vehicles causes a sequence of events (numbered on map).
      A. Preece, D. Pizzocaro, D. Braines, D. Mott, G. de Mel, T. Pham, “Integrating Hard and Soft Information Sources for D2D Using Controlled Natural Language,” FUSION 2012, Singapore, July 2012.
  22. Ex. Automatic Fusion
      Patrol on North Rd issues a semi-structured message, and a CE processing service produces:
        there is a vehicle named v01253 that has black saloon car as description and has black as colour and has saloon as body type and has ABC123 as registration.
      Message features are used to query stored sources for related info:
        there is a vehicle named v01253 that has black saloon car as description and has black as colour and has saloon as body type and has ABC123 as registration.
        +
        the person p670467 is known as John Smith and is a high value target and has ABC123 as registered vehicle.
      Automatic fusion (rationale):
        there is a HVT sighting named h00453 that has the vehicle v01253 as target vehicle and has the person p670467 as hvt candidate.
      Asset matching and automatic allocation:
        there is a task named t327893 that requires the intelligence capability localize and is looking for the vehicle v01253 and operates in the spatial area North Road and is ranked with the task priority high.
      UAV assigned
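The fusion step on slide 22 can be sketched as follows, assuming a much simplified reading of the CE fact sentence. `parse_entity` and `fuse` are hypothetical helpers, the stored person record is written directly as a dictionary rather than parsed from CE, and the sighting identifier is passed in rather than generated. The sketch would also mis-split any attribute value that itself contains " and ".

```python
import re

def parse_entity(sentence):
    """Parse 'there is a <type> named <id> that has <value> as <attribute> and ...'."""
    head = re.match(r"there is an? (.+?) named (\S+) that (.+)\.$", sentence.strip())
    if head is None:
        raise ValueError("not a recognised CE fact sentence")
    entity = {"type": head.group(1), "id": head.group(2)}
    for clause in head.group(3).split(" and "):
        m = re.match(r"has (.+) as (.+)$", clause.strip())
        if m:
            entity[m.group(2)] = m.group(1)   # attribute name -> value
    return entity

def fuse(vehicle, people, sighting_id):
    """Link the vehicle to a stored high value target through the registration plate."""
    for person in people:
        if (person.get("high value target")
                and person.get("registered vehicle") == vehicle.get("registration")):
            return (f"there is a HVT sighting named {sighting_id} that "
                    f"has the vehicle {vehicle['id']} as target vehicle and "
                    f"has the person {person['id']} as hvt candidate.")
    return None

message = ("there is a vehicle named v01253 that has black saloon car as description "
           "and has black as colour and has saloon as body type "
           "and has ABC123 as registration.")
vehicle = parse_entity(message)

# Stored source, assumed to be already held in structured form.
people = [{"id": "p670467", "known as": "John Smith",
           "high value target": True, "registered vehicle": "ABC123"}]
sighting = fuse(vehicle, people, "h00453")
```

For the data on the slide, `sighting` holds the same HVT sighting sentence that the slide gives as the fusion result.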
  23. Ex. Semi-automatic Fusion
      UAV locates and starts to track the HVT's car:
        there is a tracking report named tr04658 that has the vehicle v01253 as target and has the person p670467 as candidate hvt and has stopped as current status and is located at the spatio-temporal point loc69543.
      Analyst wishes to be automatically alerted to significant changes and calls up imagery from the UAV and tags the image with CE:
      Human analyst:
        there is a vehicle named v01892 that has red as colour and has SUV as body type and has east as heading and is located at the spatio-temporal point loc92453.
      +
      Notification System (prev. in db):
        there is a vehicle sighting named vs04514 that observed the vehicle v01879 and is associated with the vehicle v01253.
      Semi-auto fusion:
        the vehicle v01879 is the same as the vehicle v01892.
      [Figure annotations: other linked CE reports; plate unavailable; vehicle is red SUV with plate XYZ789]
      Image source: http://cdn.c.photoshelter.com/img-get/I0000BBTXDQu4_hs/s/750/750/san-rafael-desert-SUV-hatchback-opening-sandstone-landscape-utah-2.jpg
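One way to act on a statement such as "the vehicle v01879 is the same as the vehicle v01892" from slide 23 is to keep identifiers in equivalence classes and merge what is known about every member. The sketch below uses a small union-find structure for this. The `Aliases` class, the `merged_view` helper and the attribute dictionaries are illustrative assumptions, not part of any ITA CE tooling.

```python
import re

SAME_AS_RE = re.compile(r"the (\w+) (\S+) is the same as the (\w+) (\S+)\.")

class Aliases:
    """Union-find over entity identifiers."""
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]   # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            # Keep the alphabetically smaller id as the representative.
            self.parent[max(ra, rb)] = min(ra, rb)

def merged_view(entity_id, facts, aliases):
    """Combine the attributes of every identifier equivalent to `entity_id`."""
    root = aliases.find(entity_id)
    view = {}
    for eid in sorted(facts):
        if aliases.find(eid) == root:
            view.update(facts[eid])
    return view

# Attributes taken from the two CE reports on the slide, held as plain dictionaries.
facts = {
    "v01892": {"colour": "red", "body type": "SUV", "heading": "east"},
    "v01879": {"observed in": "vs04514", "associated with": "v01253"},
}
aliases = Aliases()
m = SAME_AS_RE.search("the vehicle v01879 is the same as the vehicle v01892.")
aliases.union(m.group(2), m.group(4))
view = merged_view("v01879", facts, aliases)
```

After the union, a query about `v01879` also returns the colour, body type and heading that the analyst recorded for `v01892`.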
  24. UK Pathfinder Transitions
      Objective: Develop an environment for intelligence analysts that addresses the large data problem and extracting relevant information for intelligence reports
      • Use of Controlled English (CE) as a means of representing extracted information
      Series of UK Pathfinder transitions:
      • PF I: Demonstration of an environment for intelligence analysts (IBM, Southampton, Logica CMG, 2007/8)
      • PF II: Extension of the PF I environment with focus on user-interface (IBM, Logica CMG, 2009/10)
      • PF III: Development of Natural Language Processing techniques for fact-extraction into CE; tested on classified Counter-Improvised Explosive Devices (C-IED) reports (IBM, 2011)
      [Figure: planning example with CE statements and rationale.
        the task cross_bridge has 2 as minimum duration
        the task cross_bridge has ’06:00’ as earliest start and has ’08:00’ as latest completion
        the task destroy_enemy has 3 as minimum duration
        the task destroy_enemy has ’11:00’ as latest completion
        the fire_support has ’06:00’ as earliest start and has ’08:00’ as latest completion
        Assumptions: “the enemy is the other side of the bridge”; “troops vulnerable on bridge”
        Question: Why do we need fire support between 06:00 and 08:00?]
  25. Other Potential Usage of CNL/CE
      Explore the use of CNL/CE as a flexible and extensible framework for the common representation
      • Common lexicon to represent ISR data and information
      • Bridging the low-level and high-level fusion gaps
      • E.g., ARL-DIA Terra Harvest
      Explore and compare with NATO Command & Control Lexical Grammar (C2LG) language being proposed as a unifying framework for the common representation
      • Domain aspects (available context)
      • Low-level sensor data fusion results
      • HUMINT, OSINT soft data sources
  26. Terra Harvest (TH)
      TH is a D2D framework that enables ISR interoperability, reconfiguration & insertion of new technologies in days/weeks
      [Figure: TH Architecture]
  27. Contact Information
      Tien Pham, Ph.D.
      Networked Sensing & Fusion Branch
      Signal & Image Processing Division
      Sensors & Electron Devices Directorate (SEDD)
      Email: tien.pham1.civ@mail.mil
      Tel: +1-301-394-4282
