Mass declassification sept 23 2010v2.1


My public presentation, as delivered to the Public Interest Declassification Board (PIDB), which is trying to determine the best way to declassify and release over 400M classified documents.

Published in: Technology
  • Here is a look at the DeepQA architecture. This is like looking inside the brain of the Watson system from about 30,000 feet. Remember, natural language is ambiguous, polysemous, and tacit, and its meaning is often highly contextual. Bottom line: the computer needs to consider many possible meanings, attempting to find the inference paths that are most confidently supported by the data. The primary computational principle supported by the DeepQA architecture is to assume and maintain multiple interpretations of the question, to generate many plausible answers or hypotheses, and to collect and process many different evidence paths that might support or refute those hypotheses. Each component in the system adds assumptions about what the question means, what the content means, what the answer might be, or why it might be correct.
DeepQA is implemented as an extensible architecture and was designed from the outset to support interoperability across independently developed analytics. For this reason it was implemented using UIMA, a framework and OASIS standard for interoperable text and multi-modal analysis contributed by IBM to the open-source community and now an Apache project. Over 100 different algorithms, implemented as UIMA components, were developed, advanced and integrated into this architecture to build Watson.
In the first step, Question and Category Analysis, parsing algorithms decompose the question into its grammatical or syntactic components. Other algorithms here will identify and tag specific semantic entities like names, places or dates. In particular, the type of thing being asked for, if it is indicated at all, will be identified. We call this the LAT, or Lexical Answer Type, as in this “FISH”, this “CHARACTER” or this “COUNTRY”. In Query Decomposition, different assumptions are made about if and how the question might be decomposed into sub-questions. The original question and each identified sub-part follow parallel paths through the system.
In Hypothesis Generation, DeepQA does a variety of very broad searches for each of several interpretations of the question. These searches are performed over a combination of unstructured data (natural language documents) and structured data (available knowledge bases). The goal of this step is to generate possible answers to the question and/or its sub-parts. At this point there is not a lot of confidence in these possible answers, since little intelligence has been applied to understanding the content that might relate to the question. The focus is on generating a broad set of hypotheses or, for this application, what we call “Candidate Answers”. To implement this step for Watson we used multiple open-source text and KB search components.
DeepQA acknowledges that resources are ultimately limited, and some parameterized judgment about which candidate answers are worth pursuing further must be made given constraints on time and available hardware. Based on a trained threshold for optimizing the tradeoff between accuracy and latency, DeepQA uses soft filtering: it uses different light-weight algorithms to judge which candidates are worth gathering evidence for and which should get less attention and continue through the computation as-is. In contrast, if this were a hard filter, those candidates falling below the threshold would be eliminated from consideration entirely at this point.
In Hypothesis & Evidence Scoring, the candidate answers are scored, independently of any additional evidence, by deeper analysis algorithms. This may, for example, include Typing Algorithms. These are algorithms that produce a score indicating how likely it is that a candidate answer is an instance of the Lexical Answer Type determined in the first step, for example Country, Agent, Character, City, Slogan, Book, etc. Many of these algorithms may fire, using different resources and techniques to come up with a score.
What is the likelihood that “Washington”, for example, refers to a “General” or a “Capital” or a “State” or a “Mountain” or a “Father” or a “Founder”? Evidence (in this case, more documents, passages and structured facts) is collected for the many candidate answers. Each of these pieces of evidence is subjected to many independently developed algorithms that deeply analyze the evidentiary passages, for example, and score the likelihood that the passage supports or refutes the correctness of the candidate answer.
In the Synthesis step, if the question had been decomposed into sub-parts, one or more synthesis algorithms will fire, with varying levels of certainty. They will apply methods for inferring a coherent final answer from the constituent elements derived from the question’s sub-parts.
Finally, arriving at the last step, Final Merging and Ranking, are many possible answers, each paired with many pieces of evidence, and each of these scored by many algorithms to produce hundreds of feature scores, all giving some evidence for the correctness of each candidate answer. Trained models are applied to weigh the relative importance of these feature scores. These models are trained with ML methods to predict, based on past performance, how best to combine all these scores to produce a final, single confidence number for each candidate answer and to produce the final ranking of all candidates. The answer with the strongest confidence would be Watson’s final answer, and Watson would try to buzz in provided that the top answer’s confidence was above a certain threshold.
The DeepQA system defers commitments and carries possibilities through the entire process while searching for increasingly broad contextual evidence and more credible inferences to support the most likely candidate answers.
All the algorithms used to interpret questions, generate candidate answers, score answers, collect evidence and score evidence are loosely coupled but work holistically by virtue of DeepQA’s pervasive machine learning infrastructure. No one component could realize its impact on end-to-end performance without being integrated and trained with the other components, and they are all evolving simultaneously. In fact, what had a 10% impact on some metric one day might, one month later, contribute only 2% to overall performance due to evolving component algorithms and interactions. This is why the system, as it develops, is regularly trained, evaluated and retrained. DeepQA is a complex system architecture designed to extend incrementally in both data and algorithms, to deal with the challenges of natural language processing applications and to adapt to new domains of knowledge. The Jeopardy! Challenge has greatly inspired its design and implementation for the Watson system. - David A. Ferrucci
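The soft-filtering and final-ranking steps described in these notes can be illustrated with a minimal sketch. Everything named here is hypothetical: the `Candidate` class, the `light_score` field, the feature names and the weights are invented for illustration, and the logistic combination is only one common way to turn feature scores into a single confidence, assumed here rather than taken from the notes.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Candidate:
    answer: str
    light_score: float                              # cheap score from a light-weight filter (hypothetical)
    features: dict = field(default_factory=dict)    # feature scores added by evidence scorers


def soft_filter(candidates, threshold):
    """Soft filter: nothing is discarded. Candidates at or above the threshold
    earn evidence gathering; the rest continue through the computation as-is."""
    pursue = [c for c in candidates if c.light_score >= threshold]
    carry = [c for c in candidates if c.light_score < threshold]
    return pursue, carry


def confidence(features, weights, bias=0.0):
    """ASSUMPTION: combine feature scores with a logistic model."""
    z = bias + sum(weights.get(name, 0.0) * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def rank(candidates, weights):
    """Final merging and ranking: highest confidence first."""
    return sorted(candidates, key=lambda c: confidence(c.features, weights), reverse=True)
```

With a hard filter, `carry` would simply be dropped; here both lists are passed to `rank`, so a low-scoring candidate can still appear in the final ranking, only with fewer feature scores behind it.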

    1. Mass Declassification: What If? Jeff Jonas, IBM Distinguished Engineer, Chief Scientist, IBM Entity Analytics [email_address] September 23, 2010
    2. The Ask
       • What emerging technology or innovative approaches come to mind … which may have applicability to this task?
       • Use your imagination. What if?
       • Not talking about any specific products
       • Not focusing on the widely available COTS/GOTS technologies (OCR, document management, case management, workflow, etc.)
    3. The Problem at Hand
       • Volumes may be beyond human, brute-force review (at 5 min each = 18,382 FTEs)
       • Necessitates some form of machine triage
         • Red: A disclosure risk
         • Yellow: A possible disclosure risk
         • Green: No disclosure risk
       • Reliable machine triage requires substantially better prediction systems
       • Even then, advanced means for humans to deal with the remaining large volumes of “possibles” are still required
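The 18,382 FTE figure can be reproduced with a short calculation. The sketch below uses the 450M document count that appears later in the deck and the 5 minutes per document from this slide. The 2,040 working hours per FTE-year (255 days × 8 hours) and the one-year time frame are assumptions chosen because they reproduce the slide's number; the deck does not state them.

```python
DOCS = 450_000_000           # "450M target documents", from a later slide
MINUTES_PER_DOC = 5          # review time per document, from this slide
HOURS_PER_FTE_YEAR = 2_040   # ASSUMPTION: 255 working days x 8 hours


def ftes_needed(docs, minutes_per_doc, hours_per_fte_year):
    """Full-time reviewers needed to get through every document in one year."""
    total_hours = docs * minutes_per_doc / 60
    return total_hours / hours_per_fte_year


print(round(ftes_needed(DOCS, MINUTES_PER_DOC, HOURS_PER_FTE_YEAR)))  # 18382
```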
    4. Background
       • Early 1980s: Founded Systems Research & Development (SRD), a custom software consultancy
       • 1989 – 2003: Built numerous systems for Las Vegas casinos, including a technology known as Non-Obvious Relationship Awareness (NORA)
       • 2001/2003: Funded by In-Q-Tel
       • 2005: IBM acquires SRD
       • Cumulatively: I have had a hand in a number of systems with multi-billions of rows describing hundreds of millions of entities
       • Affiliations:
         • Member, Markle Foundation Task Force on National Security in the Information Age
         • Senior Associate, Center for Strategic and International Studies (CSIS)
         • Distinguished Research Faculty (adjunct), Singapore Management University, School of Information Systems
         • Member, EPIC advisory board
         • Board Member, US Geospatial Intelligence Foundation (USGIF), the GEOINT organizing body
    5. In Today’s Session
       • Intro to context accumulating systems
       • Predictions and data points needed for mass declassification
       • Strawman architecture
       • Challenges
       • Q&A
    6. Context Accumulating Systems
    7. From Pixels to Pictures to Insight
       Diagram labels: Observations; Context; Relevance; Consumer (an analyst, a system, the sensor itself, etc.); Contextualization
    8. Context, definition of:
       • Better understanding something by taking into account the things around it.
    9. Without Context [email_address]
    10. Consequences
       • Algorithms flat-lining (e.g., alert queues)
       • Enterprise amnesia on the rise
       • Overwhelmed by false positives and false negatives? You have seen nothing yet
       • Not enough humans to fix this with brute force
       • Risk assessment becomes the risk
    11. Context Accumulation
       Diagram labels: Trusted Supplier; Job Applicant; Stolen Identity; Known Terrorist; [email_address]
    12. Puzzle Metaphor Primer
       • Imagine an ever-growing pile of puzzle pieces of varying sizes, shapes and colors
       • What it represents is unknown – there is no picture on hand
       • Is it one puzzle, 15 puzzles, or 1,500 puzzles?
       • Some pieces are duplicates and some are missing
       • Some pieces are incomplete, low quality, or have been misinterpreted
       • Some pieces may even be professionally fabricated lies
       • Until you take the pieces to the table, you don’t know what you are dealing with
    13. How Context Accumulates
       • With each new observation … one of three assertions is made: 1) un-associated; 2) near like neighbors; or 3) connections
       • Asserted connections must favor the false negative
       • New observations sometimes reverse earlier assertions
       • Some observations produce novel discovery
       • As the working space expands, computational effort increases
       • The emerging picture helps focus collection interests
       • Given sufficient observations, there can come a tipping point
       • Thereafter, confidence improves while computational effort decreases!
    14. False Negatives Overstate The Universe
       Chart labels: Observations; Unique Identities; True Population
    15. Counting Is Difficult
       • File 1: Mark Smith, 6/12/1978, 443-43-0000
       • File 2: Mark R Smith, (707) 433-0000, DL: 00001234
    16. The Rise and Fall of a Population
       Chart labels: Observations; Unique Identities; True Population
    17. Data Triangulation
       • File 1: Mark Smith, 6/12/1978, 443-43-0000
       • File 2: Mark R Smith, (707) 433-0000, DL: 00001234
       • New Record: Mark Randy Smith, 443-43-0000, DL: 00001234
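The counting effect in the File 1 / File 2 / New Record example can be sketched in a few lines. This is a toy illustration, not the method behind the slides: it links records only on exact matches of an identifier, and the field names (`ssn`, `dl`, `phone`) are assumptions about what the numbers on the slide represent.

```python
def count_entities(records, keys=("ssn", "dl", "phone")):
    """Count distinct entities, treating records that share any identifier
    value as the same entity (union-find over record indexes)."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path compression
            i = parent[i]
        return i

    first_seen = {}  # (field, value) -> index of the first record carrying it
    for i, rec in enumerate(records):
        for key in keys:
            value = rec.get(key)
            if value is None:
                continue
            if (key, value) in first_seen:
                parent[find(i)] = find(first_seen[(key, value)])
            else:
                first_seen[(key, value)] = i
    return len({find(i) for i in range(len(records))})


file_1 = {"name": "Mark Smith", "dob": "6/12/1978", "ssn": "443-43-0000"}
file_2 = {"name": "Mark R Smith", "phone": "(707) 433-0000", "dl": "00001234"}
new_record = {"name": "Mark Randy Smith", "ssn": "443-43-0000", "dl": "00001234"}

print(count_entities([file_1, file_2]))              # 2
print(count_entities([file_1, file_2, new_record]))  # 1
```

Before the new record arrives, the two files share no identifier and are counted as two people. The third observation carries one identifier from each file, so adding data lowers the count of unique identities from two to one.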
    18. Increasing Accuracy and Performance
       Chart labels: Observations; Unique Identities; True Population
    19. “Expert Counting” is Fundamental to Prediction
       • Is it 5 people each with 1 account … or is it 1 person with 5 accounts?
       • If one cannot count … one cannot estimate vector or velocity (direction and speed).
       • Without vector and velocity … prediction is nearly impossible.
       • Therefore, if you can’t count, you can’t predict.
    20. Mass Declassification Predictions
    21. Mass Declassification Predictions
       • Whose equity is it?
       • Machine triage – disposition
       • Queue prioritization
    22. Using What Data Points?
       • FOR EXAMPLE:
       • 450M target documents
       • Dirty words
       • Previous declassifications
       • Previous declassification denials
       • FOIAs
       • Intellipedia
       • Wikipedia
       • WikiLeaks
       • Deceased persons
       • Publicly available accounts/facts
    23.
    24. Open Source Discovery/Scoring
       • “Height of Pakistan’s Mufasa missile.”
         • What is 15.5 meters?
           • New York Times, Sept 21, 2010, C3
             • “Pakistan unveils Mufasa 7 Warhead”
           • Wikipedia: Mufasa_7_Warhead
    25. Context Accumulation
       Diagram labels: FOIA March 2010; Open Source Reference; Dirty Word; Classified – Asserted; Mufasa 7 Warhead
    26. Context Accumulation + Statistics
       Document Element            | Total | Declass | Class-Default | Class-Asserted
       Author: “Billy K”           | 4503  | 1600    | 403           | 0
       Codeword: “Tomatoe”         | 4818  | 4600    | 218           | 0
       Classification: “SI/TK/001” | 23    | 22      | 1             | 0
       Actors: “Salam Ahmed”       | 782   | 700     | 82            | 0
       Declassification dispositions … becoming a force multiplier. The more human dispositions, the more automated dispositions.
       Human Triage | Auto Triage
       5,000        | 20
       10,000       | 4,000
       100,000      | 65,000
       1,000,000    | 17,000,000
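One way to read the statistics on this slide is as a per-element share of prior dispositions that ended in declassification. The sketch below computes that share from the counts as they appear in this transcript. The `declass_share` function and the idea of using the share as a scoring signal are illustrative assumptions, not something the slide specifies. Note that the Author row's counts do not add up to its stated total in the transcript, so the sketch only reports such rows and divides by the stated total.

```python
# (total, declass, class_default, class_asserted), copied from the slide as-is
ROWS = {
    'Author: "Billy K"': (4503, 1600, 403, 0),
    'Codeword: "Tomatoe"': (4818, 4600, 218, 0),
    'Classification: "SI/TK/001"': (23, 22, 1, 0),
    'Actors: "Salam Ahmed"': (782, 700, 82, 0),
}


def declass_share(total, declass, class_default, class_asserted):
    """Fraction of documents carrying this element that were declassified."""
    return declass / total


shares = {name: declass_share(*counts) for name, counts in ROWS.items()}

# Rows whose three disposition counts do not sum to the stated total.
inconsistent = [name for name, (total, *parts) in ROWS.items() if sum(parts) != total]

for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {share:.1%}")
```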
    27. Policy Questions
       • What related information is already available in the public domain?
         • Evidence: Exists in open source
       • What damage might conceivably result from disclosure and what benefits might ensue?
         • Evidence: Same text already released (by same equity holder)
    28. Strawman Architecture
    29. Strawman Architecture
       Diagram labels: 450M Docs; Historical Dispositions; Dirty Words; Etc.; Feature Extraction & Classification; Context Accumulation; Predictions(*); Workflow System; Dispositions
       (*) Recommendations: Equity of, Disposition, Priority
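The flow named on the strawman slide (feature extraction, context accumulation, predictions, workflow) can be sketched as a toy pipeline. This is an illustration under heavy assumptions: the function names, the exact-word matching, the `history` structure, the 0.5 threshold and the riskiest-first ordering are all hypothetical, and the equity prediction is left out entirely.

```python
def extract_features(text, dirty_words):
    """Feature extraction: which watch-list terms appear in the document."""
    tokens = {t.strip('.,;:"()').lower() for t in text.split()}
    return tokens & dirty_words


def accumulate_context(hits, history):
    """Context accumulation: total prior human dispositions for the matched terms.
    history maps term -> (times declassified, times denied) (hypothetical structure)."""
    declassified = sum(history.get(t, (0, 0))[0] for t in hits)
    denied = sum(history.get(t, (0, 0))[1] for t in hits)
    return declassified, denied


def predict(hits, declassified, denied, red_at=0.5):
    """Prediction: a red / yellow / green disposition from the denial share."""
    if not hits:
        return "green"
    seen = declassified + denied
    if seen == 0:
        return "yellow"  # flagged term with no disposition history yet
    return "red" if denied / seen >= red_at else "yellow"


def run(docs, dirty_words, history):
    """Workflow hand-off: (doc_id, disposition) pairs, riskiest first."""
    order = {"red": 0, "yellow": 1, "green": 2}
    results = []
    for doc_id, text in docs.items():
        hits = extract_features(text, dirty_words)
        results.append((doc_id, predict(hits, *accumulate_context(hits, history))))
    return sorted(results, key=lambda r: order[r[1]])


docs = {
    "a": "Routine supply memo.",
    "b": "Notes on the tomatoe program.",
    "c": "Update on mufasa.",
}
queue = run(docs, {"tomatoe", "mufasa"}, {"tomatoe": (40, 2), "mufasa": (0, 3)})
print(queue)  # [('c', 'red'), ('b', 'yellow'), ('a', 'green')]
```

Each human disposition recorded in `history` changes what the next run predicts, which is the feedback loop from the Dispositions box back into the pipeline.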
    30. Another Idea: Crowd Sourcing
       • Can you predict specific people with privileges and knowledge … to whom selected documents can be routed for evaluation?
       • Can you publish machine-triage recommendations to a wiki or other form of internal broadcast for community crowd sourcing?
    31. Another Idea: Better Classification
       • Using the overall declassification platform to assist in proper classification (real-time)
       • And, better pre-tagging to assist in future auto-declassification
    32. Challenges
    33. Challenges
       • Entity extraction is imperfect
       • Predictions may still not be good enough, often enough
       • Not in English
       • The user work surface and its distribution
       • Consequences of an inappropriate release
       • With super access and super tools, this may call for stronger audit and insider-threat protections
       • Your contracting cycle and the creation of the system might take until mid-2011 or 2012 or 2013
    34. Closing Thoughts
    35. Closing Thoughts
       • Contextualization is essential to better prediction
       • There are not enough humans to ask every question every day
       • “Human attention directing” systems are critical to the mission
       • The data must find the data, the relevance must find the user
    36. Worst Case Scenario
       • Rich context enables better hints for users, results in faster dispositions
       • Rich context enables improved sequencing of the work
    37. Related Blog Posts
       • Smart Sensemaking Systems, First and Foremost, Must be Expert Counting Systems
       • Data Finds Data
       • Puzzling: How Observations Are Accumulated Into Context
       • The Fast Last Puzzle Piece
       • Algorithms At Dead-End: Cannot Squeeze Knowledge Out Of A Pixel
       • How to Use a Glue Gun to Catch a Liar
       • It Turns Out Both Bad Data and a Teaspoon of Dirt May Be Good For You
       • Smart Systems Flip-Flop
    38. Blogging At: Information Management, Privacy, National Security and Triathlons. Questions?
    39. Mass Declassification: What If? Jeff Jonas, IBM Distinguished Engineer, Chief Scientist, IBM Entity Analytics [email_address] September 23, 2010