1. Flexible and Intelligent Access to Information: ColumboDiscovery™ and ColumboForensics™ (June 2010)
2. Contents: Context; Information Challenges; Our Approach to Information Discovery; The Technologies We Use; Columbo® Information Discovery Platform; Automatic Entity Extraction; Themes and Links; Forensics Case Study; Summary and Benefits.
3. Context. Economic pressures versus increasing demand. Increasing technical sophistication of criminals and terrorists. Criminal investigations versus digital investigations. Information technology: failure of expectations. Shortcomings of software, 8 years on from Soham. LEAs working together?
4. Information Challenges. Massive data volumes: petabytes or more of data (1 petabyte (1000 terabytes) = approx. 3000 million documents); forensic analysis of hundreds of devices on large cases. Diverse sources: the Internet (www, blogs, Twitter, social networks, virtual worlds, chat rooms); internal (mail, office systems, intelligence databases, operational systems); third-party databases such as ISPs and telcos; computers, storage devices, mobile phones, cameras, sat-navs, Wi; intelligence from other law enforcement agencies. Integration of data in multiple formats: structured, unstructured (text), multimedia (image, voice, video); deleted and hidden; languages and alphabets. Dangers and shortcomings of search: search engine issues (ranking, relevance, etc.); terminology and expert knowledge of the subject; can distort the investigative approach; spellings and misspellings.
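As a rough check on the volume figure above, the slide's own numbers imply an average document size. The sketch below assumes decimal units (1 petabyte = 1000^5 bytes) and treats "approx 3000 million" as exactly 3,000,000,000; the variable names are illustrative only.

```python
# Figures taken from the slide; decimal units are an assumption.
PETABYTE_BYTES = 1000 ** 5               # 1 petabyte = 1000 terabytes
DOCUMENTS_PER_PETABYTE = 3_000_000_000   # "approx 3000 million documents"

# Average size of one document implied by those two figures.
avg_document_bytes = PETABYTE_BYTES / DOCUMENTS_PER_PETABYTE
avg_document_kb = avg_document_bytes / 1000

print(f"Implied average document size: about {avg_document_kb:.0f} KB")
```

In other words, the estimate corresponds to documents of roughly a third of a megabyte each.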
6. Our Approach to Information Discovery. Help the user to understand and explore the content. We identify entities, themes (subjects) and links, in most cases automatically: people, places, objects, account numbers, telephone numbers, etc.; themes, concepts, sentiment; hard and soft (weak) links between entities and themes. We present this in ways that help users understand and explore (discover) the data: entity and theme extractions; summaries; timelines and graphs; connection and relationship diagrams; geo-location maps. Intelligent search: prompted; sounds like / spelt like; semantic (find similar content to this). Automate processes, including reports, where possible.
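The "sounds like / spelt like" search options can be illustrated with two well-known generic techniques: Soundex phonetic codes and a character-level similarity ratio. This is a minimal sketch, not a description of how Columbo® implements these features; the function names and the 0.75 threshold are assumptions chosen for illustration.

```python
import difflib

# Standard Soundex letter groups; letters not listed (vowels, H, W, Y) have no code.
_SOUNDEX_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _SOUNDEX_CODES[_ch] = _digit


def soundex(name: str) -> str:
    """Return the 4-character Soundex code for a name ('' if it has no letters)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    encoded = letters[0]
    previous = _SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate repeated codes
        code = _SOUNDEX_CODES.get(ch, "")
        if code and code != previous:
            encoded += code
        previous = code  # a vowel resets the previous code
    return (encoded + "000")[:4]


def sounds_like(a: str, b: str) -> bool:
    """Phonetic match: both names share the same Soundex code."""
    return soundex(a) != "" and soundex(a) == soundex(b)


def spelt_like(a: str, b: str, threshold: float = 0.75) -> bool:
    """Spelling match: similarity ratio at or above a hypothetical threshold."""
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold
```

For example, `sounds_like("Mohammed", "Muhammad")` and `spelt_like("Smith", "Smyth")` both hold, which is the kind of variant a literal keyword search would miss.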
7. Some of the Technologies We Use. We use advanced analysis techniques that result in much better conceptual understanding and forensic performance. These techniques include semantic indexing and linking, and more novel proprietary ‘digital fingerprint’ techniques (CSI: Columbo® Semantic Indexing). Our platform is scalable, and our techniques are geared to indexing and comparing massive amounts of information; many ‘discovery’ requirements are a numbers and speed game. Our platform can be trained to recognise certain patterns where appropriate (both text and image based), and can run autonomously and covertly if required. A key difference is that our solutions ‘turn search upside down’: we get the data to tell us what is there, rather than just looking for something specific. We don’t search for the needle hidden in the haystack; we remove the hay and find the needle together with whatever else might be there. Gillc has a number of products and applications, the main one of which is ColumboDiscovery™, our integrated information discovery platform.
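CSI is proprietary and the slides do not describe how it works. Purely to illustrate the general idea of comparing content by similarity rather than by exact keyword, the sketch below uses a swapped-in, textbook technique: bag-of-words term vectors compared with cosine similarity. All function names and sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """Count lower-cased alphanumeric tokens in the text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query: str, documents: dict) -> str:
    """Return the name of the document whose term vector is closest to the query."""
    query_vector = term_vector(query)
    return max(
        documents,
        key=lambda name: cosine_similarity(query_vector, term_vector(documents[name])),
    )


# Hypothetical sample content.
documents = {
    "email_1": "meeting at the airport on friday",
    "email_2": "invoice for car repair",
}
closest = most_similar("airport meeting friday", documents)
```

A production semantic index would go well beyond raw term counts, but the "find similar content to this" interaction follows the same shape: a piece of content is the query, and ranked neighbours are the result.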
9. Automatic Entity Extraction. All structured and unstructured information resources can be automatically processed for entity extraction, including: documents (web pages, social media, office applications, email, databases) and digital devices (cameras, phones, SIM cards, storage devices). The entity types shown (left) are a selection of those already coded into Columbo® software. Others could include, for example, airports and airlines, or known street gangs. Additional types can be added by Gillc or added as custom types by the end user. Metadata from applications, image files and digital devices is also extracted as entity information, for example: device type and ID (for phones, cameras, computers, etc.); author and creation date (for enterprise documents, etc.). Entity classification is customisable and includes various identification and matching techniques, for example: detecting entities where slang, codes or ‘street names’ are used; detecting entities where there are multiple spellings; detecting complex or variable formats (e.g. phone numbers, dates).
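The matching techniques listed above (variable formats such as phone numbers and dates, plus ‘street names’) can be sketched with regular expressions and an alias table. This is an illustrative example only, not the Columbo® extractor: the patterns, the day-first date assumption, the `ALIASES` table and the sample text are all hypothetical.

```python
import re
from datetime import date

# Hypothetical patterns: digits optionally separated by single spaces, 9 to 14 digits.
PHONE_RE = re.compile(r"(?<!\w)\+?\d(?: ?\d){8,13}(?!\w)")
# Dates written as D/M/YYYY or D.M.YYYY (day-first order is an assumption).
DATE_RE = re.compile(r"\b(\d{1,2})[/.](\d{1,2})[/.](\d{4})\b")
# Hypothetical custom alias table mapping street names to a canonical entity.
ALIASES = {"big dave": "David Smith", "dave s": "David Smith"}


def extract_entities(text: str) -> set:
    """Return a set of (entity_type, normalised_value) pairs found in the text."""
    entities = set()
    for match in PHONE_RE.finditer(text):
        # Normalise by removing the separating spaces.
        entities.add(("phone", match.group().replace(" ", "")))
    for match in DATE_RE.finditer(text):
        day, month, year = (int(g) for g in match.groups())
        try:
            entities.add(("date", date(year, month, day).isoformat()))
        except ValueError:
            pass  # not a real calendar date, e.g. 31/02/2010
    lowered = text.lower()
    for alias, canonical in ALIASES.items():
        if re.search(r"\b" + re.escape(alias) + r"\b", lowered):
            entities.add(("person", canonical))
    return entities


sample = "Big Dave rang 07700 900123 on 12/06/2010, then +44 7700 900123."
found = extract_entities(sample)
```

Normalising each hit (stripping spaces, converting to ISO dates, resolving aliases) is what lets the same entity be recognised across differently formatted sources.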
10. Themes and Links. Themes and classification: themes and sub-themes are automatically identified from textual resource information; various techniques are used for theme deduction; various techniques are used for image classification and identification. Links: hard and soft links can be identified or uncovered by interacting with the information within Columbo®. Hard links show direct links between entities, between entities and themes, and between themes. Soft links (or weak links) can be identified by analysing the presence and popularity of entities and themes in different resources and devices, and by using Columbo® Semantic Indexing (CSI) to identify varying levels of link strength. CSI is also used for linking and categorising images.
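One simple way to picture soft links derived from "presence/popularity of entities in different resources/devices" is a co-occurrence score. The sketch below scores each pair of entities by the Jaccard ratio of the resources they appear in; this is a generic stand-in, not CSI, and the function name and sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def soft_links(resources: dict) -> dict:
    """Map each entity pair to a 0..1 strength: shared resources / resources with either."""
    entity_counts = Counter()
    pair_counts = Counter()
    for entities in resources.values():
        unique = sorted(set(entities))  # sorted so each pair has one canonical order
        entity_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    links = {}
    for (a, b), both in pair_counts.items():
        either = entity_counts[a] + entity_counts[b] - both
        links[(a, b)] = both / either
    return links


# Hypothetical entities extracted from three devices.
resources = {
    "phone_1": ["Alice", "07700900123"],
    "laptop_1": ["Alice", "07700900123", "Leeds"],
    "phone_2": ["Leeds", "Alice"],
}
links = soft_links(resources)
```

Here "Alice" and "Leeds" never need to appear in the same sentence to be linked; appearing on the same devices is enough to produce a graded link strength.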
11. ColumboForensics™ case study. [Figure: diagram showing Suspect 1 to Suspect 7, each with associated multipliers (x2 to x10); the layout is not recoverable from the extracted text.]
12. Forensics Process. [Figure: process diagram with the labels E01, Image Process, Suspect One, Suspect Two, Gathering, Indexing and Analysis, and Pro-active Comparison Between Suspects.] Elapsed time: 3 days (7 suspects, 22 phones, 37 computers). The existing search-driven approach requires each device to be analysed separately, with an estimate of 55 to 75 days.
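The case-study figures above imply a speed-up that is easy to work out. The sketch below uses only the numbers on the slide; the variable names are illustrative.

```python
# Figures from the slide.
PHONES, COMPUTERS = 22, 37
COLUMBO_DAYS = 3
EXISTING_DAYS_LOW, EXISTING_DAYS_HIGH = 55, 75

devices = PHONES + COMPUTERS
speedup_low = EXISTING_DAYS_LOW / COLUMBO_DAYS
speedup_high = EXISTING_DAYS_HIGH / COLUMBO_DAYS

print(f"{devices} devices; roughly {speedup_low:.0f}x to {speedup_high:.0f}x faster")
```

On these figures the combined approach is roughly 18 to 25 times faster than analysing each of the 59 devices separately.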
14. Some Other Law Enforcement Considerations. All necessary security features, including: multiple protection levels; security at document, entity and word level; extensive audit trail options. Can build case and suspect ‘databases’, allowing: intra-case analysis; cross-case analysis; suspect consolidation whilst retaining case integrity. Secure links between agencies could allow controlled comparison of content. Performant: quick response times and turn-around offer a real opportunity to change processes. Potential for comprehensive but rapid triage.
15. Summary and Benefits. The Columbo® group of products are powerful, next-generation information discovery applications. Columbo® applications are tailored towards ‘discovery’, as opposed to ‘search’: search implies that the user already knows what to look for, whereas discovery allows the data to identify what may be relevant and allows the user to interact with it in order to find the information contained within it. The software delivers significant efficiency savings, both by rapidly finding relevant data and by automating much of the process, including reporting. The software enhances effectiveness, automatically compares content and incrementally builds an intelligence repository. Columbo® is “implementation-lite” and has the capacity to readily link diverse agencies together, sharing critical data and collaborating on it as appropriate.