Triaging Foreign Language Documents
for MEDEX
Brian Carrier
VP Digital Forensics

Basis Technology
Scenarios / Problem Statement
1. Media triage is performed in the field. Triage
reveals dozens of non-English documents. The
translator is busy talking with the suspect.
2. Medium-dive analysis is performed at a base.
Even more documents are found. Limited
translators are available.

How does the examiner / operator prioritize the
documents for the translator?
Ideal Solution: Translated Gist
▪ A several-page non-English document turns into
an English executive summary.
▪ Allow user to understand who, what, and where
are mentioned.
▪ No one provides that solution today.
Our Proposed 70% Solution
▪ Show human generated gists when they are known.
▪ Use Rosette Named Entity software to find names of
people, places, and organizations:
– Who and where
▪ Use name matching software to identify people on
watch lists.
▪ Use dictionaries to find concepts (financial, drugs,
IED).
– What
▪ Use graphical techniques to show relationships and
context.
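The watch-list step can be illustrated with a minimal sketch. It does not use the Rosette name matching software; it substitutes Python's standard `difflib.SequenceMatcher` as a crude stand-in, and the function names, sample names, and the 0.85 threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def normalize_name(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and sort tokens so word order is ignored."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(sorted(cleaned.split()))


def name_similarity(a: str, b: str) -> float:
    """Character-level similarity between two normalized names, from 0.0 to 1.0."""
    return SequenceMatcher(None, normalize_name(a), normalize_name(b)).ratio()


def watch_list_hits(extracted_names, watch_list, threshold=0.85):
    """Return (extracted name, watch-list entry, score) for every pair at or above the threshold."""
    hits = []
    for name in extracted_names:
        for entry in watch_list:
            score = name_similarity(name, entry)
            if score >= threshold:
                hits.append((name, entry, round(score, 2)))
    return hits


if __name__ == "__main__":
    # Hypothetical names, not taken from any real list.
    print(watch_list_hits(["Viktor Petrov", "Jane Doe"], ["Petrov, Victor", "John Smith"]))
```

A character-ratio comparison like this only tolerates small spelling differences within one script; matching names across scripts and transliteration schemes is the harder problem that dedicated name matching software addresses.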
Names
▪ Rosette® Entity Extractor:
– Uses statistical models, regular expressions, and
gazetteers to find names.
– Works on 17 languages.
▪ Rosette® Name Translator:
– Translates names from native language to English.
– Uses linguistic algorithms, dictionaries, and statistical
inference.
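As a rough illustration of two of the three techniques named above (gazetteers and regular expressions, not statistical models), the sketch below tags entities in a string. It is not the Rosette API; the gazetteer entries, the entity type labels, the phone pattern, and the `find_entities` function are hypothetical.

```python
import re

# Hypothetical gazetteer: known names mapped to an entity type.
GAZETTEER = {
    "basis technology": "ORGANIZATION",
    "springfield": "LOCATION",
}

# Hypothetical pattern for an entity that follows a regular shape.
PATTERNS = {
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def find_entities(text):
    """Return (start, end, type, surface text) tuples sorted by position in the text."""
    found = []
    # Gazetteer lookup: case-insensitive whole-word match of each known name.
    for name, entity_type in GAZETTEER.items():
        pattern = r"\b" + re.escape(name) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            found.append((match.start(), match.end(), entity_type, match.group()))
    # Pattern lookup: entities recognized by their shape alone.
    for entity_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((match.start(), match.end(), entity_type, match.group()))
    return sorted(found)


if __name__ == "__main__":
    sample = "Contact Basis Technology about the Springfield meeting: 617-386-2000"
    for entity in find_entities(sample):
        print(entity)
```

A gazetteer only finds names it already lists, which is why the slide pairs it with statistical models that can recognize unseen names from context.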
Concept Dictionary
▪ User-generated dictionary based on concepts
that are important to them.
▪ Contains both native-language and English words.
▪ Text in documents is normalized using Rosette
Base Linguistics.
▪ Concepts are identified in the native language or in English.
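A minimal sketch of a concept dictionary lookup follows. The normalization here (case folding and accent stripping) is only a crude stand-in for Rosette Base Linguistics, and the dictionary contents, the choice of Spanish as the native language, and the function names are assumptions for illustration.

```python
import re
import unicodedata

# Hypothetical user dictionary: concept -> native-language and English terms.
CONCEPT_DICTIONARY = {
    "financial": ["banco", "transferencia", "bank", "wire transfer"],
    "drugs": ["cocaína", "cocaine"],
}


def normalize(text):
    """Case-fold and strip accents so spelling variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def find_concepts(document_text):
    """Return {concept: [dictionary terms found]} for one document."""
    # Rebuild the text as space-separated word tokens, padded so that
    # whole-word (and whole-phrase) matches can be tested with "in".
    padded = " " + " ".join(re.findall(r"\w+", normalize(document_text))) + " "
    hits = {}
    for concept, terms in CONCEPT_DICTIONARY.items():
        matched = [term for term in terms if f" {normalize(term)} " in padded]
        if matched:
            hits[concept] = matched
    return hits


if __name__ == "__main__":
    print(find_concepts("Envió una TRANSFERENCIA al Banco. Wire transfer done."))
```

This sketch matches exact word forms only. Inflected forms (plurals, conjugations) would be missed without the lemmatization that a linguistics library provides.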
Navigation Techniques
▪ Goals:
– Provide summary of names and concepts.
– Provide context to know what was mentioned nearby.
▪ This is an area of research to find an approach
that works best.
Concise, but no context
Prototype Interface 1
Prototype Interface 2
Deployment Platform
▪ Autopsy™ is an open source digital forensics
platform.
▪ Development started after our first Open Source
Digital Forensics Conference (OSDFCon) in 2010.
▪ Community wanted an end-to-end platform
instead of many stand-alone tools.
▪ Version 3.0 was released in September 2012.
▪ Received some US Army funding.
Autopsy 3
Autopsy Capabilities
▪ Ingests hard drives, media cards, and other digital
media.
▪ Identifies suspicious files based on:
– Keywords
– Hash databases
– File types
▪ Allows operator to quickly focus on recent user
activity:
– Web artifacts
– E-mail
▪ Provides fast results to enable field-based scenarios.
Autopsy Extensibility
▪ Ingest Modules analyze media on import
– Hash analysis, keyword search, registry, web artifacts
▪ Content viewers display files
– Text, image, text analytics, video triage, …
▪ Report modules generate final reports
– HTML, XML, …
Text Gisting Module
Another Module Example
Scenario 1: USB-based Triage
▪ USB drive from media triage.
– Logical files are added to Autopsy.
– User can navigate all documents and images.
Review with Text Gist module
Tag High Priority Files
Translator Focuses on Tagged Files
Scenario 2: Medium Dive
▪ Media card, hard drive, or cell phone is added.
▪ File system is analyzed.
▪ User navigates media using:
– Hash lookup
– Keyword search
– Web browser activity
– E-mail analysis
▪ Uses triage module to evaluate documents as
they are found.
▪ Uses tags to flag priority files.
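One way to picture how per-document hit counts could help the operator decide what to tag is a simple weighted ranking of the review queue. This is a sketch only: the slides describe the operator reviewing and tagging files by hand, and the `DocumentSummary` fields, the weights, and the tag label below are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DocumentSummary:
    """Hypothetical per-document counts produced by a triage pass."""
    path: str
    watch_list_hits: int = 0
    concept_hits: int = 0
    entity_count: int = 0
    tags: set = field(default_factory=set)


# Hypothetical weights: a watch-list match counts for more than a generic name.
WEIGHTS = {"watch_list_hits": 10, "concept_hits": 3, "entity_count": 1}


def priority_score(doc):
    """Weighted sum of the document's hit counts."""
    return sum(weight * getattr(doc, name) for name, weight in WEIGHTS.items())


def review_queue(documents):
    """Order documents so the highest-scoring ones are reviewed first."""
    return sorted(documents, key=priority_score, reverse=True)


if __name__ == "__main__":
    queue = review_queue([
        DocumentSummary("a.doc", concept_hits=1, entity_count=2),
        DocumentSummary("b.doc", watch_list_hits=1),
        DocumentSummary("c.doc"),
    ])
    queue[0].tags.add("High Priority")  # the operator, not the code, decides what to tag
    for doc in queue:
        print(doc.path, priority_score(doc), sorted(doc.tags))
```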
80% Solution
▪ Entity resolution integration.
▪ Topic classifiers.
▪ More advanced analysis relating concepts and
entities.
▪ More advanced interface approaches.
Questions?

Brian Carrier
VP of Digital Forensics
Basis Technology
617-386-2000

HLT 2013 - Triaging Foreign Language Documents for MEDEX by Brian Carrier
