AI and Data, for Good

Naveen Ashish
InferLink Corporation
March 28th, 2019
Florida International University (FIU), Computing & Information Sciences Lecture

InferLink
• Founded 2011
• Post Fetch Technologies
• Roots in USC/ISI
• R&D, Tools, Solutions, Spin-offs

Model

Talk Organization
R&D
Tools
Solutions
Active Search
Complex Information Linkage (for law enforcement)
RSX
Spin-offs
MachineReading
AutoScience
Evid Science
OpenAI
AI Resources

Information Retrieval: Active Search

ActiveSearch: Background
• ISS Example: Find recent documents mentioning Canada and Islamist extremist groups (e.g., Report Desk documents)
• AFIA Example: Find Russian aircraft mentioned in Central Africa (e.g., on Jetphotos.com)
• Cyber Intelligence Example: Find reports of web browsers with cross-site scripting vulnerabilities
• ISS TopicBuilder Example: Find articles on “Al-Qaeda in the Arabian Peninsula” (“Ansar al-Sharia”)
• Information retrieval: broad coverage, but too general; no notion of entities, relations, events, etc.
• Information extraction: has a notion of entities and concepts, but too specific; needs customization; prone to hard failures

ActiveSearch
• A research engine
• Not one-size-fits-all (unlike Google)
• Takes advantage of current natural language technology
• Plug and play
• Works out-of-the-box
• Immediate value, rapid response
• Easy to customize to a domain

ActiveSearch Project
Architecture: NLP + Massive Ontology
• Documents
• Multiple extractors (plug and play): Stanford NER, term extractor
• Resolvers: entity resolution (EntityBase), concept resolution (CP)
• Indexer
• Integrated ontology
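To make the roles of extractors, resolvers, and an indexer concrete, here is a minimal Python sketch under loud assumptions: a dictionary-based term extractor stands in for Stanford NER, a hand-written alias table (`ALIASES`) stands in for EntityBase and concept resolution, and `EntityIndex` is a hypothetical name. It is not the ActiveSearch implementation; it only illustrates why indexing resolved entity IDs lets a query for one name find documents that use another.

```python
from collections import defaultdict

# Hypothetical alias table standing in for EntityBase / concept resolution:
# every surface form maps to one canonical entity ID.
ALIASES = {
    "al-qaeda in the arabian peninsula": "ENT:AQAP",
    "ansar al-sharia": "ENT:AQAP",
    "aqap": "ENT:AQAP",
    "canada": "ENT:Canada",
}

def term_extractor(text):
    """Toy extractor (stand-in for Stanford NER etc.): known aliases found in the text."""
    lowered = text.lower()
    return [alias for alias in ALIASES if alias in lowered]

def resolve(surface_forms):
    """Resolver: map surface forms to canonical IDs, dropping unknown ones."""
    return {ALIASES[s] for s in surface_forms if s in ALIASES}

class EntityIndex:
    def __init__(self, extractors):
        self.extractors = list(extractors)  # plug and play: any callable text -> [str]
        self.postings = defaultdict(set)    # canonical ID -> document IDs

    def _entities(self, text):
        return resolve([s for extract in self.extractors for s in extract(text)])

    def add(self, doc_id, text):
        for entity_id in self._entities(text):
            self.postings[entity_id].add(doc_id)

    def search(self, query):
        """Documents mentioning every entity in the query, under any alias."""
        ids = self._entities(query)
        if not ids:
            return set()
        return set.intersection(*(self.postings.get(i, set()) for i in ids))

index = EntityIndex([term_extractor])
index.add("d1", "Report on Ansar al-Sharia activity in Yemen.")
index.add("d2", "AQAP statement circulated online.")
index.add("d3", "Canada issues travel advisory.")
print(sorted(index.search("Al-Qaeda in the Arabian Peninsula")))  # ['d1', 'd2']
```

Because extractors are plain callables passed to the constructor, adding another one is a one-line change, which is one way to read "plug and play".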

Step 1: Better Keyphrase Search

• Concept completion
• Generalization / Specialization
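Concept completion, generalization, and specialization can all be read as walks over an ontology. The sketch below assumes a tiny hypothetical child-to-parent table (`PARENT`); the concepts and function names are illustrative only and are not taken from the ActiveSearch ontology.

```python
# Hypothetical toy ontology (child -> parent); a real system would use a large ontology.
PARENT = {
    "ransomware": "malware",
    "trojan": "malware",
    "malware": "cyber attack",
    "denial of service": "cyber attack",
    "xss": "web attack",
    "web attack": "cyber attack",
}
CONCEPTS = set(PARENT) | set(PARENT.values())

def complete(prefix):
    """Concept completion: ontology concepts starting with what the user has typed."""
    return sorted(c for c in CONCEPTS if c.startswith(prefix.lower()))

def generalize(concept):
    """Broader concepts: the concept followed by its ancestors."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def specialize(concept):
    """Narrower concepts: all descendants, depth first."""
    result = []
    for child in sorted(c for c, p in PARENT.items() if p == concept):
        result.append(child)
        result.extend(specialize(child))
    return result

print(complete("ra"))            # ['ransomware']
print(generalize("ransomware"))  # ['ransomware', 'malware', 'cyber attack']
print(specialize("web attack"))  # ['xss']
```

A search engine could then expand a query term with `generalize` or `specialize` output before hitting the index.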

• Superfacets
• Mixed queries
• Integration with ISS TopicBuilder
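The slides do not define superfacets. One plausible reading, assumed here, is that fine-grained facet values (extracted concept tags) roll up to broader categories, and that a mixed query combines a keyword match with such a concept filter. The mapping `SUPERFACET` and the documents in `DOCS` are hypothetical.

```python
from collections import Counter

# Assumed meaning of "superfacet": a broader category that facet values roll up to.
SUPERFACET = {"ransomware": "malware", "trojan": "malware",
              "xss": "web attack", "sql injection": "web attack"}

# Hypothetical documents with concept tags produced by extraction.
DOCS = {
    "d1": {"text": "new ransomware strain hits hospitals", "tags": {"ransomware"}},
    "d2": {"text": "trojan found in browser extension", "tags": {"trojan"}},
    "d3": {"text": "xss flaw in popular browser", "tags": {"xss"}},
}

def superfacets(doc_id):
    """Superfacets of a document; unmapped tags stand for themselves."""
    return {SUPERFACET.get(tag, tag) for tag in DOCS[doc_id]["tags"]}

def superfacet_counts(doc_ids):
    """How many documents fall under each superfacet."""
    counts = Counter()
    for doc_id in doc_ids:
        counts.update(superfacets(doc_id))
    return counts

def mixed_query(keyword, superfacet=None):
    """Keyword match on the text, optionally restricted to one superfacet."""
    hits = [d for d, doc in DOCS.items() if keyword in doc["text"]]
    if superfacet is not None:
        hits = [d for d in hits if superfacet in superfacets(d)]
    return hits

print(superfacet_counts(DOCS))            # Counter({'malware': 2, 'web attack': 1})
print(mixed_query("browser"))             # ['d2', 'd3']
print(mixed_query("browser", "malware"))  # ['d2']
```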

ActiveSearch Use Case: Cytenna
• Analysts want to identify patterns in the vulnerabilities and exploits
• What type of software is being exploited?
  • Web servers, browsers, operating systems, etc.
• What types of attacks are being committed?
  • Denial of service, buffer overflow, XSS, etc.
• More powerful search technologies are needed to collect the data for pattern analysis
“ransomware” as a concept
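The pattern question above (which software types are hit by which attack types) reduces to counting co-occurrences once reports are tagged. This sketch assumes small hypothetical keyword lexicons (`SOFTWARE`, `ATTACK`) in place of real concept resolution, and the sample reports are invented for illustration.

```python
from collections import Counter
from itertools import product

# Hypothetical keyword lexicons; a real system would resolve concepts via an ontology.
SOFTWARE = {"web server": ["apache", "nginx", "web server"],
            "browser": ["firefox", "chrome", "browser"],
            "operating system": ["windows", "linux", "kernel"]}
ATTACK = {"denial of service": ["denial of service", "dos attack"],
          "buffer overflow": ["buffer overflow"],
          "xss": ["cross-site scripting", "xss"]}

def tag(text, lexicon):
    """Labels whose keywords appear in the text (case-insensitive substring match)."""
    lowered = text.lower()
    return {label for label, terms in lexicon.items() if any(t in lowered for t in terms)}

def pattern_counts(reports):
    """Count (software type, attack type) pairs co-occurring in the same report."""
    counts = Counter()
    for report in reports:
        counts.update(product(tag(report, SOFTWARE), tag(report, ATTACK)))
    return counts

reports = [
    "Cross-site scripting bug reported in Firefox",
    "Buffer overflow in the Linux kernel allows privilege escalation",
    "XSS vulnerability affects Chrome and other browsers",
    "Nginx hit by denial of service",
]
counts = pattern_counts(reports)
print(counts.most_common(1))  # [(('browser', 'xss'), 2)]
```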

DHS Application: Dataset Search

Missing persons search
Human trafficking prevention
Preventing unlawful weapon sales

People Search: 1,973 “James Rodriguez” in California

Find sellers who don’t require paperwork or a federal firearms license

Human Trafficking in the US
• Profits per year: $32 billion
• Average age of entry into prostitution in the US: 14
• Pimp’s profit per victim per year: $150,000
• Advertising budget on the web: $45 million
Find the locations where a potential victim of human trafficking was advertised

Understand Page Content

Semi-Structured Data Extraction (HTML → JSON)

“YOU don't wanna miss out
on ME :) Perfect lil booty
Green eyes Long curly black
hair Im a Irish,Armenian and
Filipino mixed princess :) ❤
Kim ❤ 7○7~7two7~7four77
❤ HH 80 roses ❤ Hour 120
roses ❤ 15 mins 60 roses”
Text Extraction
name: Kim
eye-color: green
hair-color: black
phone: 707-727-7477
rate: $60/15min $80/30min $120/60min
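The text-extraction step above has to undo deliberate obfuscation. The sketch below shows one way to recover the phone number and rates from the ad shown on the slide, assuming a small hypothetical table of number words and look-alike glyphs, that "HH" means half hour, and that "roses" is read as dollars, as the slide's extraction does. Real ads use far more variants than these tables cover.

```python
import re

# Number words and look-alike glyphs seen in obfuscated numbers (illustrative, not exhaustive).
WORD_DIGITS = {"zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
               "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9"}
GLYPH_DIGITS = {"○": "0", "o": "0"}  # applied after word replacement, so "two" is safe

def normalize_phone(token):
    """Rewrite number words, then glyphs, keep digits; return NNN-NNN-NNNN or None."""
    s = token.lower()
    for word, digit in WORD_DIGITS.items():
        s = s.replace(word, digit)
    for glyph, digit in GLYPH_DIGITS.items():
        s = s.replace(glyph, digit)
    digits = re.sub(r"\D", "", s)
    if len(digits) != 10:
        return None
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"

# Assumptions: "HH" = 30 minutes, "Hour" = 60 minutes, "roses" = dollars.
DURATION_WORDS = {"hh": 30, "hour": 60}
RATE_RE = re.compile(r"\b(hh|hour|\d+\s*mins?)\s+(\d+)\s+roses", re.IGNORECASE)

def extract_rates(text):
    """Return sorted (minutes, price) pairs found in the text."""
    rates = []
    for duration, price in RATE_RE.findall(text):
        d = duration.lower()
        minutes = DURATION_WORDS.get(d) or int(re.match(r"\d+", d).group())
        rates.append((minutes, int(price)))
    return sorted(rates)

def format_rates(rates):
    return " ".join(f"${price}/{minutes}min" for minutes, price in rates)

print(normalize_phone("7○7~7two7~7four77"))  # 707-727-7477
ad = "❤ HH 80 roses ❤ Hour 120 roses ❤ 15 mins 60 roses"
print(format_rates(extract_rates(ad)))      # $60/15min $80/30min $120/60min
```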

Connect Graphs on Strong Attributes
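Linking ads on strong attributes (values that are close to unique, such as a phone number or email) amounts to computing connected components, and union-find is a standard way to do that. The attribute names and ads below are hypothetical, using fictional numbers and addresses.

```python
# Hypothetical ads; phone and email are treated as strong attributes.
ads = {
    "ad1": {"phone": "555-010-0001", "email": None},
    "ad2": {"phone": "555-010-0001", "email": "kim@example.com"},
    "ad3": {"phone": "555-010-0002", "email": "kim@example.com"},
    "ad4": {"phone": "555-010-0003", "email": None},
}

class UnionFind:
    def __init__(self, items):
        self.parent = {x: x for x in items}

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

def connect_on_strong_attributes(ads, attributes=("phone", "email")):
    """Group ads that share any strong-attribute value, directly or transitively."""
    uf = UnionFind(ads)
    first_seen = {}  # (attribute, value) -> first ad carrying that value
    for ad_id, fields in ads.items():
        for attr in attributes:
            value = fields.get(attr)
            if value is None:
                continue
            key = (attr, value)
            if key in first_seen:
                uf.union(first_seen[key], ad_id)
            else:
                first_seen[key] = ad_id
    groups = {}
    for ad_id in ads:
        groups.setdefault(uf.find(ad_id), set()).add(ad_id)
    return sorted(groups.values(), key=sorted)

print(connect_on_strong_attributes(ads))
```

Here ad1 and ad3 share no value, yet they end up together because ad2 shares a phone with one and an email with the other.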

Connecting Nodes Using All Attributes

Figure: clusters labeled “same victims” and “same Trafficker”
Approach: K-Partite Graph Co-Clustering
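The k-partite graph behind this approach has ads in one partition and one partition per attribute type (phone, name, hair color, city, ...), with edges only between an ad and its attribute values. The sketch below builds that graph and, as a deliberately simple stand-in that is not k-partite co-clustering, scores ad pairs by weighted shared values. The weights in `WEIGHTS` and the threshold of 0.5 are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical weights: strong identifiers count more than weak attributes.
WEIGHTS = {"phone": 1.0, "name": 0.3, "hair": 0.1, "city": 0.2}

def build_k_partite(ads):
    """Map each attribute-value node (type, value) to the ad nodes it connects to."""
    value_to_ads = defaultdict(set)
    for ad_id, fields in ads.items():
        for attr, value in fields.items():
            value_to_ads[(attr, value)].add(ad_id)
    return value_to_ads

def pair_scores(value_to_ads):
    """Stand-in similarity: sum of weights of the value nodes two ads share."""
    scores = defaultdict(float)
    for (attr, _), members in value_to_ads.items():
        for a, b in combinations(sorted(members), 2):
            scores[(a, b)] += WEIGHTS.get(attr, 0.0)
    return dict(scores)

ads = {
    "ad1": {"phone": "555-010-0001", "name": "Kim", "hair": "black", "city": "Oakland"},
    "ad2": {"phone": "555-010-0002", "name": "Kim", "hair": "black", "city": "Oakland"},
    "ad3": {"phone": "555-010-0003", "name": "Ana", "hair": "black", "city": "Fresno"},
}
scores = pair_scores(build_k_partite(ads))
linked = sorted(pair for pair, s in scores.items() if s >= 0.5)
print(linked)  # [('ad1', 'ad2')]
```

Weak attributes alone (shared hair color) stay below the threshold, while several weak matches together can link two ads that have no strong attribute in common.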

Deployed to Law Enforcement: Connect the Dots
• Mary: 222-0000
• Lucy: 777-0000
• Police database, Bad Guy: 777-0000
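The "connect the dots" step can be sketched as a lookup of every phone number in a linked cluster against a police database. This assumes, as the slide suggests, that Mary and Lucy have already been linked into one cluster; the function name is hypothetical.

```python
# Assumed inputs mirroring the slide: one linked cluster of personas, and police records by phone.
cluster = {"Mary": "222-0000", "Lucy": "777-0000"}
police_db = {"777-0000": "Bad Guy"}

def connect_the_dots(cluster, police_db):
    """Return (persona, phone, record) for every cluster phone found in the police database."""
    return [(persona, phone, police_db[phone])
            for persona, phone in cluster.items() if phone in police_db]

hits = connect_the_dots(cluster, police_db)
print(hits)  # [('Lucy', '777-0000', 'Bad Guy')]
```

A single hit ties the whole cluster, including Mary, whose own number never appears in the database, to the known record.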

RSX
https://rsx.inferlink.com/

PRODUCT
Volumes of Data … Per Day
• 10,000 papers published per day
• Data sources, for example congress abstracts
• Just to keep up, assuming 5 minutes/paper: ≈ 35 scientists reading 24 hours per day
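Under the stated assumption of 5 minutes per paper, the reading load works out as follows (50,000 minutes is about 833.3 hours):

```latex
10{,}000\,\tfrac{\text{papers}}{\text{day}} \times 5\,\tfrac{\text{min}}{\text{paper}}
  = 50{,}000\,\tfrac{\text{min}}{\text{day}} \approx 833\,\tfrac{\text{h}}{\text{day}},
\qquad
\frac{833\,\text{h/day}}{24\,\text{h/day per reader}} \approx 35\ \text{readers}
```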

PRODUCT
Guselkumab was also superior (P < .001) to adalimumab for Investigator Global Assessment
0/1 and PASI 90 at week 16 (85.1% vs 65.9% and 73.3% vs 49.7%), week 24 (84.2% vs
61.7% and 80.2% vs 53.0%), and week 48 (80.5% vs 55.4% and 76.3% vs 47.9%).
What do these refer to?
To our AI, it looks like this, but instantly (25,000,000 articles per hour):
Intervention | Outcome | Measurement
guselkumab | Investigator Global Assessment 0/1 at week 48 | 80.5%
adalimumab | Investigator Global Assessment 0/1 at week 48 | 55.4%
AI: Automated Reading
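To show what turning the sentence above into intervention/outcome/measurement rows involves, here is a regex sketch that only handles this one sentence pattern. It assumes the two arms and two outcomes are already known and that each "week N (a% vs b% and c% vs d%)" group lists guselkumab vs adalimumab for the first outcome, then the second. A production reader would rely on machine learning rather than a fixed pattern like `WEEK_RE`.

```python
import re

SENTENCE = ("Guselkumab was also superior (P < .001) to adalimumab for Investigator Global "
            "Assessment 0/1 and PASI 90 at week 16 (85.1% vs 65.9% and 73.3% vs 49.7%), "
            "week 24 (84.2% vs 61.7% and 80.2% vs 53.0%), and week 48 (80.5% vs 55.4% "
            "and 76.3% vs 47.9%).")

# Assumed known: arm order and outcome order as they appear in the sentence.
ARMS = ("guselkumab", "adalimumab")
OUTCOMES = ("Investigator Global Assessment 0/1", "PASI 90")
WEEK_RE = re.compile(r"week (\d+) \(([\d.]+)% vs ([\d.]+)% and ([\d.]+)% vs ([\d.]+)%\)")

def extract_rows(sentence):
    """Return (intervention, outcome at timepoint, percent) tuples."""
    rows = []
    for week, *values in WEEK_RE.findall(sentence):
        pairs = [values[0:2], values[2:4]]  # one (arm1, arm2) pair per outcome
        for outcome, pair in zip(OUTCOMES, pairs):
            for arm, value in zip(ARMS, pair):
                rows.append((arm, f"{outcome} at week {week}", float(value)))
    return rows

rows = extract_rows(SENTENCE)
for row in rows:
    print(row)
```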

PRODUCT
Machine Learning All Around

Evid Science
• https://evidscience.com/product/

ActiveSearch for Medical Entities

AI Community
• Journal of Artificial Intelligence Research
• https://jair.org
• AI Access
• AI Resources

We did not cover :)
R&D
Tools
Solutions
Active Search
RSX
Spin-offs
MachineReading
AutoScience
Evid Science
OpenAI
AI Resources
EntityBase
ConnectD
OpenWatch
CodeFault
TBN

Active Interests in AI & Data Science
• Health informatics and biomedical research
• Cybersecurity
• Homeland security, emergency and disaster response
• Data driven engineering design
• Precision agriculture, E-Governance
• Transportation safety
• …

Acknowledgements
• Dr. Steven Minton, President and CEO, InferLink
• Dr. Greg Barish, CTO, InferLink & CEO, Cytenna
• Dr. Matt Michelson, CEO, Evid Science
• Dr. Pedro Szekely, Research Associate Professor USC/ISI

www.inferlink.com
www.cytenna.com
www.evidscience.com
nashish@inferlink.com
Thank you!
