11. Search in news database
Use name variants in combination with keywords (“fraud”, “murder”, …)
Read top N results
Usually sorted using basic information retrieval methods (e.g. TF*IDF)
Assess and report risk
Assess the risk in each article and compile into a long report
Typically 2 hours per screened entity
A big bank employs hundreds to thousands of people for this work, so any improvement can save millions.
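The TF*IDF ranking mentioned on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the tokenizer, the smoothed IDF formula, and the names `tokenize` and `tfidf_rank` are hypothetical and are not taken from any particular news database.

```python
import math
from collections import Counter

PUNCT = '.,;:!?"()'

def tokenize(text):
    # Naive whitespace tokenizer (an assumption; real systems do more).
    return [t.strip(PUNCT).lower() for t in text.split() if t.strip(PUNCT)]

def tfidf_rank(query, articles, top_n=3):
    """Return indices of up to top_n articles, best match first."""
    docs = [Counter(tokenize(a)) for a in articles]
    n_docs = len(docs)
    query_terms = set(tokenize(query))
    scored = []
    for idx, counts in enumerate(docs):
        total = sum(counts.values()) or 1
        score = 0.0
        for term in query_terms:
            df = sum(1 for d in docs if term in d)  # document frequency
            if df == 0:
                continue
            idf = math.log((1 + n_docs) / (1 + df)) + 1.0  # smoothed IDF
            score += (counts[term] / total) * idf          # TF * IDF
        scored.append((score, idx))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    # Articles that match no query term are dropped.
    return [idx for score, idx in scored[:top_n] if score > 0]

articles = [
    "Fatmir Limaj was arrested on fraud charges",
    "Harry Newman opened a bakery",
    "Local fraud case closed",
]
ranking = tfidf_rank("Fatmir Limaj fraud", articles)
```

Here the query combines a name with a keyword, as in the manual process; the analyst would then read the returned articles in order.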
12. Process all potentially relevant articles
Human reading capacity is limited
Highlight important parts of articles
Increases the processing speed and helps to focus on the important parts
Reduce number of irrelevant articles
Analyst reads only what's important
Avoid missing any relevant articles
Missing a relevant article can be very expensive
14. Find articles with the name
Find keywords in the articles
Score occurrence and position of keywords
Aggregate the scores
It's surprisingly good
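A minimal sketch of such a keyword baseline is shown below. The keyword list, the weights, and the position rule (keywords nearer the start of the article count more) are assumptions made for illustration; the slide does not specify them.

```python
import re

# Hypothetical keywords and weights, for illustration only.
RISK_KEYWORDS = {"fraud": 1.0, "murder": 1.0, "arrested": 0.8, "arrest": 0.8}

def keyword_risk_score(article, name, keywords=RISK_KEYWORDS):
    """Return 0.0 if the name is absent, else the aggregated keyword score."""
    text = article.lower()
    if name.lower() not in text:          # step 1: find articles with the name
        return 0.0
    tokens = re.findall(r"[a-z']+", text)
    total = 0.0
    for pos, tok in enumerate(tokens):    # step 2: find keywords
        if tok in keywords:
            # step 3: score occurrence and position; the weight decays
            # linearly from 1.0 at the start towards 0.5 at the end.
            position_weight = 1.0 - 0.5 * pos / len(tokens)
            total += keywords[tok] * position_weight
    return total                          # step 4: aggregate the scores

risky = keyword_risk_score("Fatmir Limaj was arrested for fraud", "Fatmir Limaj")
unrelated = keyword_risk_score("Fatmir Limaj was arrested for fraud", "Harry Newman")
false_alarm = keyword_risk_score(
    "Harry Newman has suffered a cardiac arrest", "Harry Newman"
)
```

With "arrest" in the keyword list, the cardiac arrest sentence also gets a positive score, which is the kind of false alarm a keyword-only approach cannot avoid.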
18. Speed matters
At most 3 minutes per screened entity, approx. 1 sec per article
Configurability and extensibility
Each client (bank) has a different view on what is relevant
Explainability
Regulators don't like black boxes
Sentence → ML → Probability of risk related to screened entity
19. Fatmir Limaj was arrested 1 (risk)
Harry Newman has suffered a cardiac arrest 0 (no risk)
Judge John Doe sentenced Amy Jones 1 (risk)
Judge Amy Jones sentenced John Doe 0 (no risk)
… …
21. Example of input sentence:
Judge John sentenced Amy Jones ... for bribery.
Model with prediction:
Sentence (Judge John sentenced ...) → LSTM → Probability of risk related to screened entity: 0.75
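To make the sentence-to-probability mapping concrete, here is a forward pass of a single-layer LSTM with a logistic output, written in plain Python. It is a sketch only: the weights are random and untrained, so the probability it returns carries no meaning, and the class name `TinyLSTMClassifier`, the one-hot word encoding, and the hidden size are all assumptions rather than details of the model on the slide.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class TinyLSTMClassifier:
    """Untrained single-layer LSTM followed by a logistic output unit."""

    def __init__(self, vocab, hidden_size=8, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        self.vocab = {w: i for i, w in enumerate(vocab)}
        # Input = one-hot word (+1 slot for unknown words) + previous hidden state.
        n_in = len(vocab) + 1 + hidden_size
        # One weight matrix and bias per gate:
        # input (i), forget (f), output (o), candidate (g).
        self.W = {
            g: [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                for _ in range(hidden_size)]
            for g in "ifog"
        }
        self.b = {g: [0.0] * hidden_size for g in "ifog"}
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]
        self.b_out = 0.0

    def _one_hot(self, word):
        vec = [0.0] * (len(self.vocab) + 1)
        vec[self.vocab.get(word, len(self.vocab))] = 1.0
        return vec

    def predict(self, sentence):
        h = [0.0] * self.hidden_size  # hidden state
        c = [0.0] * self.hidden_size  # cell state
        for word in sentence.lower().split():
            x = self._one_hot(word) + h
            pre = {
                g: [sum(w * v for w, v in zip(row, x)) + b
                    for row, b in zip(self.W[g], self.b[g])]
                for g in "ifog"
            }
            i = [sigmoid(v) for v in pre["i"]]
            f = [sigmoid(v) for v in pre["f"]]
            o = [sigmoid(v) for v in pre["o"]]
            cand = [math.tanh(v) for v in pre["g"]]
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, cand)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        logit = sum(w * v for w, v in zip(self.w_out, h)) + self.b_out
        return sigmoid(logit)  # probability-shaped output in (0, 1)

model = TinyLSTMClassifier(vocab=["judge", "john", "sentenced", "amy", "jones"])
p = model.predict("Judge John sentenced Amy Jones")
```

Because the hidden state is updated word by word, a trained model of this shape can in principle tell "Judge John Doe sentenced Amy Jones" from "Judge Amy Jones sentenced John Doe", which a bag of keywords cannot.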
28. Semi-supervised sequence learning
A B C → meaning
meaning, <start> → A B C <end>
Meaning of the sentence represented as a vector of numbers (e.g. 128 decimals)
29. Semi-supervised sequence learning
Sentence (A B C) → LSTM → meaning
meaning, <start> → Copy of the input sentence (A B C <end>)
Meaning of the sentence represented as a vector of numbers (e.g. 128 decimals)
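The reconstruction task needs no labels, because the training target is the input sentence itself. A minimal sketch of how one training example could be built is shown below; the function name `autoencoder_training_pair` is hypothetical, and the shifted decoder input is the usual teacher-forcing setup, assumed here rather than confirmed by the slides.

```python
def autoencoder_training_pair(tokens):
    """Build encoder input, decoder input, and decoder target for one sentence."""
    encoder_input = list(tokens)                # A B C
    decoder_input = ["<start>"] + list(tokens)  # <start> A B C
    decoder_target = list(tokens) + ["<end>"]   # A B C <end>
    return encoder_input, decoder_input, decoder_target

enc_in, dec_in, dec_out = autoencoder_training_pair(["A", "B", "C"])
```

Any unlabeled news sentence can be turned into such an example, which is what makes the approach semi-supervised: the encoder is pretrained on plentiful unlabeled text before the small labeled dataset is used.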
33. Sentence-level classification
A B C → TEXT TO NUMBERS CONVERTOR → meaning → 0/1
Fine-tune the output on our dataset:
Fatmir Limaj was arrested 1
Harry Newman has suffered a cardiac arrest 0
Judge John Doe sentenced Amy Jones 1
Judge Amy Jones sentenced John Doe 0
38. Entity appears low risk
But becomes medium risk as we identify risky connections via negative media
Entity becomes high risk as we uncover transactions with risky counterparties
Model analyzes news articles
Further analysis of transaction relationships
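The escalation described on this slide can be sketched as a simple rule that combines the two signals. The rule below is an assumption for illustration: the slide does not say how the signals are weighted, and the names `Risk` and `entity_risk` are hypothetical.

```python
from enum import IntEnum

class Risk(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3

def entity_risk(has_negative_media_links, has_risky_counterparty_transactions):
    """Escalate from LOW as more evidence is uncovered (assumed rule)."""
    level = Risk.LOW
    if has_negative_media_links:
        # Risky connections found via negative media.
        level = max(level, Risk.MEDIUM)
    if has_risky_counterparty_transactions:
        # Transactions with risky counterparties.
        level = max(level, Risk.HIGH)
    return level

initial = entity_risk(False, False)
after_news = entity_risk(True, False)
after_transactions = entity_risk(True, True)
```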