Text-Mining:
Big Data Analytics for Unstructured Data
Prof dr ir Jan C. Scholtes
https://textmining.nu
Exploratory Search
Text Mining: The Next Step in Search Technology
Finding without knowing exactly what you’re looking for, or finding what apparently isn’t there (or those who do not want to be found …).
Search
• Inverted file index
• Relevance ranking
• Relevance feedback
• Faceted search
• Incomplete matching
• Index compression
• Precision & Recall

Information Extraction
• Entity extraction
• Fact, Event & Concept extraction
• Negations, co-reference resolution
• Grammars
• Statistical methods: Hidden Markov Models, Maximum Entropy Models, Conditional Random Fields, …
• Data normalization (Ontology matching)

Link Analysis & Data Visualization
• Social network analysis
• Community Detection
• Different types of visualization for temporal, geographical, semantic or relational mappings
• Anomaly Detection

Machine Learning
• Decision Tree
• Bayes Classifiers
• Rocchio
• k-NN
• Support Vector Machines
• Clustering
• CNN
• LSTM
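
Two of the Search items above, the inverted file index and precision & recall, can be illustrated with a minimal sketch. The three documents, the query and the relevance judgments are made up for illustration; the slide does not prescribe any particular implementation.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercased token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index

def search_all(index, query):
    """Boolean AND: ids of documents that contain every query token."""
    postings = [index.get(tok, set()) for tok in query.lower().split()]
    return set.intersection(*postings) if postings else set()

def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved, recall = hits / relevant."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical mini-collection and relevance judgments (assumptions).
docs = {
    1: "company pays bribes to foreign officials",
    2: "company reports improper payments",
    3: "regulators review medical devices",
}
relevant = {1, 2}

index = build_inverted_index(docs)
retrieved = search_all(index, "company payments")
print(retrieved, precision_recall(retrieved, relevant))
```

The exact-match query finds document 2 but misses document 1, which is relevant without containing the word "payments". That gap between precision and recall is what incomplete matching and relevance feedback try to close.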
Language_Name: English
CITY: New Brunswick, WASHINGTON
COMPANY: J&J, Johnson & Johnson
COUNTRY: Greece, Poland, Romania, United Kingdom
CURRENCY: .02 USD, 21400000 USD, 48600000 USD, 59.47 USD, 70000000 USD
DATE: 04-08
DAY: Fri, Friday
NOUN_GROUP: biotech drugs, bribery case, denying guilt, final growth frontier, foreign countries, giving gifts, holding corporations, intense revenue pressure, meaningful credit, medical device kickbacks, medical devices, multiple businesses, next several days, non-U.S. markets, only way, orthopedic hips, other countries, over-the-counter medicines, paid kickbacks, past year, paying kickbacks, same time, several new positions, similar violations, travel gifts
ORGANIZATION: Department of Justice, Justice Department, SEC, Securities and Exchange Commission, University of Michigan
PEOPLES: Iraqi
PERSON: Erik Gordon, Mythili Raman, William Weldon
PLACE_REGION: Europe
PRODUCT: Benadryl, Tylenol
PROP_MISC: Band-Aids, Food Program, Foreign Corrupt Practices Act, United Nations Oil
STATE: N.J.
TIME: 1:32 pm ET
TIME_PERIOD: 13 years, five years, six months, three years
YEAR: 2007
Problem:
• "We went to the government to report improper payments and have taken full responsibility for these actions," said William Weldon, Chairman and CEO of J&J.
• Last month federal health regulators took legal control of the plant where millions of bottles of defective medication were produced.
• The charges against J&J were brought under the Foreign Corrupt Practices Act, which bars publicly traded companies from bribing officials in other countries to get or retain business.
• The company will pay $21.4 million in criminal penalties for improper payments and return $48.6 million in illegal profits, according to the government.
• The SEC says J&J agents used fake contracts and sham companies to deliver the bribes.
Sentiment:
• giving meaningful credit to companies that self-report
• We are committed to holding corporations accountable for bribing foreign officials
• what is honest
Request:
• make sure it complies with anti-bribery laws across its businesses
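
The slide does not say which extractor produced the output above. As a rough sketch of the simplest, pattern-based end of entity extraction, the following pulls out two of the types shown (CURRENCY and TIME_PERIOD) with regular expressions. The patterns and the second sample sentence are assumptions for illustration, not the method behind the slide.

```python
import re

# Assumed patterns: dollar amounts with an optional magnitude word,
# and a number (digits or a small number word) followed by a time unit.
MONEY = re.compile(r"\$\d+(?:\.\d+)?(?:\s(?:million|billion))?")
TIME_PERIOD = re.compile(
    r"\b(?:\d+|one|two|three|four|five|six|seven|eight|nine|ten)"
    r"\s(?:days?|months?|years?)\b",
    re.IGNORECASE,
)

def extract(text):
    """Return the matches per entity type found in the text."""
    return {
        "CURRENCY": MONEY.findall(text),
        "TIME_PERIOD": TIME_PERIOD.findall(text),
    }

# Sentence taken from the slide.
sample = ("The company will pay $21.4 million in criminal penalties for improper "
          "payments and return $48.6 million in illegal profits.")
# Hypothetical sentence, added only to exercise the TIME_PERIOD pattern.
hypothetical = "The agreement runs for three years."

print(extract(sample))
print(extract(hypothetical))
```

Fixed patterns like these break down quickly on real text, which is why the overview slide also lists grammars and statistical methods such as Hidden Markov Models and Conditional Random Fields.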
WHAT happened?
WHO
WHAT-WHEN: Topic Rivers
WHY & WHO: Emotion Detection
Anomaly Detection
Σ(Φ)
Text Mining the Lord of the Rings
• Automatic identification of key players (custodians)
• Automatic identification of locations
• Automatic identification of travel patterns of key players
• Visualize in time
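
A minimal sketch of the first step, finding key players and who appears together with whom, is to count name mentions and sentence-level co-occurrences. It assumes the names are already known (in practice they would come from entity extraction), and the three sentences are invented stand-ins for the book text.

```python
from collections import Counter
from itertools import combinations

def co_occurrences(sentences, names):
    """Count mentions per name and name pairs sharing a sentence."""
    mentions = Counter()
    pairs = Counter()
    for sentence in sentences:
        # Simple substring match; sorted so each pair has one canonical order.
        present = sorted(n for n in names if n in sentence)
        mentions.update(present)
        pairs.update(combinations(present, 2))
    return mentions, pairs

# Hypothetical input: a given name list and illustrative sentences.
names = ["Frodo", "Sam", "Gandalf"]
sentences = [
    "Frodo and Sam left the Shire.",
    "Gandalf met Frodo in Rivendell.",
    "Sam followed Frodo to Mordor.",
]

mentions, pairs = co_occurrences(sentences, names)
print(mentions.most_common())
print(pairs.most_common())
```

The pair counts form the edges of a social network graph. Doing the same with place names per chapter would give a first approximation of travel patterns over time.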
Big Data Analytics and the Law
Diagram labels: Memory Consistency; 24/7; Speed & Scalability; Search; M&A and Restructuring; Data Collection; Analytics; eDiscovery, Regulatory Requests, Investigations, Fact-Finding Missions; Reporting; Archiving; Knowledge Management; Production
ZyLAB is used as the e-Discovery & e-Disclosure standard for all United Nations-backed War Crime Tribunals and ongoing UN courts
eDiscovery
• FOIA (WOB)
• Audits & Internal Investigations
• Litigation
• Arbitration
• Answering Regulatory Requests
• Subject Access Requests
• Right to be Forgotten
Teach the computer what to look for …
• 3x more relevant documents than Boolean search
• No complex queries, just review documents
• Only 2x the total number of relevant documents needs to be reviewed
• Accurately estimate the percentage of all relevant documents found at the end
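
The slide does not name the learning method behind this review workflow. One simple way to "teach the computer" from reviewed documents is a Rocchio-style centroid ranker: score each unreviewed document by its similarity to the documents marked relevant, minus its similarity to those marked not relevant. The sketch below assumes bag-of-words vectors and uses invented example texts.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts for a lowercased, whitespace-split text."""
    return Counter(text.lower().split())

def centroid(vectors):
    """Average term weights over a list of bag-of-words vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: c / len(vectors) for t, c in total.items()}

def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank_unreviewed(relevant, not_relevant, unreviewed):
    """Order unreviewed texts so the likeliest relevant ones come first."""
    pos = centroid([vectorize(t) for t in relevant])
    neg = centroid([vectorize(t) for t in not_relevant])
    scored = []
    for text in unreviewed:
        v = vectorize(text)
        scored.append((cosine(v, pos) - cosine(v, neg), text))
    return [text for _, text in sorted(scored, reverse=True)]

# Hypothetical reviewer decisions and unreviewed documents.
relevant = ["improper payments to foreign officials",
            "bribes paid through sham companies"]
not_relevant = ["lunch menu for friday", "parking garage closed friday"]
unreviewed = ["friday lunch is cancelled",
              "payments to officials via sham contracts"]

ranked = rank_unreviewed(relevant, not_relevant, unreviewed)
print(ranked)
```

In a review loop, the top-ranked documents go to the reviewer next and their decisions are added to the two training lists, so the ranking improves without anyone writing a query.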
CCPA
GDPR & AVG: Redaction, anonymization, …
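
As a minimal sketch of pattern-based redaction, the following replaces email addresses and phone numbers with placeholders. The two patterns and the sample sentence are assumptions for illustration. Real redaction for GDPR purposes also has to cover names, addresses and other identifiers, typically via entity extraction.

```python
import re

# Assumed, deliberately simple patterns for two kinds of personal data.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d\s-]{7,}\d")

def redact(text):
    """Replace email addresses and phone numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)

# Hypothetical sample text.
sample = "Contact j.doe@example.com or call +31 20 123 4567."
print(redact(sample))
```

Replacing a value with a typed placeholder rather than deleting it keeps the sentence readable while removing the identifying data.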
How does it work?
Search, Pattern Recognition, Text-Mining
Thank you!
Time for Q&A
Prof dr ir Jan C. Scholtes
https://www.linkedin.com/in/jscholtes/
https://textmining.nu

Text mining scholtes - big data congress utrecht 2019
