
Text mining scholtes - big data congress utrecht 2019


On Wednesday, September 18, for the second year in a row, I presented at the Big Data Expo (#BigDataExpoNL) in Utrecht, the Netherlands, on text mining and how it can be used for big-data analytics on unstructured data, in particular for legal fact-finding missions and GDPR/AVG compliance use cases. Large crowds and a very successful event! Good to see that big data is such a hot topic these days.


  1. Text-Mining: Big Data Analytics for unstructured data. Prof dr ir Jan C. Scholtes, https://textmining.nu
  2. Prof dr ir Jan C. Scholtes
  3. Exploratory Search
  4. Text Mining: The next step in Search Technology. Finding without knowing exactly what you’re looking for, or finding what apparently isn’t there (or those who do not want to be found …).
  5. Search: Inverted file index; Relevance ranking; Relevance feedback; Faceted search; Incomplete matching; Index compression; Precision & Recall
     Information Extraction: Entity extraction; Fact, Event & Concept extraction; Negations, co-reference resolution; Grammars; Statistical methods: Hidden Markov Models, Maximum Entropy Models, Conditional Random Fields, …; Data normalization (Ontology matching)
     Link Analysis & Data Visualization: Social network analysis; Community Detection; Different types of visualization for temporal, geographical, semantic or relational mappings
     Machine Learning: Anomaly Detection; Decision Tree; Bayes Classifiers; Rocchio; k-NN; Support Vector Machines; Clustering; CNN; LSTM
  6. Language_Name: English
     CITY: New Brunswick, WASHINGTON
     COMPANY: J&J, Johnson & Johnson
     COUNTRY: Greece, Poland, Romania, United Kingdom
     CURRENCY: .02 USD, 21400000 USD, 48600000 USD, 59.47 USD, 70000000 USD
     DATE: 04-08
     DAY: Fri, Friday
     NOUN_GROUP: biotech drugs, bribery case, denying guilt, final growth frontier, foreign countries, giving gifts, holding corporations, intense revenue pressure, meaningful credit, medical device kickbacks, medical devices, multiple businesses, next several days, non-U.S. markets, only way, orthopedic hips, other countries, over-the-counter medicines, paid kickbacks, past year, paying kickbacks, same time, several new positions, similar violations, travel gifts
     ORGANIZATION: Department of Justice, Justice Department, SEC, Securities and Exchange Commission, University of Michigan
     PEOPLES: Iraqi
     PERSON: Erik Gordon, Mythili Raman, William Weldon
     PLACE_REGION: Europe
     PRODUCT: Benadryl, Tylenol
     PROP_MISC: Band-Aids, Food Program, Foreign Corrupt Practices Act, United Nations Oil
     STATE: N.J.
     TIME: 1:32 pm ET
     TIME_PERIOD: 13 years, five years, six months, three years
     YEAR: 2007
     Problem:
       "We went to the government to report improper payments and have taken full responsibility for these actions," said William Weldon, Chairman and CEO of J&J.
       Last month federal health regulators took legal control of the plant where millions of bottles of defective medication were produced.
       The charges against J&J were brought under the Foreign Corrupt Practices Act, which bars publicly traded companies from bribing officials in other countries to get or retain business.
       The company will pay $21.4 million in criminal penalties for improper payments and return $48.6 million in illegal profits, according to the government.
       The SEC says J&J agents used fake contracts and sham companies to deliver the bribes.
     Sentiment:
       giving meaningful credit to companies that self-report
       We are committed to holding corporations accountable for bribing foreign officials
       what is honest
     Request:
       make sure it complies with anti-bribery laws across its businesses
  7. WHAT happened?
  8. WHO
  9. WHAT-WHEN: Topic Rivers
  10. WHY & WHO: Emotion Detection
  11. Anomaly Detection Σ(Φ)
  12. Text Mining the Lord of the Rings: automatic identification of key players (custodians); automatic identification of locations; automatic identification of travel patterns of key players; visualize in time.
  13. Big Data Analytics and the Law: Memory; Consistency; 24/7; Speed & Scalability; Search; M&A and Restructuring; Data Collection; Analytics; eDiscovery, Regulatory Requests, Investigations, Fact-Finding Missions; Reporting; Archiving; Knowledge Management; Production
  14. ZyLAB used as e-Discovery & e-Disclosure standard for all United Nations-backed War Crime Tribunals and ongoing UN courts
  15. eDiscovery: FOIA (WOB); Audits & Internal Investigations; Litigation; Arbitration; Answering Regulatory Requests; Subject Access Requests; Right to be Forgotten
  16. Teach the computer what to look for …: 3x more relevant documents than Boolean search; no complex queries, just review documents; 2x the total number of relevant documents is all that needs to be reviewed; accurately estimate the percentage of all relevant documents found at the end.
  17. CCPA
  18. GDPR & AVG: Redacting, anonymizing, …
  19. How does it work? Search, Pattern Recognition, Text-Mining
  20. Thank you! Time for Q&A. Prof dr ir Jan C. Scholtes, https://www.linkedin.com/in/jscholtes/, https://textmining.nu
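
Slide 5 lists the inverted file index and Precision & Recall among the search building blocks. The slides do not show an implementation, so the following is only a minimal sketch of those two ideas, assuming whitespace tokenization and Boolean AND matching. The function names, the sample documents (loosely paraphrased from the slide 6 example text), and the set of "relevant" documents are all hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every query term (Boolean AND)."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved, recall = hits / relevant."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical mini-corpus for illustration only.
docs = {
    1: "improper payments reported to the government",
    2: "the company will pay criminal penalties",
    3: "fake contracts used to deliver improper payments",
}
index = build_inverted_index(docs)
retrieved = search(index, "improper payments")

# Assumed reviewer judgment: documents 2 and 3 are the relevant ones.
relevant = {2, 3}
precision, recall = precision_recall(retrieved, relevant)
```

In this toy setup the query misses document 2 because it shares no terms with the query, which is the kind of gap that relevance feedback and machine learning on slide 5 are meant to address.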

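Slides 18 and 19 mention redacting and anonymizing for GDPR/AVG, with pattern recognition as one of the ingredients. As a rough illustration only, and not the approach used in the talk or in any specific product, pattern-based redaction can be sketched with regular expressions. The two patterns and the placeholder labels below are simplified assumptions and would miss many real-world formats.

```python
import re

# Simplified, hypothetical patterns; real personal-data detection needs far more.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}


def redact(text):
    """Replace every match of each pattern with a bracketed placeholder label."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


sample = "Contact Jan at jan@example.com or +31 20 123 4567."
redacted = redact(sample)
```

Note that the person name in the sample is left untouched: names cannot be caught reliably by fixed patterns, which is where the entity extraction shown on slide 6 comes in.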