10 ways AI can be used for investigations

Talk at the Global Investigative Journalism Conference 2019 in Hamburg

Published in: Education


  1. 1. SciCAR 2019: 10 ways AI can be used for investigations. @PaulBradshaw, Birmingham City University, BBC Data Unit
  2. 2. 1. Bullshit terms 2. I’ve got a BIG HAMMER. Now, where can I get some NAILS? 3. My head explodes.
  3. 3. 1. Bullshit terms 2. I’ve got a BIG HAMMER. Now, where can I get some nails? 3. My head explodes.
  4. 4. “Each weekday, my computer program goes to the Chicago Police Department's website and gathers all crimes reported in Chicago.” Adrian Holovaty
  5. 5. More at pinboard.in/u:paulbradshaw/t:robotjournalism
  6. 6. Artificial intelligence (AI)? • Simulation/imitation of intelligence, since the 1950s (e.g. bots) • A useless term which means 427 different things https://hai.stanford.edu/news/infographic-age-artificial-intelligence
  7. 7. https://insights.ap.org/uploads/images/the-future-of-augmented-journalism_ap-report.pdf
  8. 8. What is AI used for? • Reasoning (solving problems) • Knowledge (categorisation) • Planning (identifying steps) • Communication (language, e.g. translation) • Perception (identification, e.g. autonomous cars)
  9. 9. In news? • Content recommendations (59%) • Commercial optimisation (e.g. ad targeting, dynamic pricing) (39%) • To help journalists find stories (intelligent agents) (36%) • To automate or semi-automate workflows (40%) Percentages are shares of respondents. Source: Reuters Institute
  10. 10. Machine learning: a subset of AI in which the computer works out for itself the best way to get results
  11. 11. Some ML concepts... • Training data • Supervised learning (using x and y) • Unsupervised learning (using x) • Reinforcement learning (optimal action)
  12. 12. Rothe et al
  13. 13. Assogba, Machine Visions Unsupervised learning
  14. 14. Instagram Engineering (2015)
  15. 15. • Supervised learning: classification (“x”=“A”), regression (y=x*2) • Unsupervised learning: association, clustering • Reinforcement learning: games, route finding
  16. 16. 1. Bullshit terms 2. I’ve got a BIG HAMMER. Now, where can I get some NAILS? 3. My head explodes.
  17. 17. 3 applications of AI for newsrooms (Hansen et al 2017) • Finding needles in haystacks • Identifying trends (or departures from trends) • Examining an application of AI or computation as the subject of the story itself
  18. 18. BUT I’VE GOT 10!
  19. 19. 1. Establishing the scale of a problem*
  20. 20. Atlanta Journal-Constitution Training needed!
  21. 21. BuzzFeed
  22. 22. ICIJ
  23. 23. 2. Proving a problem exists
  24. 24. “.” Random Forest algorithm: SRF Data (GitHub repo): https://www.srf.ch/radio-srf-virus/
  25. 25. “.” https://www.latimes.com/local/cityhall/la-me-crime-stats-20151015-story.html
  26. 26. 3. Unmasking a system
  27. 27. https://www.propublica.org/nerds/how-propublicas-message-machine-reverse-engineers-political-microtargeting
  28. 28. NLP: Natural Language Processing Instagram Engineering (2015)
  29. 29. “NLP techniques trained on a corpus of internet writing from the 1990s may reflect stereotypical and dated word associations—the word ‘female’ might be associated with ‘receptionist’” AI Now 2017 report
  30. 30. 4. Document histories: sentiment and other clues
  31. 31. https://www.washingtonpost.com/investigations/whistleblowers-say-usaids-ig-removed-critical-details-from-public-reports/2014/10/22/68fbc1a0-4031-11e4-b03f-de718edeb92f_story.html
  32. 32. 5. Unmasking pseudo-human behaviour
  33. 33. Kao, 2017
  34. 34. 6. Creating a basis for further enquiry
  35. 35. https://www.apnews.com/08659d568d7448a6b500f27d98a6c3a6 https://insights.ap.org/uploads/images/the-future-of-augmented-journalism_ap-report.pdf
  36. 36. https://pudding.cool/2018/07/women-in-parliament/
  37. 37. https://stateofopendata.od4d.net/chapters/issues/artificial-intelligence.html
  38. 38. 7. Jargon and obfuscation
  39. 39. 8. Interesting associations, linking data
  40. 40. https://artificialinformer.com/issue-one/dissecting-a-machine-learning-powered-investigation.html
  41. 41. 9. AI in the sky: satellite imagery, drone footage, sensors — and target acquisition
  42. 42. https://blogs.nvidia.com/blog/2019/04/04/human-rights-watch-ai-gtc/
  43. 43. https://blogs.nvidia.com/blog/2019/04/04/human-rights-watch-ai-gtc/
  44. 44. http://texty.org.ua/d/2018/amber_eng/
  45. 45. https://www.thejakartapost.com/life/2019/08/02/googles-artificial-intelligence-helps-protect-west-sumatras-rainforests.html https://techcrunch.com/2018/03/23/rainforest-connection-enlists-machine-learning-to-listen-for-loggers-and-jaguars-in-the-amazon/
  46. 46. 10. Conversion: OCR, audio, video — and summaries
  47. 47. https://www.wired.com/story/polisis-ai-reads-privacy-policies-so-you-dont-have-to/
  48. 48. https://www.wired.com/story/polisis-ai-reads-privacy-policies-so-you-dont-have-to/
  49. 49. http://jonathanstray.com/extracting-campaign-finance-data-from-gnarly-pdfs-using-deep-learning
  50. 50. NLG: Natural Language Generation Microsoft deletes racist, genocidal tweets from AI chatbot Tay
  51. 51. “.” Tow Center, Guide to Automated Journalism
  52. 52. Tow Center, Computational Campaign Coverage • Yellow: raw data inserted into the text • Purple: calculations with the raw data • Green: synonyms, used to add variety “exponentially increases the possible variants for the whole text”
  53. 53. “When we teach computers to write, the computers don’t replace us any more than pianos replace pianists—in a certain way, they become our pens, and we become more than writers. We become writers of writers.” Ross Goodwin
  54. 54. • Underestimated effort needed for quality control and troubleshooting • Most common errors due to errors in underlying data “Added complexity often increased the likelihood of new errors”
  55. 55. 11. Entry points: personalisation, A/B headlining, translation and chatbots
  56. 56. HOLD ON! THAT’S 11!
  57. 57. https://github.com/BBC-Data-Unit/tree-planting
  58. 58. https://www.akqa.com/work/lokai/walk-with-yeshi/
  59. 59. https://interactive.aljazeera.com/aje/2016/malaysia-babies-for-sale-101-east/index.html
  60. 60. “The real value is not in reaching more people, but rather in deepening the relationship with the people you reach.” John Keefe https://onlinejournalismblog.com/2018/06/04/gen-summit-ais-breakthrough-year-in-publishing/
  61. 61. Structured journalism: 3 characteristics (Jones and Jones 2019) 1. Recording: in machine-readable forms 2. Recombining: via automation or algorithm to create multiple versions or narratives 3. Re-use: “the persistence in databases of categorised atoms which can be manipulated for continuing use”
  62. 62. https://stateofopendata.od4d.net/chapters/issues/artificial-intelligence.html
  63. 63. Microsoft Cognitive Services, Google Cloud AI
  64. 64. 1. Bullshit terms 2. I’ve got a BIG HAMMER. Now, where can I get some nails? 3. My head explodes.
  65. 65. Stray’s challenges (Stray 2019) 1. Data availability 2. Amortisation (story is unique) 3. Difficulty 4. Accuracy 5. Cost effectiveness vs manual methods (dataset size)
  66. 66. Structured journalism: “We find journalists are ‘writing for machines’ by converting unstructured information into structured data to enable automated recombination and future re-use of content. This impacts editorial control by delegating responsibility to either the algorithm or the audience, in the name of choice.” Jones and Jones 2019
  67. 67. >15 algorithms: e.g. random forest, Bayesian networks, Support Vector Machines and… Deep Learning (model the brain, not the world) DARPA
  68. 68. NLG: Natural Language Generation https://www.aclweb.org/anthology/W17-1613
  69. 69. “[Algorithms] are not isolated deterministic actors but an inextricable component within a network of communicative practices that includes economic, institutional and increasingly legal and ethical issues” Matt Carlson
  70. 70. Thank you. @PaulBradshaw, Birmingham City University, Online Journalism Blog, BBC England data unit
  71. 71. BONUS. Object verification
  72. 72. https://www.scientificamerican.com/article/human-traffickers-caught-on-hidden-internet/
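
Slide 15 gives regression (y=x*2) as an example of supervised learning, and slide 11 describes supervised learning as "using x and y". A minimal sketch of that idea, assuming made-up training pairs and a plain least-squares fit (the function names `fit_line` and `predict` are illustrative, not from the talk):

```python
# Supervised learning sketch: the program is shown example inputs (x) with
# known answers (y) and estimates the rule that connects them.
# The training pairs below are invented for illustration.

def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Training data: every y happens to be x * 2, but the code is never told that.
train_x = [1, 2, 3, 4, 5]
train_y = [2, 4, 6, 8, 10]

slope, intercept = fit_line(train_x, train_y)


def predict(x):
    """Apply the learned rule to an input the model has not seen."""
    return slope * x + intercept


print(slope, intercept, predict(7))
```

With clean data like this the fitted slope comes out at 2 and the intercept at 0; with noisy real-world data the fit would only approximate the underlying rule, which is one reason the training data matters so much.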

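
Slide 52 describes template-based text generation in three layers: raw data inserted into the text, calculations on that data, and synonyms for variety. The sketch below illustrates why synonym slots multiply the number of possible texts. The poll figures, the sentence template, and the synonym lists are all hypothetical and are not taken from the Tow Center report.

```python
import itertools

# "Yellow" layer: raw data inserted into the text (hypothetical figures).
data = {"candidate": "Candidate A", "share_now": 48.0, "share_before": 45.5}

# "Purple" layer: a value calculated from the raw data.
change = round(data["share_now"] - data["share_before"], 1)

# "Green" layer: interchangeable synonyms for each slot.
synonyms = {
    "verb": ["rose", "climbed", "increased"],
    "noun": ["support", "backing"],
    "source": ["the latest poll", "the newest survey"],
}

TEMPLATE = "{candidate}'s {noun} {verb} by {change} points to {share_now}% in {source}."


def all_variants():
    """Return every sentence the template can produce from the synonym slots."""
    keys = list(synonyms)
    sentences = []
    for combo in itertools.product(*(synonyms[key] for key in keys)):
        choices = dict(zip(keys, combo))
        sentences.append(
            TEMPLATE.format(
                candidate=data["candidate"],
                change=change,
                share_now=data["share_now"],
                **choices,
            )
        )
    return sentences


variants = all_variants()
print(len(variants))
print(variants[0])
```

The number of variants is the product of the slot sizes (here 3 × 2 × 2), so each added slot multiplies the total, and a text built from several such sentences multiplies again.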