Visualizing Relationships: Journalistic Problems in a Digital Age

A presentation by Marcos Vanetta, technical lead and web developer at 3Pillar Global, and Mariano Blejman of the Spanish-language newspaper Página/12, given at the 2012 Mozilla Festival in London, England.

Published in: Devices & Hardware

  1. Visualizing Relationships: Journalistic Problems in a Digital Age
  2. Summary
      1. Introduction
      2. The Problem we are solving
      3. Involved issues
      4. Problems we found
      5. The Challenge
      © Copyright 2014. 3Pillar | All rights reserved Strictly Confidential
  3. WHO ARE WE?
      • Mariano Blejman is a technology editor and youth editor at the Argentine newspaper Página/12 and a co-founder of Hacks/Hackers Buenos Aires. @blejmanevel
      • Marcos Vanetta is a biomedical engineer, a software developer at 3Pillar Global, and a hacker at Hacks/Hackers Buenos Aires. @malev
  4. HACKS/HACKERS BUENOS AIRES
  5. THE PROBLEM
      • 1976: A dictatorship began in Argentina.
      • 30,000 people were kidnapped and disappeared.
      • 1985: The first trials took place in Argentina. The bad guys were tried, but the trials had to stop.
      • 2003: The courts started trying the bad guys again.
      • 2012: There is a large volume of judicial documents. No one can read all of them.
  6. INVOLVED ISSUES
      • Semantic Analytics
      • Ontology
      • Data Mining
      • Social Network Analysis
      • Visualizations
      Who was already dealing with documents? DocumentCloud, Overview, OpenCalais, NLTK, GATE
  7. FIRST APPROACH
      • Read all the documents
      • Software solution based on regular expressions
      • Ruby, Padrino and a MySQL database

          require 'tmpdir'
          require 'docsplit'

          # Extract the text layer of a PDF with Docsplit (no OCR) and clean it.
          def self.extract_plain_text(path)
            # File name without its last extension, e.g. "a.b.pdf" -> "a.b"
            basename = File.basename(path, '.*')
            tmp_dir = Dir.tmpdir
            Docsplit.extract_text(path, :output => tmp_dir, :ocr => false)
            # File.read closes the file handle once the text is loaded
            text = File.read(File.join(tmp_dir, "#{basename}.txt"))
            self.clean_text(text)
          end
  8. THE PROBLEMS WE FOUND
      • Converting PDF files to text
      • Extracting entities from documents
      • Parsing dates and addresses
      • Resolving co-references between names
      • How to store relations
      • Contextual information about the documents
      • Confidence in data on a crowdsourcing platform
      Visualizing relationships over time
  9. WHAT DO WE HAVE NOW?
      • Prototype for a single (and local) use case: mapa76
      • Platform for different use cases: analice.me
  10. THE VISUALIZATIONS THAT WE IMAGINED
  11. (image-only slide)
  12. THE VISUALIZATIONS THAT WE FOUND
  13. (image-only slide)
  14. THE #MOZFEST CHALLENGE
      Find a big journalistic issue that involves:
      • Lots of documents with unstructured data
      • Lots of data to find inside them
      • What relationships do you want to find?
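
Slide 7 describes a first approach based on regular expressions, and slide 8 lists entity extraction and date parsing among the problems found. The following is a minimal sketch of what such a regex pass over the extracted plain text could look like. It is not the project's actual code: the patterns, the DD/MM/YYYY date assumption, and the helper names `extract_dates` and `extract_names` are all hypothetical.

```ruby
require 'date'

# Hypothetical patterns for illustration; not taken from the original project.
# Dates written as DD/MM/YYYY (assumed day-first order).
DATE_PATTERN = %r{(?<!\d)(\d{1,2})/(\d{1,2})/(\d{4})(?!\d)}
# Naive person names: two to four consecutive capitalized words.
NAME_PATTERN = /(?<!\p{L})\p{Lu}\p{Ll}+(?: \p{Lu}\p{Ll}+){1,3}(?!\p{L})/

# Return every calendar-valid date found in the text as a Date object.
# Impossible dates such as 31/02/1985 are silently dropped.
def extract_dates(text)
  text.scan(DATE_PATTERN).map do |day, month, year|
    d, m, y = day.to_i, month.to_i, year.to_i
    Date.new(y, m, d) if Date.valid_date?(y, m, d)
  end.compact
end

# Return the distinct name candidates found in the text, in order of appearance.
def extract_names(text)
  text.scan(NAME_PATTERN).uniq
end
```

A pattern this naive also matches place names such as "Buenos Aires" and cannot tell that "J. Pérez" and "Juan Pérez" are the same person, which is one way to see why co-reference resolution appears on the list of problems.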

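Slide 8 also raises the questions of how to store relations and how to visualize them over time. One possible shape for such a record is sketched below, assuming each relation links two entities, is valid over a date range, and points back to its source document. The `Relation` struct, its fields, and both helper functions are hypothetical, and an in-memory array stands in for the MySQL database mentioned on slide 7.

```ruby
require 'date'

# Hypothetical record: a link between two entities, valid over a date range,
# with a reference to the document it was extracted from.
Relation = Struct.new(:subject, :kind, :object, :from, :to, :source)

# True if the relation was in force on the given date.
# A nil :to means the end date is unknown or the relation is still open.
def active_on?(relation, date)
  relation.from <= date && (relation.to.nil? || date <= relation.to)
end

# All relations that involve an entity on a date, as subject or as object.
def relations_for(relations, entity, date)
  relations.select do |r|
    [r.subject, r.object].include?(entity) && active_on?(r, date)
  end
end
```

Keeping the date range and the source on every relation lets a timeline view ask "who was linked to this place in June 1976?" and lets a reader trace each answer back to a document, which matters for the confidence issue the slide mentions.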