Visualizing Relationships: Journalistic Problems in a Digital Age

A presentation by Marcos Vanetta, technical lead and web developer at 3Pillar Global, and Mariano Blejman of the Spanish-language newspaper Página/12, given at the 2012 Mozilla Festival in London, England.

Published in: Devices & Hardware

  • 3Pillar Global combines engineering expertise with a critical understanding of market and business needs to build innovative software products that propel clients’ businesses forward.
  • Visualizing Relationships: Journalistic Problems in a Digital Age

    1. Visualizing Relationships: Journalistic Problems in a Digital Age
    2. Summary
        1. Introduction
        2. The problem we are solving
        3. Involved issues
        4. Problems we found
        5. The challenge
        © Copyright 2014. 3Pillar | All rights reserved Strictly Confidential
    3. WHO ARE WE?
        • Mariano Blejman is a technology editor and youth editor at the Argentine newspaper Página/12, and co-founder of Hacks/Hackers Buenos Aires. @blejmanevel
        • Marcos Vanetta is a biomedical engineer, software developer at 3Pillar Global and hacker at Hacks/Hackers Buenos Aires. @malev
    4. HACKS/HACKERS BUENOS AIRES
    5. THE PROBLEM
        • 1976: A dictatorship started in Argentina.
        • 30,000 people were kidnapped and disappeared.
        • 1985: The first trials happened in Argentina. They judged the bad guys, but the trials had to stop.
        • 2003: The courts started judging the bad guys again.
        • 2012: A large amount of judicial documents. No one can read all of them.
    6. INVOLVED ISSUES
        • Semantic analytics
        • Ontology
        • Data mining
        • Social network analysis
        • Visualizations
        Who was dealing with documents? DocumentCloud, Overview, Open Calais, NLTK, GATE
    7. FIRST APPROACH
        Read all the documents
        Software solution based on regular expressions
        Ruby, Padrino and MySQL database.

            require 'docsplit'
            require 'tmpdir'

            # Extract the text layer of a PDF with Docsplit (no OCR), then clean it.
            def self.extract_plain_text(path)
              basename = File.basename(path, '.*') # file name without its last extension
              tmp_dir = Dir.tmpdir
              Docsplit.extract_text(path, :output => tmp_dir, :ocr => false)
              # Docsplit writes "<basename>.txt" into the output directory.
              text = File.read(File.join(tmp_dir, "#{basename}.txt"))
              self.clean_text(text) # clean_text is defined elsewhere in the project
            end
    8. THE PROBLEMS WE FOUND
        • Converting PDF files to text
        • Extracting entities from documents
        • Parsing dates and addresses
        • Co-reference resolution for names
        • How to store relations
        • Contextual information about documents
        • Confidence in data on a crowdsourcing platform
        Visualizing relationships over time
    9. WHAT DO WE HAVE NOW?
        • Prototype for a single (and local) use case: mapa76
        • Platform for different use cases: analice.me
    10. THE VISUALIZATIONS THAT WE IMAGINED
    11. (No text on this slide.)
    12. THE VISUALIZATIONS THAT WE FOUND
    13. (No text on this slide.)
    14. THE #MOZFEST CHALLENGE
        Find a big journalistic issue that involves:
        • Lots of documents with unstructured data
        • Lots of data to find inside
        • What relationships do you want to find?
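
The regular-expression approach from slide 7, applied to some of the problems listed on slide 8 (extracting entities, parsing dates, storing relations), might look roughly like the sketch below. It is not taken from mapa76 or analice.me. The names `SPANISH_MONTHS`, `DATE_PATTERN`, `NAME_PATTERN`, `extract_dates`, `extract_names` and `co_occurrences` are hypothetical, and the sketch assumes that dates are written out in Spanish ("24 de marzo de 1976") and that a person name is two or three capitalized words in a row. That name rule is deliberately naive and will miss or mislabel many real cases.

```ruby
require 'date'

# Hypothetical lookup table; not part of the original project.
SPANISH_MONTHS = {
  'enero' => 1, 'febrero' => 2, 'marzo' => 3, 'abril' => 4,
  'mayo' => 5, 'junio' => 6, 'julio' => 7, 'agosto' => 8,
  'septiembre' => 9, 'octubre' => 10, 'noviembre' => 11, 'diciembre' => 12
}.freeze

# Matches dates written like "24 de marzo de 1976".
DATE_PATTERN = /\b(\d{1,2}) de (#{SPANISH_MONTHS.keys.join('|')}) de (\d{4})\b/i

# Naive person-name pattern: two or three capitalized words in a row,
# not glued to other letters on either side.
NAME_PATTERN = /(?<!\p{L})\p{Lu}\p{Ll}+(?: \p{Lu}\p{Ll}+){1,2}(?!\p{L})/

# Returns Date objects for every valid written-out date in the text.
def extract_dates(text)
  text.scan(DATE_PATTERN).map do |day, month, year|
    month_number = SPANISH_MONTHS.fetch(month.downcase)
    next unless Date.valid_date?(year.to_i, month_number, day.to_i)

    Date.new(year.to_i, month_number, day.to_i)
  end.compact
end

# Returns the distinct candidate names, in order of first appearance.
def extract_names(text)
  text.scan(NAME_PATTERN).uniq
end

# Treats two names appearing in the same sentence as a candidate relation.
# Each relation is a sorted pair, so [a, b] and [b, a] are not stored twice.
def co_occurrences(text)
  text.split(/(?<=[.!?])\s+/).flat_map do |sentence|
    extract_names(sentence).sort.combination(2).to_a
  end.uniq
end
```

Pairs returned by `co_occurrences` are only candidates. Deciding whether "Juan Perez" and "J. Perez" are the same person (the co-reference problem from slide 8) and how much to trust each extracted pair would still need a separate step.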