Kalev Leetaru is a Senior Fellow at the George Washington University Center for Cyber & Homeland Security and a member of its Counterterrorism and Intelligence Task Force. He was named one of Foreign Policy Magazine's Top 100 Global Thinkers of 2013 and was a 2015-2016 Google Developer Expert for Google Cloud Platform. His work focuses on how innovative applications of the world's largest datasets, computing platforms, algorithms and mind-sets can reimagine the way we understand and interact with our global world.
The GDELT Project is a realtime open data global graph over human society as seen through the eyes of the world's news media. It reaches deeply into local events, reaction, discourse, and emotions in the most remote corners of the world in near-realtime, and makes all of this available as an open data firehose to enable research over human society.
Leetaru, Kalev: lightning talk, The GDELT Project
1.
2. Datasets
• NEWS: Worldwide local news coverage in 100 languages (65 live-
translated) – online news preserved via the Internet Archive
• TELEVISION: Collaboration with the Internet Archive to process
more than 100 television stations across the US, updating daily
• ACADEMIC LITERATURE: 21 billion words covering 70 years
(JSTOR/DTIC/CORE/CITESEER/IA)
• BOOKS: Collaboration with the Internet Archive and HathiTrust to
process 3.5 million books spanning 1800-2015
• HUMAN RIGHTS: Half a century of worldwide human rights reports
• IMAGERY: Large fraction of global news imagery processed via deep
learning: objects/activities, OCR, logos, facial sentiment, geolocation
3.
4. Preserving Online News
• World’s largest initiative to preserve online news – partnership with the Internet
Archive
• Only program to focus on worldwide local news in local languages
• 1.5-2% of news articles disappear within 2 weeks
• 5% disappear within a month
• Up to 14% gone after 2 months – half returning 404 and half ranging from sustained
500-series errors to domain removal (common in some areas of the world)
• Of GDELT-relevant coverage, 140,000 articles published today will be gone in 2
months
• 14 million GDELT-monitored articles disappeared over a 6-month period,
representing 2x the total output of the New York Times over the last half century
• Nepal 2015 Earthquake: preserving coverage of sudden-onset natural disasters
requires “always on” preservation – GDELT preserved 667,000 articles, 225,000 of
them non-English, with Nepali the most common non-English language
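The failure modes listed above (404, sustained 500-series errors, domain removal) suggest a simple way to tally link rot when re-checking previously seen article URLs. The following is a minimal sketch only, not GDELT's actual pipeline: the function names, the category labels, and the convention that a status of None means the domain no longer resolves are all assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def classify_check(status: Optional[int]) -> str:
    """Map one re-check result to a coarse link-rot category.

    `status` is the HTTP status code returned for the article URL, or
    None when the domain no longer resolves (an assumed convention).
    """
    if status is None:
        return "domain_gone"
    if status == 404:
        return "not_found"
    if 500 <= status <= 599:
        # A real checker would require repeated failures over time
        # before treating a server error as "sustained".
        return "server_error"
    if 200 <= status <= 399:
        return "alive"
    return "other"


def disappearance_rate(statuses: Iterable[Optional[int]]) -> Tuple[float, Counter]:
    """Return the fraction of checked URLs that are gone, plus category counts."""
    counts = Counter(classify_check(s) for s in statuses)
    total = sum(counts.values())
    gone = counts["not_found"] + counts["server_error"] + counts["domain_gone"]
    rate = gone / total if total else 0.0
    return rate, counts


if __name__ == "__main__":
    # Hypothetical re-check results for ten article URLs.
    sample = [200, 200, 404, 503, None, 200, 200, 200, 200, 200]
    rate, counts = disappearance_rate(sample)
    print(f"{rate:.0%} gone", dict(counts))
```

Keeping the classification separate from the fetching step means the same tally can be run over logged status codes from any crawler, without re-requesting the pages.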
Yet timelines are just one of the ways we can visualize all of this data. Maps and network diagrams are especially powerful ways of understanding society. In clockwise order: 1) a map by BBVA of refugee inflows and outflows, 2) a map of global wildlife crime, 3) a network of the people most heavily associated with Russian economic sanctions, 4) a geographic network diagram of refugee flows, 5) a map of the countries discussing the Greek economic crisis, 6) a map of global anti-tank and aircraft missile activity.
The single map at the bottom combines 860 billion emotional scores, 1.5 billion location mentions, 89 million events and 1.4 million photographs from 200 million news articles in 65 languages from every country on earth, covering 2015.