Enron Email visualisation at the Big Data InfoVis summer school


Team Fir
Claire Jones
Daniel Rough
Gavin Hales
Lisa Koeman

Published in: Business, Technology

  1. Visualisation of the Enron dataset • Claire, Daniel, Gavin and Lisa
  2. Enron: background • American energy, commodities, and services company • Bankrupt in 2001: accounting fraud • Many executives ended up in prison
  3. Enron: dataset • ~500,000 mails • 158 employees • Social security numbers, bank details, etc. removed • The only large, public dataset of real e-mails available
  4. Our concept • Dataset contains profanity • Group mails by thread • Calculate a profanity factor for each thread • Plot profanity of conversations on a timeline
  5. Initial sketch
  6. Approach • Focus on the ‘sent’ folder, to avoid duplicates • Group mails by subject (i.e. identify threads) • Remove mails with blank subjects • Retrieve basic information: • Sender • Timestamp
  7. Approach • List of swearwords • Count frequency of swearwords per thread
  8. Data analysis • Initially used AWS, which proved too time-consuming for what we needed • Ended up using Python • JSON output
  9. Visualisation • Made use of Highcharts.js • Combined with TagCanvas (HTML5)
  10. Demo
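
The approach and data analysis slides can be sketched roughly as follows. This is a minimal illustration, not the team's actual script: the swearword list, the field names (`subject`, `sender`, `timestamp`, `body`), the reply-prefix rule used to identify threads, and the definition of the profanity factor as swearwords per mail are all assumptions. Mails are assumed to be already loaded from the ‘sent’ folders, with ISO 8601 timestamps in a single time zone so that string comparison orders them correctly.

```python
import json
import re
from collections import defaultdict

# Placeholder word list; the slides do not say which list was used.
SWEARWORDS = {"damn", "hell"}

# Leading reply/forward markers such as "Re:" or "FW:" (assumed convention).
PREFIX = re.compile(r"^\s*((re|fw|fwd)\s*:\s*)+", re.IGNORECASE)
WORD = re.compile(r"[a-z']+")


def thread_key(subject):
    """Normalise a subject line so replies and forwards share one key."""
    return PREFIX.sub("", subject).strip().lower()


def profanity_by_thread(mails):
    """Group mails by normalised subject and count swearwords per thread."""
    threads = defaultdict(
        lambda: {"mails": 0, "swearwords": 0, "senders": set(), "first": None}
    )
    for mail in mails:
        key = thread_key(mail["subject"])
        if not key:  # mails with blank subjects are dropped
            continue
        words = WORD.findall(mail["body"].lower())
        t = threads[key]
        t["mails"] += 1
        t["swearwords"] += sum(1 for w in words if w in SWEARWORDS)
        t["senders"].add(mail["sender"])
        if t["first"] is None or mail["timestamp"] < t["first"]:
            t["first"] = mail["timestamp"]

    # One record per thread, ordered by first timestamp for a timeline plot.
    ordered = sorted(threads.items(), key=lambda kv: kv[1]["first"])
    return [
        {
            "thread": key,
            "start": t["first"],
            "senders": sorted(t["senders"]),
            "mails": t["mails"],
            "swearwords": t["swearwords"],
            # Assumed definition: swearwords per mail in the thread.
            "profanity_factor": t["swearwords"] / t["mails"],
        }
        for key, t in ordered
    ]


# Invented sample data, for illustration only.
sample = [
    {"subject": "Budget", "sender": "a@example.com",
     "timestamp": "2001-05-01T09:00:00", "body": "Where the hell is the report?"},
    {"subject": "RE: Budget", "sender": "b@example.com",
     "timestamp": "2001-05-01T10:30:00", "body": "Damn, I forgot. Sending now."},
    {"subject": "", "sender": "a@example.com",
     "timestamp": "2001-05-02T08:00:00", "body": "damn"},
    {"subject": "Lunch", "sender": "b@example.com",
     "timestamp": "2001-05-03T12:00:00", "body": "Noon works."},
]

result = profanity_by_thread(sample)
print(json.dumps(result, indent=2))
```

The JSON records carry a start time and a numeric score per thread, which is the kind of shape a timeline chart could consume. Dividing by the number of mails keeps long threads from dominating purely because of their length, but a raw count would work just as well if that is what the plot should show.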