Visual Analysis and Historical Discovery

Percy Dante, Chandan Kumar, Victoria Hore and Julia Juergens

  1. VISUAL ANALYSIS AND HISTORICAL DISCOVERY • Summer School on Big Data Information Visualisation • Chandan Kumar (University of Oldenburg) • Julia Juergens (University of Hildesheim) • Percy Perez (University of St. Andrews) • Victoria Hore (University of Oxford) • BRIGHTSOLID: NEWSPAPER DATASET
  2. Data description • Newspapers • Fife Herald 1833-1878 • The Dundee Courier & Argus 1890-1899 • Data set • 154 GB of XML files • 16 048 issues (one METS file per issue) • 77 954 pages (one ALTO file per page) • no images
  3. Data files • Title • METS • OCR errors • No meaning • ALTO
  4. Methodology
  5. Architectural overview
  6. Data processing • 20 years of data analysed • 12 years have complete titles • 8 years do not have complete titles • 6189 files analysed • 314 meta files per year (avg.) • 12 years => 3754 issues • Word counting, formatting files to/from XML, D3 and Jigsaw • Hadoop processing was impressive
  7. Idea generation • What happened in the 19th century? • Find interesting stories • Where were events happening? • Overview of mentioned locations • What were the most common topics? • Overview of frequent words • Categorization of words • Who was mentioned? • Entity recognition of names
  8. Visualization (overview)
  9. Visualization (overview)
  10. Visual Exploration with Jigsaw • Jigsaw already has good functions and visualizations!
  11. Visualisations (Beyond Jigsaw) • More numerical analysis • User-selected dimensions and exploration • Dynamic visualization • topics, locations, entities • Pattern analysis
  12. Interactive visualisation
  13. Dynamic exploration
  14. Insights • Industrial revolution in Dundee • Frequency analysis, cluster overview, positive sentiments • LATEST MOVEMENTS OF DUNDEE JUTE FLEET • Entity relations, bigram analysis • Calcutta, Indian subcontinent? • Location-commercial significance • Baxter Brothers was the world's largest linen manufacturer (1840-1890) • Family names-organization
  15. Conclusions • A really steep learning curve • Big data is BIG • Distributed computing is important • Data wants to tell interesting stories (we just need to interact) • Visualisation is powerful • Jigsaw is awesome • Lots of useful visualisation tools are ready to be used • Generalizations and Interactions (future work)
  16. THANK YOU FOR THE COOL (SCHOOL) EXPERIENCE • Big thanks to BRIGHTSOLID for providing the interesting dataset • Chandan Kumar, Julia Juergens, Percy Perez, Victoria Hore
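
The word-counting step named on the data processing slide could look roughly like the sketch below. It is an illustration, not the team's pipeline: it assumes each page is an ALTO file whose words sit in the CONTENT attribute of String elements, and the function names and the tiny sample page are made up for the example.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def alto_words(xml_text):
    """Yield lower-cased tokens from the CONTENT attribute of ALTO String elements."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # Drop any namespace prefix so the sketch does not depend on the ALTO version.
        local_name = elem.tag.rsplit("}", 1)[-1]
        if local_name == "String":
            token = elem.get("CONTENT", "").strip(".,;:!?\"'()").lower()
            if token:
                yield token


def count_words(pages):
    """Sum word frequencies over an iterable of ALTO page documents (as strings)."""
    counts = Counter()
    for page in pages:
        counts.update(alto_words(page))
    return counts


# Hypothetical two-line page, only here to make the sketch runnable.
SAMPLE_PAGE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine>
      <String CONTENT="Dundee"/><SP/><String CONTENT="jute"/><SP/><String CONTENT="fleet,"/>
    </TextLine>
    <TextLine>
      <String CONTENT="Dundee"/><SP/><String CONTENT="linen"/>
    </TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""

if __name__ == "__main__":
    print(count_words([SAMPLE_PAGE]).most_common(3))
```

On 154 GB of XML the same per-page function would be run as the map step of a distributed job, with the per-page counters summed in the reduce step; the slides mention Hadoop but do not show how the job was set up.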

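The bigram analysis mentioned on the insights slide can be sketched in a few lines. This is a minimal version under stated assumptions: whitespace tokenisation, no OCR clean-up, and a hypothetical `bigram_counts` helper. The first sample headline comes from the slides; the second is invented to show a repeated pair.

```python
from collections import Counter


def bigram_counts(headlines):
    """Count adjacent word pairs across a list of headline strings."""
    counts = Counter()
    for headline in headlines:
        tokens = headline.lower().split()
        # zip pairs each token with its successor: (t0, t1), (t1, t2), ...
        counts.update(zip(tokens, tokens[1:]))
    return counts


HEADLINES = [
    "Latest movements of Dundee jute fleet",    # headline quoted on the insights slide
    "Dundee jute fleet arrives from Calcutta",  # invented example
]

if __name__ == "__main__":
    for pair, n in bigram_counts(HEADLINES).most_common(2):
        print(pair, n)
```

Pairs that recur across many issues, such as a place name next to a commodity, are the kind of pattern that points to the entity relations listed on that slide.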