NASA Webserver Big Data InfoVis Summer School presentation


Team – Ginkgo
Jie Yang
Muhammad Adnan
Luisa Pinto
and Matti Pouke

Published in: Technology, Education

  1. Team Ginkgo
     NAVI: NASA Web Server Log Visualizer
     Luísa Pinto (University of Glasgow), Jie Yang (Dublin City University), Matti Pouke (University of Oulu)
  2. The Dataset
     ● NASA web server logs (July and August 1995)
     ● Around 370 MB of data
     ● 3,461,613 lines
     ● Line format (supposedly!): requester, time, URL, HTTP response code, size
     ● Possible things to look at: popular content (images, videos ...), page topics (from URLs), users' origins, individual user tracks, attacks
  3. Design Goals
     ● A system for exploratory visualization
       – Daily statistics from two months
     ● The statistics include:
       – Number of requests
       – Content type (images, videos, text, HTML, denied, not found)
       – Frequent requests (words in the URL)
       – Frequent requesters (domain)
     ● Visualizations use color (value), bar charts, icons and text clouds
  4. Visualization UI Draft
  5. Current Architecture Overview
     Data → Data Processing (Java) → Local storage → GUI (Python)
  6. Extended Architecture
     (Diagram) Multiple Data sources with Map/Reduce; Data Processing: Parser + Indexing + Stats (Java); Local storage (or query); GUI (Python)
  7. Insights
     ● Traffic is heavier at the beginning of July and declines towards and throughout August
     ● The NASA server was down at the end of July and on 2 August
     ● There were more content types than we expected (such as applications, queries and maps)
     ● In August there seem to be a lot of webpage errors relative to acquired content
  8. What did we learn?
     ● A quick overview of the data contents is important
       – cheat sheet
     ● At some point discarding data is almost unavoidable
       – non-recognisable characters, data format exceptions
     ● Data can be visualised without the need for an x-y axis
       – by mapping data to meaning
     ● Data visualisation techniques allow the user to discover answers without asking questions
  9. Thank you
     ● Questions or comments?
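
The processing step described on the Dataset, Design Goals and architecture slides (parse each line, discard what does not fit, aggregate per day) could be sketched roughly as below. This is not the team's code: their processing was written in Java, and this sketch uses Python only for brevity. It assumes the logs follow the Common Log Format (`host - - [time] "request" status size`). The function names, the extension-to-content-type mapping and the sample lines are all made up for illustration.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime

# Assumed layout (Common Log Format): host - - [time] "request" status size
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>.*)" (?P<status>\d{3}) (?P<size>\d+|-)$'
)

def parse_line(line):
    """Return a dict of fields, or None if the line does not fit the format."""
    m = LOG_LINE.match(line.strip())
    if m is None:
        return None  # malformed line: the caller discards it
    try:
        when = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    except ValueError:
        return None  # unparseable timestamp
    parts = m.group("request").split()
    # A request normally looks like 'GET /path HTTP/1.0'; fall back to the raw text.
    url = parts[1] if len(parts) >= 2 else m.group("request")
    size = m.group("size")
    return {
        "host": m.group("host"),
        "time": when,
        "url": url,
        "status": int(m.group("status")),
        "size": 0 if size == "-" else int(size),
    }

def content_type(record):
    """Hypothetical mapping from status code and file extension to a category."""
    if record["status"] == 404:
        return "not found"
    if record["status"] == 403:
        return "denied"
    url = record["url"]
    ext = url.rsplit(".", 1)[-1].lower() if "." in url else ""
    if ext in {"gif", "jpg", "jpeg", "xbm"}:
        return "image"
    if ext in {"mpg", "mpeg", "mov"}:
        return "video"
    if ext in {"html", "htm"} or url.endswith("/"):
        return "html"
    if ext == "txt":
        return "text"
    return "other"

def daily_stats(lines):
    """Aggregate request counts and content types per calendar day."""
    stats = defaultdict(lambda: {"requests": 0, "types": Counter()})
    discarded = 0
    for line in lines:
        record = parse_line(line)
        if record is None:
            discarded += 1
            continue
        day = stats[record["time"].date()]
        day["requests"] += 1
        day["types"][content_type(record)] += 1
    return dict(stats), discarded

# Made-up sample lines in the assumed format, plus one malformed line.
sample = [
    'host-a.example.com - - [01/Jul/1995:00:00:01 -0400] "GET /history/apollo/ HTTP/1.0" 200 6245',
    'host-b.example.com - - [01/Jul/1995:00:00:06 -0400] "GET /images/logo.gif HTTP/1.0" 200 40310',
    'host-c.example.com - - [02/Jul/1995:10:00:00 -0400] "GET /missing.html HTTP/1.0" 404 -',
    'this line does not follow the format',
]
stats, discarded = daily_stats(sample)
for day in sorted(stats):
    print(day, stats[day]["requests"], dict(stats[day]["types"]))
print("discarded:", discarded)
```

Counting discarded lines instead of failing on them matches the lesson on the "What did we learn?" slide that some data has to be dropped. The per-day dictionaries are the kind of summary a GUI could read from local storage.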