#keepitsimple, Final project for BIG DIVE 2
A project by: Alex Comunian
The TOP-IX dataset contains all the Consortium streaming platform server logs. The project was divided into two parts: the first searched for trends or mathematical models that fit the web radio throughput; the second produced a report on the streaming platform's traffic in 2012. The final visualization shows the amount of traffic transferred during the year, highlighting the “points of interest”.
Link to Project: http://keepitsimple.bigdive.eu/
Link to Video Presentation: http://www.youtube.com/watch?v=H4syu6HKndU
Link to BigDive page: http://www.bigdive.eu/final-projects/
2. Who am I
July, 5 2013
Alex Comunian
• Student @Politecnico di Torino
• Junior Developer @Top-Ix Consortium
• Python beginner
Email: alex.comunian@top-ix.org
Twitter: @ComuAlex
3. Dataset
Top-Ix Streaming Dataset
3 Gbytes, from 2011/11 to 2013/06
35 million entries
Challenge
Try to understand radio and video streaming trends
4. Data Format – Streaming Logs
Video:
quartaretetv1;1348582219;1103716226;95.235.157.221;1098;104456;0;wowza4;_definst_;formusicweb
Radio:
RadioflashdG;1348601029;12312567243212349;123.125.67.243;978;12353;3;stream15;na;na
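Both sample records are semicolon-delimited with ten fields each (reading the wrapped video record as one line). A minimal parsing sketch follows. The field names are guesses made for illustration, since the slides do not document the columns; in particular, treating the second field as a Unix timestamp and the fourth as the client IP is an assumption.

```python
from collections import namedtuple
from datetime import datetime, timezone

# Field names are hypothetical: the slides show the records but not their schema.
LogEntry = namedtuple(
    "LogEntry",
    ["stream", "timestamp", "session_id", "client_ip",
     "field_5", "field_6", "field_7", "server", "field_9", "field_10"],
)

def parse_line(line):
    """Split one semicolon-delimited log record into a LogEntry."""
    parts = line.strip().split(";")
    if len(parts) != len(LogEntry._fields):
        raise ValueError(
            f"expected {len(LogEntry._fields)} fields, got {len(parts)}"
        )
    return LogEntry(*parts)

def month_of(entry):
    """Return 'YYYY-MM', assuming the second field is a Unix timestamp (seconds)."""
    moment = datetime.fromtimestamp(int(entry.timestamp), tz=timezone.utc)
    return moment.strftime("%Y-%m")

video = parse_line(
    "quartaretetv1;1348582219;1103716226;95.235.157.221;"
    "1098;104456;0;wowza4;_definst_;formusicweb"
)
radio = parse_line(
    "RadioflashdG;1348601029;12312567243212349;"
    "123.125.67.243;978;12353;3;stream15;na;na"
)
print(video.stream, video.client_ip, month_of(video))
print(radio.stream, radio.client_ip, month_of(radio))
```

Rejecting records with an unexpected field count keeps malformed lines from silently shifting columns in later aggregation steps.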
5. Project Goals
• Analyze and study the dataset
• Find correlations between radio streams
• Create a report for the Top-Ix streaming platform
• Find points of interest
6. Tools & technologies
• Python & MrJob (Map/Reduce) to analyze data
• Data science to find patterns
• D3.js to visualize the results
• Amazon Web Services as storage
• IP geolocation API
7. Python 1, Bottom-Up approach
15 different jobs on AWS
• Found the total amount of Kbytes for each application
• Found the 6 “best” applications
• For the “best” apps, found the total amount of Kbytes for each month
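The bottom-up steps can be sketched in plain Python, written as mappers plus a summing reducer in the spirit of a Map/Reduce job but without MrJob itself. This is not the project's code. It assumes records have already been reduced to (application, unix_timestamp, kbytes) tuples; which log columns hold these values is not stated in the slides, "best" is read here as largest total Kbytes, and the sample rows are made up.

```python
from collections import defaultdict
from datetime import datetime, timezone
import heapq

def map_app_kbytes(record):
    """Mapper: emit (application, kbytes) for one record."""
    app, _timestamp, kbytes = record
    yield app, kbytes

def map_app_month_kbytes(record):
    """Mapper: emit ((application, 'YYYY-MM'), kbytes) for one record."""
    app, timestamp, kbytes = record
    month = datetime.fromtimestamp(timestamp, tz=timezone.utc).strftime("%Y-%m")
    yield (app, month), kbytes

def run_job(records, mapper):
    """In-memory stand-in for a Map/Reduce run: group by key, sum the values."""
    totals = defaultdict(int)
    for record in records:
        for key, value in mapper(record):
            totals[key] += value
    return dict(totals)

def best_apps(totals, count=6):
    """Return the `count` applications with the largest totals."""
    return heapq.nlargest(count, totals, key=totals.get)

# Made-up sample records: (application, unix_timestamp, kbytes).
records = [
    ("radio_a", 1348582219, 500),
    ("radio_a", 1351260619, 700),
    ("video_b", 1348601029, 2000),
    ("radio_c", 1348601029, 100),
]
per_app = run_job(records, map_app_kbytes)
top = best_apps(per_app, count=2)
per_month = run_job((r for r in records if r[0] in top), map_app_month_kbytes)
print(per_app)
print(top)
print(per_month)
```

Running the monthly job only over the selected applications mirrors the order on the slide: rank first, then break the winners down by month.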
10. Python 2, Top-Down approach
• Found the total amount of Kbytes transferred by Video+Radio for each month
• Geolocated every IP address for each month (using a web API): more than 1 million addresses
• Aggregated the results by country and calculated the percentage of international users for each month
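The aggregation step can be illustrated with a short sketch. Several things here are assumptions rather than facts from the slides: a plain dict stands in for the geolocation web API results, a "user" is taken to be a distinct IP address within a month, and "international" is taken to mean any country other than Italy. The sample data is made up and uses documentation-reserved IP ranges.

```python
from collections import Counter, defaultdict

HOME_COUNTRY = "IT"  # assumption: "international" means outside Italy

def users_by_country(month_ips, ip_to_country):
    """Count distinct IPs per country for each month.

    month_ips: iterable of (month, ip) pairs.
    ip_to_country: dict standing in for the geolocation API results.
    """
    seen = set()
    counts = defaultdict(Counter)
    for month, ip in month_ips:
        if (month, ip) in seen:
            continue  # count each address once per month
        seen.add((month, ip))
        counts[month][ip_to_country.get(ip, "unknown")] += 1
    return counts

def international_share(counts, home=HOME_COUNTRY):
    """Percentage of a month's located users whose country is not `home`."""
    shares = {}
    for month, by_country in counts.items():
        located = sum(n for c, n in by_country.items() if c != "unknown")
        abroad = located - by_country.get(home, 0)
        shares[month] = 100.0 * abroad / located if located else 0.0
    return shares

# Made-up data for illustration only.
ip_to_country = {
    "192.0.2.1": "IT",
    "192.0.2.2": "IT",
    "198.51.100.7": "FR",
    "203.0.113.9": "US",
}
month_ips = [
    ("2012-09", "192.0.2.1"), ("2012-09", "192.0.2.1"),
    ("2012-09", "192.0.2.2"), ("2012-09", "198.51.100.7"),
    ("2012-10", "192.0.2.1"), ("2012-10", "203.0.113.9"),
]
counts = users_by_country(month_ips, ip_to_country)
shares = international_share(counts)
print(shares)
```

Addresses that could not be located are left out of the denominator, so a batch of failed lookups does not deflate the international percentage.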
11. Issues
• Finding patterns in the radio streams
• Geolocating 1 million IP addresses: a very long-running Python multiprocess job
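Geolocating about a million addresses is mostly time spent waiting on network calls, so one way to shorten such a run is to deduplicate the addresses first and issue the lookups concurrently. The sketch below uses a standard-library thread pool rather than the multiprocess approach named on the slide, and `fake_lookup` is a hypothetical placeholder for the real web API call, which the slides do not identify.

```python
from concurrent.futures import ThreadPoolExecutor

def fake_lookup(ip):
    """Hypothetical stand-in for one geolocation web API request."""
    return "IT" if ip.startswith("192.0.2.") else "unknown"

def geolocate_all(ips, lookup=fake_lookup, workers=8):
    """Resolve each distinct IP once, running the lookups concurrently."""
    unique_ips = sorted(set(ips))  # repeated addresses cost only one request
    with ThreadPoolExecutor(max_workers=workers) as pool:
        countries = list(pool.map(lookup, unique_ips))
    return dict(zip(unique_ips, countries))

# Made-up addresses from documentation-reserved ranges.
ips = ["192.0.2.1", "198.51.100.7", "192.0.2.1", "192.0.2.2"]
located = geolocate_all(ips)
print(located)
```

Threads fit this case because each worker spends its time blocked on I/O; a real run would also need to respect the rate limits of whichever API is used.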
12. Final considerations
Great experience
First time with visualization & data science
Results:
3 good projects (code on GitHub very soon)
I improved my #dev abilities
D3.js: an amazing discovery
Data Power
13. Big Thanks to
Top-Ix for the opportunities
ToDo, Axant & ISI
Amazing teachers
Christian & Laura for the great support
#Divers, bright & smart people