Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
last.fmcontentdashboardsBhushan Deo | Gaurav Bhardwaj | Prabhu Siddharth Raveendran | Krishna Chaitanya Dwarapudi
CmpE 273...
Project Demo
• What is last.fm?
• Last.fm data sets:
1000 users data set: listening history for each user
360,000 users da...
Basic Architecture
Web Services
• Built using Python and BottleWeb Framework
• MySQL connector for Python
• Endpoints:
/1K Track Analysis
/3...
Database Design
Schema:
main_1k : 19150868 tuples
profile_1k : 992 tuples
main_360k : 17559530 tuples
profile_360k : 35934...
Database Design
• Started off in the wrong direction with MongoDB.
main_360k : 17559530 tuples
profile_360k : 359347 tuple...
Database Design
MongoDB; 300
Query Performance
Time(in seconds)
Database Design
• MySQL:With proper indexes on the fields which we are
querying, query time comes down to 30 seconds.
Mong...
Database Design
• MySQL:To improve performance further, we used
partitioning. (based on KEY – MySQL’s own internal
hashing...
Database Design
• MySQL:With partitioning, query time comes down to 7-8
seconds in the worst case.
MongoDB; 300
MySQL (wit...
User Similarity
• Music compatibility between two users based on their
common artists, and number of plays for those artis...
Entire Network Analysis
• Using Hadoop to come up with trends from the entire
network:
-Top 10 artists in the network
-Top...
Entire Network Analysis
• Used Amazon Elastic MapReduce(EMR) minimize the job
time.
• Can be used to easily change the num...
Front End
• Website hosted on Heroku platform.
• Foundation front-end framework, JQuery for UI.
• Canvas JS and HERE Maps ...
Upcoming SlideShare
Loading in …5
×

lastfm contentdashboards project description

597 views

Published on

Data analysis project

Published in: Data & Analytics, Technology
  • Be the first to comment

  • Be the first to like this

lastfm contentdashboards project description

  1. 1. last.fmcontentdashboardsBhushan Deo | Gaurav Bhardwaj | Prabhu Siddharth Raveendran | Krishna Chaitanya Dwarapudi CmpE 273 Project
  2. 2. Project Demo • What is last.fm? • Last.fm data sets: 1000 users data set: listening history for each user 360,000 users data set: top 50 artists for each user Source: http://www.dtic.upf.edu/~ocelma/MusicRecommendationDataset/ • Business case: to create a web application which queries a data set and creates a visually pleasing dashboard. • http://mysterious-wave-7118.herokuapp.com/ • Code at https://github.com/cmpe273project/lastfmcontentdashboards
  3. 3. Basic Architecture
  4. 4. Web Services • Built using Python and BottleWeb Framework • MySQL connector for Python • Endpoints: /1K Track Analysis /360K  Album Analysis /usersimilarity  User Similarity /networkanalysis  Network Analysis
  5. 5. Database Design Schema: main_1k : 19150868 tuples profile_1k : 992 tuples main_360k : 17559530 tuples profile_360k : 359347 tuples userid | timestamp | artid | artname | traid | traname userid | gender | age | country | registered usersha1 | artmbid | artname | plays usersha1 | gender | age | country | registered
  6. 6. Database Design • Started off in the wrong direction with MongoDB. main_360k : 17559530 tuples profile_360k : 359347 tuples • Thus, due to structure of the data set, we needed joins, but in MongoDB we have to do “joins” programmatically which is inefficient compared to RDBMS. • Ex: If for an artist “the beatles”, we get 40,000 users from the first table, we have to make 40,000 queries to the other table to get the gender, age and country. usersha1 | artmbid | artname | plays usersha1 | gender | age | country | registered
  7. 7. Database Design MongoDB; 300 Query Performance Time(in seconds)
  8. 8. Database Design • MySQL:With proper indexes on the fields which we are querying, query time comes down to 30 seconds. MongoDB; 300 MySQL (with indexes); 40 Query Performance Time(in seconds)
  9. 9. Database Design • MySQL:To improve performance further, we used partitioning. (based on KEY – MySQL’s own internal hashing function.
  10. 10. Database Design • MySQL:With partitioning, query time comes down to 7-8 seconds in the worst case. MongoDB; 300 MySQL (with indexes); 40 MySQL (with partitioning); 8 Query Performance Time(in seconds)
  11. 11. User Similarity • Music compatibility between two users based on their common artists, and number of plays for those artists • Cosine Similarity to calculate the music compatibility between users usersha1 | artmbid | artname | plays
  12. 12. Entire Network Analysis • Using Hadoop to come up with trends from the entire network: -Top 10 artists in the network -Top 10 artists by year - Count for usage (scrobbling by users), by hour, day and month.
  13. 13. Entire Network Analysis • Used Amazon Elastic MapReduce(EMR) minimize the job time. • Can be used to easily change the number of nodes as per requirement. Single Node Hadoop; 16 Amazon EMR (5 m1.small nodes); 4 Amazon EMR (10 m1.medium nodes); 2Amazon EMR(10 m1.large nodes); 1 Time(in minutes) Time(in minutes)
  14. 14. Front End • Website hosted on Heroku platform. • Foundation front-end framework, JQuery for UI. • Canvas JS and HERE Maps API for visualizing data.

×