Your SlideShare is downloading. ×
lastfm contentdashboards project description
lastfm contentdashboards project description
lastfm contentdashboards project description
lastfm contentdashboards project description
lastfm contentdashboards project description
lastfm contentdashboards project description
lastfm contentdashboards project description
lastfm contentdashboards project description
lastfm contentdashboards project description
lastfm contentdashboards project description
lastfm contentdashboards project description
lastfm contentdashboards project description
lastfm contentdashboards project description
lastfm contentdashboards project description
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×
Saving this for later? Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime – even offline.
Text the download link to your phone
Standard text messaging rates apply

lastfm contentdashboards project description

309

Published on

Data analysis project

Data analysis project

Published in: Data & Analytics, Technology
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total Views
309
On Slideshare
0
From Embeds
0
Number of Embeds
2
Actions
Shares
0
Downloads
4
Comments
0
Likes
0
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide

Transcript

  • 1. last.fmcontentdashboardsBhushan Deo | Gaurav Bhardwaj | Prabhu Siddharth Raveendran | Krishna Chaitanya Dwarapudi CmpE 273 Project
  • 2. Project Demo • What is last.fm? • Last.fm data sets: 1000 users data set: listening history for each user 360,000 users data set: top 50 artists for each user Source: http://www.dtic.upf.edu/~ocelma/MusicRecommendationDataset/ • Business case: to create a web application which queries a data set and creates a visually pleasing dashboard. • http://mysterious-wave-7118.herokuapp.com/ • Code at https://github.com/cmpe273project/lastfmcontentdashboards
  • 3. Basic Architecture
  • 4. Web Services • Built using Python and BottleWeb Framework • MySQL connector for Python • Endpoints: /1K Track Analysis /360K  Album Analysis /usersimilarity  User Similarity /networkanalysis  Network Analysis
  • 5. Database Design Schema: main_1k : 19150868 tuples profile_1k : 992 tuples main_360k : 17559530 tuples profile_360k : 359347 tuples userid | timestamp | artid | artname | traid | traname userid | gender | age | country | registered usersha1 | artmbid | artname | plays usersha1 | gender | age | country | registered
  • 6. Database Design • Started off in the wrong direction with MongoDB. main_360k : 17559530 tuples profile_360k : 359347 tuples • Thus, due to structure of the data set, we needed joins, but in MongoDB we have to do “joins” programmatically which is inefficient compared to RDBMS. • Ex: If for an artist “the beatles”, we get 40,000 users from the first table, we have to make 40,000 queries to the other table to get the gender, age and country. usersha1 | artmbid | artname | plays usersha1 | gender | age | country | registered
  • 7. Database Design MongoDB; 300 Query Performance Time(in seconds)
  • 8. Database Design • MySQL:With proper indexes on the fields which we are querying, query time comes down to 30 seconds. MongoDB; 300 MySQL (with indexes); 40 Query Performance Time(in seconds)
  • 9. Database Design • MySQL:To improve performance further, we used partitioning. (based on KEY – MySQL’s own internal hashing function.
  • 10. Database Design • MySQL:With partitioning, query time comes down to 7-8 seconds in the worst case. MongoDB; 300 MySQL (with indexes); 40 MySQL (with partitioning); 8 Query Performance Time(in seconds)
  • 11. User Similarity • Music compatibility between two users based on their common artists, and number of plays for those artists • Cosine Similarity to calculate the music compatibility between users usersha1 | artmbid | artname | plays
  • 12. Entire Network Analysis • Using Hadoop to come up with trends from the entire network: -Top 10 artists in the network -Top 10 artists by year - Count for usage (scrobbling by users), by hour, day and month.
  • 13. Entire Network Analysis • Used Amazon Elastic MapReduce(EMR) minimize the job time. • Can be used to easily change the number of nodes as per requirement. Single Node Hadoop; 16 Amazon EMR (5 m1.small nodes); 4 Amazon EMR (10 m1.medium nodes); 2Amazon EMR(10 m1.large nodes); 1 Time(in minutes) Time(in minutes)
  • 14. Front End • Website hosted on Heroku platform. • Foundation front-end framework, JQuery for UI. • Canvas JS and HERE Maps API for visualizing data.

×