3. • Billboard was easy – python wrapper for the
API. 10 years, every 4 weeks.
• 7 different charts-
• 'hot-100', 'r-b-hip-hop-songs', 'rock-songs',
'dance-electronic-songs', 'country-songs',
'heatseekers-songs','adult-contemporary’
• Lyrics were challenging– totally different
collection process
• I now have a (relatively) complete database of
lyrics to 2500 popular songs for the last 10
years
The Data Wrangle
4. Processing
• Vocabulary of 13000 unique words
• Bag of words for each song- 2500 x 13k
sparse array, integer counts
• Each song has monthly entries on the
chart; 2500 songs x 130, binary
• Link the words to the charts WT. C
5. Value
• Some insight
into features
of most
successful
songs
• Song memes
reflect
popular
consciousness
http://localhost:5000/input