3. Example of output
Need to grow older with a girl like you
Finally see you are naturally
The one to make it so easy
When you show me the truth
Yeah, I’d rather be with you
Say you want the same thing too
TAGS: (ROMANCE)positive
4. DATA SET
1. We used a web crawler to collect data from a
few listed websites and used it as our
data set. Some of the sites were:
a. www.azlyrics.com
b. www.lyrics.com
c. www.metrolyrics.com
2. The data was already tagged.
5. DATA SET (Contd.)
We created a data set for five emotions. The
training set consists of a little under
1,500 songs tagged with their emotions.
6. Basic Statistics
1. Number of documents per tag (2007 in total):
a. Positive: 975
b. Negative: 1032
2. Average length of documents:
253.23 words, 1007.33 characters
11. Use of keywords
1. A set of keywords was made for each label:
words that are more likely to affect the
song’s label.
2. They were added manually in the Python
script.
3. The lists are short but can easily be
expanded by searching the Web for similar words.
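The keyword feature described above can be sketched roughly as follows. The keyword lists and the function name `keyword_counts` are hypothetical; the slides do not list the actual words used in the script.

```python
import re
from collections import Counter

# Hypothetical keyword lists for illustration; the real lists are not given.
KEYWORDS = {
    "positive": {"love", "happy", "smile", "together"},
    "negative": {"cry", "alone", "pain", "goodbye"},
}

def keyword_counts(lyrics):
    """Count how many tokens in the lyrics fall in each label's keyword set."""
    tokens = re.findall(r"[a-z']+", lyrics.lower())
    counts = Counter(tokens)
    return {
        label: sum(counts[word] for word in words)
        for label, words in KEYWORDS.items()
    }

features = keyword_counts("I love you, love makes me happy but I cry alone")
```

Each label's count can then be passed to the classifier as one feature.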
13. Rhyme Scheme
● Added a function to the Python script to
generate the rhyme scheme of the stanzas in a song’s
lyrics
● Ran it over all the songs in a given folder
● Based on the generated rhyme scheme, we
give a score to the RHYME attribute, which
essentially tells the degree of rhyming in
that song.
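One possible way to generate a rhyme scheme and turn it into a RHYME score is sketched below. It is not the original function: it assumes two lines rhyme when their last words share the same final two letters, which is a crude spelling-based stand-in for real phonetic matching (it misses pairs such as "blue" and "you"). All names here are hypothetical.

```python
import re

def line_ending(line, n=2):
    """Return the last n letters of the line's final word (a crude rhyme key)."""
    words = re.findall(r"[a-z']+", line.lower())
    return words[-1][-n:] if words else ""

def rhyme_scheme(stanza):
    """Give each line a letter; lines with the same ending share a letter."""
    letters = {}
    scheme = []
    for line in stanza:
        key = line_ending(line)
        if key not in letters:
            letters[key] = chr(ord("A") + len(letters))
        scheme.append(letters[key])
    return "".join(scheme)

def rhyme_score(scheme):
    """Fraction of lines whose letter occurs more than once in the scheme."""
    if not scheme:
        return 0.0
    return sum(1 for s in scheme if scheme.count(s) > 1) / len(scheme)

stanza = [
    "I walk alone at night",
    "The stars are shining bright",
    "I think of you today",
    "And wish that you would stay",
]
scheme = rhyme_scheme(stanza)
score = rhyme_score(scheme)
```

For this stanza the scheme is AABB, and every line rhymes with another, so the score is 1.0.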
14. Rhyme Scheme
We observe that certain classes (like romantic
and sad songs) tend to have a high value for the
RHYME attribute.
This attribute will be used for classification.
16. tf-idf value of a word
1. Term frequency-inverse document frequency
reflects how important a word is to a
document in a corpus.
2. The tf-idf value increases proportionally to the
number of times a word appears in a
document and decreases with the number of
other documents in the corpus that contain it.
3. Applied using NLTK.
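To make the definition concrete, here is a plain-Python sketch of one common tf-idf variant (term frequency normalized by document length, idf as the log of corpus size over document frequency). It is not the NLTK call used in the project, and the toy corpus is made up.

```python
import math

def tf_idf(term, doc, corpus):
    """tf-idf of `term` in `doc`, where `corpus` is a list of token lists."""
    if not doc:
        return 0.0
    tf = doc.count(term) / len(doc)                   # term frequency
    df = sum(1 for d in corpus if term in d)          # document frequency
    idf = math.log(len(corpus) / df) if df else 0.0   # inverse document frequency
    return tf * idf

corpus = [
    ["love", "you", "love"],
    ["cry", "you"],
    ["you", "dance"],
]
love_weight = tf_idf("love", corpus[0], corpus)
you_weight = tf_idf("you", corpus[0], corpus)
```

"love" appears only in the first document, so it gets a positive weight there, while "you" appears in every document and gets a weight of zero.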
17. Using POS tags as features
We assume that different genres of songs
will also differ in the categories (POS)
of words they use.
We count the number of words (normalized)
for each POS tag category (45 such categories in
the Penn Treebank).
18. Using POS tags as features
Steps:
1) Remove punctuation and expand contractions
(I’m -> I am).
2) Tokenize
3) Do POS tagging
4) Count the frequency of each POS tag / number of
total words
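The steps above can be sketched as follows. The contraction table and `toy_tagger` are hypothetical stand-ins so the example runs without downloaded models; in practice a real tagger such as `nltk.pos_tag` could be passed in instead. Contractions are expanded before punctuation is stripped here, since removing the apostrophe first would make them unrecognizable.

```python
import re
from collections import Counter

# Small illustrative contraction table; a real one would be much longer.
CONTRACTIONS = {"i'm": "i am", "you're": "you are", "it's": "it is"}

def preprocess(text):
    """Expand contractions, strip punctuation, and split into tokens."""
    text = text.lower().replace("\u2019", "'")  # normalize curly apostrophes
    for short, full in CONTRACTIONS.items():
        text = re.sub(r"\b" + re.escape(short) + r"\b", full, text)
    text = re.sub(r"[^a-z\s]", " ", text)
    return text.split()

def pos_frequencies(tokens, tagger):
    """Return the count of each POS tag divided by the total number of tokens."""
    if not tokens:
        return {}
    counts = Counter(tag for _, tag in tagger(tokens))
    return {tag: n / len(tokens) for tag, n in counts.items()}

def toy_tagger(tokens):
    """Stand-in tagger for illustration only; unknown words default to NN."""
    lookup = {"i": "PRP", "you": "PRP", "am": "VBP", "love": "VBP", "happy": "JJ"}
    return [(t, lookup.get(t, "NN")) for t in tokens]

tokens = preprocess("I'm happy, you love music!")
freqs = pos_frequencies(tokens, toy_tagger)
```

The resulting dictionary gives one normalized feature per POS tag that occurs in the song.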
20. Shifting to SVM
Applied a linear SVM in scikit-learn after tf-idf
vectorizing.
The features used include: 1) category
keywords, 2) rhyme scheme, 3) POS tagging.
21. Training and Test Data Set
● Used 20 percent of the data for testing and 80
percent for training.
● The test data was selected uniformly: 1 in every
5 songs was held out for testing.
● If the number of training samples is lowered
further, the model is built from too few samples.
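A minimal sketch of such a uniform split, assuming that every fifth song is held out for testing (which matches the stated 20/80 proportions). The function name is hypothetical.

```python
def every_kth_split(samples, k=5):
    """Send every k-th sample to the test set and the rest to the training set."""
    train, test = [], []
    for i, sample in enumerate(samples):
        if i % k == k - 1:
            test.append(sample)
        else:
            train.append(sample)
    return train, test

train, test = every_kth_split(list(range(10)))
```

With ten samples, two go to the test set and eight to the training set. Note that this selection is deterministic, so it inherits any ordering present in the data.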
22. Contd.
● One shortcoming I have always
found in these techniques is the
assumption that random sampling
achieves independence and a
smooth generation of samples without any
bias from the dataset.
28. Lyrics different from just sentences
● A song may contain a series of negative
sentences but end on a positive/uplifting note
● The mood/meaning of a song is not clear just from
considering sentences independently
● A love song may express how happy the singer
was in a relationship, with the sadness of the breakup
expressed only at the end
29. Lyrics different from just sentences
● Lyrics can be VERY ABSTRACT!
What’s the matter with the clothes I’m wearing?
Can’t you tell that your tie’s too wide?
Maybe I should buy some old tab collars?
Welcome back to the age of jive.
● It is hard to figure out that this stanza expresses
positive emotion
30. Lyrics different from just sentences
● A song may express positive emotion about
negative things
● E.g. rap songs frequently express positive
emotion about murder, shooting, drugs,
guns
Whole new level of confusion!
31. Problems
Text inaccuracies: spelling errors
Use of slang
Metaphors, sarcasm
Cannot capture features (pace, beat, melody, etc.)
of the song just from lyrics
These features are important, but we have no solution