Query data from search engines can provide many insights about the human behavior. Therefore, massive
data resulting from human interactions may offer a new perspective on the behavior of the market. By
analyzing Google query database for search terms, we present a method of analyzing large numbers of
search queries to predict outcomes such as movie incomes. Our results illustrate the potential of combining
extensive behavioral data sets that offer a better understanding of collective human behavior.
This document provides an overview of deep learning applications across various domains including art, music, images, language, healthcare, agriculture, sports and more. It discusses how deep learning is used for tasks like image generation, style transfer, speech generation, machine translation, disease prediction, crop yield prediction, and game strategies. The document also briefly discusses the future of social AI and concludes that deep learning will revolutionize many fields while noting resources for learning more about the topic.
Forecast Model for Box-Office Revenue of Bollywood Feature FilmsPrerit Kohli
We consider the technique to forecast the net revenue collections of a feature film. Previous work on this problem has been addressed majorly to Hollywood films with very limited work on motion pictures developed by the Hindi Film Industry – Bollywood. In this piece of work, we use the parameters governing a movie’s revenue and the historical revenue gross patterns for forecasting. We also show that the model can be used for low budget movies which are usually left out by technology giants like Google, Twitter etc. due to negligible buzz for the movie as compared to that for high-budget ones.
MOVIE SUCCESS PREDICTION AND PERFORMANCE COMPARISON USING VARIOUS STATISTICAL...ijaia
Movies are among the most prominent contributors to the global entertainment industry today, and they
are among the biggest revenue-generating industries from a commercial standpoint. It's vital to divide
films into two categories: successful and unsuccessful. To categorize the movies in this research, a variety
of models were utilized, including regression models such as Simple Linear, Multiple Linear, and Logistic
Regression, clustering techniques such as SVM and K-Means, Time Series Analysis, and an Artificial
Neural Network. The models stated above were compared on a variety of factors, including their accuracy
on the training and validation datasets as well as the testing dataset, the availability of new movie
characteristics, and a variety of other statistical metrics. During the course of this study, it was discovered
that certain characteristics have a greater impact on the likelihood of a film's success than others. For
example, the existence of the genre action may have a significant impact on the forecasts, although another
genre, such as sport, may not. The testing dataset for the models and classifiers has been taken from the
IMDb website for the year 2020. The Artificial Neural Network, with an accuracy of 86 percent, is the best
performing model of all the models discussed.
The International School of Vietnam is an 18,000 square meter pre K-12 school located in Hanoi, Vietnam that is undergoing renovation. It will have 984 students and a student to teacher ratio of 14:1. The school aims to provide a unique learning environment that incorporates Vietnam's culture and allows students to learn in different modalities. It will have facilities like a library, science labs, art rooms, and sports facilities to support its 21st century curriculum. The first phase will open in 2012 and additional grades will be added each year until the first graduating class in 2017.
Reportaje "Logra más comodidad en todas las habitaciones" en el que se muestran diversos espacios pensados para aportar el máximo confort a cada una de las estancias de la vivienda. Incluye una fotografía y una mención a una cocina diseñada por la decoradora Margarida Muñoz (www.margaridamunoz.com)
El reportaje fue realizado por el equipo de la revista de interiorismo "Trucos de Casa" (número extra de "Cosas de Casa", RBA) y publicado en el número 6 (octubre de 2015).
The dkNET ESP meeting covered the current status and future plans of the dkNET project. Key topics discussed included introducing a new dkNET team member, progress made on previous ESP recommendations like the new portal and user notifications, upcoming outreach activities, and planning for the next ESP meeting in June 2016.
We eat food to obtain nutrients and vitamins that our bodies need, and our digestive system extracts these substances. In addition to digestion, respiration or breathing is also part of nutrition by providing oxygen. The circulatory system is essential for nutrition as it transports the nutrients we eat and oxygen we breathe to our organs, bones, brain, and muscles. Our blood contains three types of cells that each perform specific functions. Vitamins are essential nutrients our bodies need. Junk food contains high levels of sugar and fat, and if consumed in excess can accumulate in our bodies, causing weight gain and potential illness.
This document provides an overview of deep learning applications across various domains including art, music, images, language, healthcare, agriculture, sports and more. It discusses how deep learning is used for tasks like image generation, style transfer, speech generation, machine translation, disease prediction, crop yield prediction, and game strategies. The document also briefly discusses the future of social AI and concludes that deep learning will revolutionize many fields while noting resources for learning more about the topic.
Forecast Model for Box-Office Revenue of Bollywood Feature FilmsPrerit Kohli
We consider the technique to forecast the net revenue collections of a feature film. Previous work on this problem has been addressed majorly to Hollywood films with very limited work on motion pictures developed by the Hindi Film Industry – Bollywood. In this piece of work, we use the parameters governing a movie’s revenue and the historical revenue gross patterns for forecasting. We also show that the model can be used for low budget movies which are usually left out by technology giants like Google, Twitter etc. due to negligible buzz for the movie as compared to that for high-budget ones.
MOVIE SUCCESS PREDICTION AND PERFORMANCE COMPARISON USING VARIOUS STATISTICAL...ijaia
Movies are among the most prominent contributors to the global entertainment industry today, and they
are among the biggest revenue-generating industries from a commercial standpoint. It's vital to divide
films into two categories: successful and unsuccessful. To categorize the movies in this research, a variety
of models were utilized, including regression models such as Simple Linear, Multiple Linear, and Logistic
Regression, clustering techniques such as SVM and K-Means, Time Series Analysis, and an Artificial
Neural Network. The models stated above were compared on a variety of factors, including their accuracy
on the training and validation datasets as well as the testing dataset, the availability of new movie
characteristics, and a variety of other statistical metrics. During the course of this study, it was discovered
that certain characteristics have a greater impact on the likelihood of a film's success than others. For
example, the existence of the genre action may have a significant impact on the forecasts, although another
genre, such as sport, may not. The testing dataset for the models and classifiers has been taken from the
IMDb website for the year 2020. The Artificial Neural Network, with an accuracy of 86 percent, is the best
performing model of all the models discussed.
The International School of Vietnam is an 18,000 square meter pre K-12 school located in Hanoi, Vietnam that is undergoing renovation. It will have 984 students and a student to teacher ratio of 14:1. The school aims to provide a unique learning environment that incorporates Vietnam's culture and allows students to learn in different modalities. It will have facilities like a library, science labs, art rooms, and sports facilities to support its 21st century curriculum. The first phase will open in 2012 and additional grades will be added each year until the first graduating class in 2017.
Reportaje "Logra más comodidad en todas las habitaciones" en el que se muestran diversos espacios pensados para aportar el máximo confort a cada una de las estancias de la vivienda. Incluye una fotografía y una mención a una cocina diseñada por la decoradora Margarida Muñoz (www.margaridamunoz.com)
El reportaje fue realizado por el equipo de la revista de interiorismo "Trucos de Casa" (número extra de "Cosas de Casa", RBA) y publicado en el número 6 (octubre de 2015).
The dkNET ESP meeting covered the current status and future plans of the dkNET project. Key topics discussed included introducing a new dkNET team member, progress made on previous ESP recommendations like the new portal and user notifications, upcoming outreach activities, and planning for the next ESP meeting in June 2016.
We eat food to obtain nutrients and vitamins that our bodies need, and our digestive system extracts these substances. In addition to digestion, respiration or breathing is also part of nutrition by providing oxygen. The circulatory system is essential for nutrition as it transports the nutrients we eat and oxygen we breathe to our organs, bones, brain, and muscles. Our blood contains three types of cells that each perform specific functions. Vitamins are essential nutrients our bodies need. Junk food contains high levels of sugar and fat, and if consumed in excess can accumulate in our bodies, causing weight gain and potential illness.
This document describes a study that uses a neural network to predict the success level (flop, hit, or superhit) of Indian movies. The researchers:
1. Collected data on factors that influence movie success, such as the actors, directors, producers, etc. and their past movie performances.
2. Used this historical data to assign weights to each factor and develop a methodology to classify movies into success levels based on thresholds.
3. Trained a neural network using the weighted input data to automate the movie success prediction process.
4. Evaluated the model and found it could accurately predict success levels for 93.3% of movies, based on a confusion matrix analysis of actual
A comparative analysis of machine learning approaches for movie success predi...IRJET Journal
This document discusses using machine learning approaches to predict the success of movies. It proposes using algorithms like k-NN, SVM, and Gaussian Naive Bayes to classify movies as hits or flops based on attributes from an IMDb dataset like actors, directors, genres, and budget. The approaches are evaluated on metrics like accuracy. Visualizations of the data show relationships between attributes like higher IMDb ratings correlating with greater box office totals, and some genres and actors associating with more successful movies on average. The proposed work aims to help investors and audiences predict a movie's potential for success.
IRJET - Enhanced Movie Recommendation Engine using Content Filtering, Collabo...IRJET Journal
This document describes a study that developed an enhanced movie recommendation engine (MRE) using content filtering, collaborative filtering, and popularity filtering. The MRE analyzes movie data from three datasets and makes recommendations based on similarities in movie titles, genres, plots, casts, directors, keywords, vote counts, and vote averages. Evaluation shows the MRE achieves a root mean squared error of 0.873 and mean absolute error of 0.671 when using collaborative filtering, indicating good performance. The MRE provides a more personalized and accurate recommendation system for movies by combining multiple filtering techniques.
This document describes a movie recommendation system project that will use collaborative filtering techniques to predict movie ratings and recommend movies to users. The project will use the MovieLens dataset to identify user demographics and movie genres and classify them using different algorithms. Conditional inference trees and random forests will be implemented and evaluated on the MovieLens data, with the highest accuracy achieved using age, gender, occupation, and genre features. Exploratory data analysis of the MovieLens data found that most users are students aged 20-30 and most movies are from the 1990s across many genres.
IRJET- Movie Success Prediction using Popularity Factor from Social MediaIRJET Journal
This document presents a study that developed a mathematical model to predict the success of movies based on their popularity factors collected from social media. Specifically:
- The study collected data on popularity factors like the number of followers for actors, actresses, directors, and writers from social media platforms.
- It then used a k-nearest neighbors (k-NN) machine learning algorithm to classify movies as a hit, flop, or neutral based on their collected popularity factors.
- The k-NN algorithm searches for the k nearest training examples and classifies a test example based on the majority class of its neighbors. This allows the model to predict a movie's success class before its release.
IRJET- Movie Success Prediction using Data Mining and Social MediaIRJET Journal
The document discusses predicting the success of movies using data mining and machine learning techniques. Specifically, it aims to use linear regression models to predict box office revenues and classify movies as hits or flops based on attributes like the cast, directors, genres, and other movie features. Sentiment analysis of social media data like tweets is also used to predict success. The proposed approach involves feature selection, normalization, training regression models on historical movie data, and using the models to predict revenues for new movies. Evaluation of the models shows they can accurately predict box office figures. The goal is to help movie studios and reduce uncertainty in decision making.
Scene Description From Images To SentencesIRJET Journal
This document presents an approach for generating sentences to describe images using distributed intelligence. It involves detecting objects in images using YOLO detection, finding relative positions of objects, labeling background scenes, generating tuples of objects/scenes/relations, extracting candidate sentences from Wikipedia containing tuple elements, searching images for each sentence and selecting the sentence whose images most closely match the input image. The approach is compared to the Babytalk model using BLEU and ROUGE scores, showing comparable performance. Future work to improve object detection and use larger knowledge sources is discussed.
This document summarizes an article from the International Journal of Advanced Research in Engineering and Technology (IJARET) about enhancing movie recommender systems. The article discusses different types of recommender systems including collaborative filtering, content-based filtering, and hybrid filtering approaches. It then proposes a hybrid item-based recommender system that combines usage data, tags, and movie metadata like genres, stars, and directors to improve recommendation accuracy. The proposed approach is evaluated using a dataset and performance metrics to test the effectiveness of the enhanced movie recommender system.
The document analyzes movie market share data from 2010-2013, including 667 movies. A random effects logit model with a 10-week sliding window is used to estimate how characteristics like budget, distributor, release timing, and days in theaters impact weekly market share. Results show the random effects model produces more significant coefficients compared to a normal linear regression. Higher budget, release later in the year, and longer theater run are associated with higher market share. Actors are not included due to data limitations but other characteristics still have meaningful effects on box office performance.
(Gaurav sawant & dhaval sawlani)bia 678 final project reportGaurav Sawant
PROJECT REPORT
• Performed memory-based collaborative filtering techniques like Cosine similarities, Pearson’s r & model-based Matrix Factorization techniques like Alternating Least Squares (ALS) method
• Studied the scalability of these methods on local machines & on Hadoop clusters
MTVRep: A movie and TV show reputation system based on fine-grained sentiment ...IJECEIAES
Customer reviews are a valuable source of information from which we can extract very useful data about different online shopping experiences. For trendy items (products, movies, TV shows, hotels, services . . . ), the number of available users and customers’ opinions could easily surpass thousands. Therefore, online reputation systems could aid potential customers in making the right decision (buying, renting, booking . . . ) by automatically mining textual reviews and their ratings. This paper presents MTVRep, a movie and TV show reputation system that incorporates fine-grained opinion mining and semantic analysis to generate and visualize reputation toward movies and TV shows. Differently from previous studies on reputation generation that treat the task of sentiment analysis as a binary classification problem (positive, negative), the proposed system identifies the sentiment strength during the phase of sentiment classification by using fine-grained sentiment analysis to separate movie and TV show reviews into five discrete classes: strongly negative, weakly negative, neutral, weakly positive and strongly positive. Besides, it employs embeddings from language models (ELMo) representations to extract semantic relations between reviews. The contribution of this paper is threefold. First, movie and TV show reviews are separated into five groups based on their sentiment orientation. Second, a custom score is computed for each opinion group. Finally, a numerical reputation value is produced toward the target movie or TV show. The efficacy of the proposed system is illustrated by conducting several experiments on a real-world movie and TV show dataset.
Tags Prediction from Movie Plot Synopsis Using Machine LearningIRJET Journal
The document describes a study that aims to predict tags for movies from plot synopses using machine learning. It discusses prior work on tag prediction and movie genre classification from plot summaries. The proposed methodology preprocesses plot synopses, explores the dataset distribution and features, and tests several machine learning models including Naive Bayes, Logistic Regression, and TF-IDF vectorization to predict tags for movies based on plot synopses. The goal is to automatically extract meaningful tags that characterize movies in order to enhance user experience and recommendations.
Software Defect Trend Forecasting In Open Source Projects using A Univariate ...CSCJournals
Our objective in this research is to provide a framework that will allow project managers, business owners, and developers an effective way to forecast the trend in software defects within a software project in real-time. By providing these stakeholders with a mechanism for forecasting defects, they can then provide the necessary resources at the right time in order to remove these defects before they become too much ultimately leading to software failure. In our research, we will not only show general trends in several open-source projects but also show trends in daily, monthly, and yearly activity. Our research shows that we can use this forecasting method up to 6 months out with only an MSE of 0.019. In this paper, we present our technique and methodologies for developing the inputs for the proposed model and the results of testing on seven open source projects. Further, we discuss the prediction models, the performance, and the implementation using the FBProphet framework and the ARIMA model.
Spotify Stream Prediction using Regression ModelsIRJET Journal
This document summarizes a research paper that used regression models to predict the number of streams for songs on Spotify based on song attributes. The researchers collected a Spotify dataset and performed exploratory data analysis to identify influential variables. They used linear regression, random forest, ridge regression, and lasso regression models and found that random forest achieved the highest prediction accuracy of 97.48%. Key song attributes like loudness and energy were found to most impact streams. The research aimed to understand what makes songs popular and to approximately predict streams for new songs based on their attributes.
This document compares models to predict whether a movie will break even financially in the US market. It finds that gradient boosting and neural network models accurately predict break even status over 75% of the time for sample data and over 70% for future data. The most influential factors are IMDB user votes, reviews, and scores, while Facebook likes for the top three actors and overall cast are also important. Further analysis could incorporate additional data like theater receipts and timestamps.
IRJET- Hybrid Recommendation System for MoviesIRJET Journal
This document describes a hybrid recommendation system for movies that combines collaborative and content-based filtering. It uses the MovieLens rating dataset and supplements it with additional data from IMDB, such as movie details. Algorithms like nearest neighbors collaborative filtering and content-based filtering are used to provide personalized movie recommendations to users. The system architecture and design are outlined, including user profiles, movie searching, and success prediction for upcoming movies. An evaluation of the system demonstrates how additional content features can improve recommendation accuracy over collaborative filtering alone.
Image retrieval and re ranking techniques - a surveysipij
There is a huge amount of research work focusing on the searching, retrieval and re-ranking of images in
the image database. The diverse and scattered work in this domain needs to be collected and organized for
easy and quick reference.
Relating to the above context, this paper gives a brief overview of various image retrieval and re-ranking
techniques. Starting with the introduction to existing system the paper proceeds through the core
architecture of image harvesting and retrieval system to the different Re-ranking techniques. These
techniques are discussed in terms of approaches, methodologies and findings and are listed in tabular form
for quick review.
ZERNIKE-ENTROPY IMAGE SIMILARITY MEASURE BASED ON JOINT HISTOGRAM FOR FACE RE...AM Publications
The direction of image similarity for face recognition required a combination of powerful tools and stable in case of any challenges such as different illumination, various environment and complex poses etc. In this paper, we combined very robust measures in image similarity and face recognition which is Zernike moment and information theory in one proposed measure namely Zernike-Entropy Image Similarity Measure (Z-EISM). Z-EISM based on incorporates the concepts of Picard entropy and a modified one dimension version of the two dimensions joint histogram of the two images under test. Four datasets have been used to test, compare, and prove that the proposed Z-EISM has better performance than the existing measures
COMPUTER VISION PERFORMANCE AND IMAGE QUALITY METRICS: A RECIPROCAL RELATION csandit
Computer vision algorithms are essential components of many systems in operation today. Predicting the robustness of such algorithms for different visual distortions is a task which can
be approached with known image quality measures. We evaluate the impact of several image distortions on object segmentation, tracking and detection, and analyze the predictability of this impact given by image statistics, error parameters and image quality metrics. We observe that
existing image quality metrics have shortcomings when predicting the visual quality of virtual or augmented reality scenarios. These shortcomings can be overcome by integrating computer vision approaches into image quality metrics. We thus show that image quality metrics can be
used to predict the success of computer vision approaches, and computer vision can be employed to enhance the prediction capability of image quality metrics – a reciprocal relation.
Computer Vision Performance and Image Quality Metrics : A Reciprocal Relation cscpconf
Computer vision algorithms are essential components of many systems in operation today.
Predicting the robustness of such algorithms for different visual distortions is a task which can
be approached with known image quality measures. We evaluate the impact of several image
distortions on object segmentation, tracking and detection, and analyze the predictability of this
impact given by image statistics, error parameters and image quality metrics. We observe that
existing image quality metrics have shortcomings when predicting the visual quality of virtual
or augmented reality scenarios. These shortcomings can be overcome by integrating computer
vision approaches into image quality metrics. We thus show that image quality metrics can be
used to predict the success of computer vision approaches, and computer vision can be
employed to enhance the prediction capability of image quality metrics – a reciprocal relation.
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxSitimaJohn
Ocean Lotus cyber threat actors represent a sophisticated, persistent, and politically motivated group that poses a significant risk to organizations and individuals in the Southeast Asian region. Their continuous evolution and adaptability underscore the need for robust cybersecurity measures and international cooperation to identify and mitigate the threats posed by such advanced persistent threat groups.
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdfflufftailshop
When it comes to unit testing in the .NET ecosystem, developers have a wide range of options available. Among the most popular choices are NUnit, XUnit, and MSTest. These unit testing frameworks provide essential tools and features to help ensure the quality and reliability of code. However, understanding the differences between these frameworks is crucial for selecting the most suitable one for your projects.
This document describes a study that uses a neural network to predict the success level (flop, hit, or superhit) of Indian movies. The researchers:
1. Collected data on factors that influence movie success, such as the actors, directors, producers, etc. and their past movie performances.
2. Used this historical data to assign weights to each factor and develop a methodology to classify movies into success levels based on thresholds.
3. Trained a neural network using the weighted input data to automate the movie success prediction process.
4. Evaluated the model and found it could accurately predict success levels for 93.3% of movies, based on a confusion matrix analysis of actual
A comparative analysis of machine learning approaches for movie success predi...IRJET Journal
This document discusses using machine learning approaches to predict the success of movies. It proposes using algorithms like k-NN, SVM, and Gaussian Naive Bayes to classify movies as hits or flops based on attributes from an IMDb dataset like actors, directors, genres, and budget. The approaches are evaluated on metrics like accuracy. Visualizations of the data show relationships between attributes like higher IMDb ratings correlating with greater box office totals, and some genres and actors associating with more successful movies on average. The proposed work aims to help investors and audiences predict a movie's potential for success.
IRJET - Enhanced Movie Recommendation Engine using Content Filtering, Collabo...IRJET Journal
This document describes a study that developed an enhanced movie recommendation engine (MRE) using content filtering, collaborative filtering, and popularity filtering. The MRE analyzes movie data from three datasets and makes recommendations based on similarities in movie titles, genres, plots, casts, directors, keywords, vote counts, and vote averages. Evaluation shows the MRE achieves a root mean squared error of 0.873 and mean absolute error of 0.671 when using collaborative filtering, indicating good performance. The MRE provides a more personalized and accurate recommendation system for movies by combining multiple filtering techniques.
This document describes a movie recommendation system project that will use collaborative filtering techniques to predict movie ratings and recommend movies to users. The project will use the MovieLens dataset to identify user demographics and movie genres and classify them using different algorithms. Conditional inference trees and random forests will be implemented and evaluated on the MovieLens data, with the highest accuracy achieved using age, gender, occupation, and genre features. Exploratory data analysis of the MovieLens data found that most users are students aged 20-30 and most movies are from the 1990s across many genres.
IRJET- Movie Success Prediction using Popularity Factor from Social MediaIRJET Journal
This document presents a study that developed a mathematical model to predict the success of movies based on their popularity factors collected from social media. Specifically:
- The study collected data on popularity factors like the number of followers for actors, actresses, directors, and writers from social media platforms.
- It then used a k-nearest neighbors (k-NN) machine learning algorithm to classify movies as a hit, flop, or neutral based on their collected popularity factors.
- The k-NN algorithm searches for the k nearest training examples and classifies a test example based on the majority class of its neighbors. This allows the model to predict a movie's success class before its release.
IRJET- Movie Success Prediction using Data Mining and Social MediaIRJET Journal
The document discusses predicting the success of movies using data mining and machine learning techniques. Specifically, it aims to use linear regression models to predict box office revenues and classify movies as hits or flops based on attributes like the cast, directors, genres, and other movie features. Sentiment analysis of social media data like tweets is also used to predict success. The proposed approach involves feature selection, normalization, training regression models on historical movie data, and using the models to predict revenues for new movies. Evaluation of the models shows they can accurately predict box office figures. The goal is to help movie studios and reduce uncertainty in decision making.
Scene Description From Images To SentencesIRJET Journal
This document presents an approach for generating sentences to describe images using distributed intelligence. It involves detecting objects in images using YOLO detection, finding relative positions of objects, labeling background scenes, generating tuples of objects/scenes/relations, extracting candidate sentences from Wikipedia containing tuple elements, searching images for each sentence and selecting the sentence whose images most closely match the input image. The approach is compared to the Babytalk model using BLEU and ROUGE scores, showing comparable performance. Future work to improve object detection and use larger knowledge sources is discussed.
This document summarizes an article from the International Journal of Advanced Research in Engineering and Technology (IJARET) about enhancing movie recommender systems. The article discusses different types of recommender systems including collaborative filtering, content-based filtering, and hybrid filtering approaches. It then proposes a hybrid item-based recommender system that combines usage data, tags, and movie metadata like genres, stars, and directors to improve recommendation accuracy. The proposed approach is evaluated using a dataset and performance metrics to test the effectiveness of the enhanced movie recommender system.
The document analyzes movie market share data from 2010-2013, including 667 movies. A random effects logit model with a 10-week sliding window is used to estimate how characteristics like budget, distributor, release timing, and days in theaters impact weekly market share. Results show the random effects model produces more significant coefficients compared to a normal linear regression. Higher budget, release later in the year, and longer theater run are associated with higher market share. Actors are not included due to data limitations but other characteristics still have meaningful effects on box office performance.
(Gaurav sawant & dhaval sawlani)bia 678 final project reportGaurav Sawant
PROJECT REPORT
• Performed memory-based collaborative filtering techniques like Cosine similarities, Pearson’s r & model-based Matrix Factorization techniques like Alternating Least Squares (ALS) method
• Studied the scalability of these methods on local machines & on Hadoop clusters
MTVRep: A movie and TV show reputation system based on fine-grained sentiment ...IJECEIAES
Customer reviews are a valuable source of information from which we can extract very useful data about different online shopping experiences. For trendy items (products, movies, TV shows, hotels, services . . . ), the number of available users and customers’ opinions could easily surpass thousands. Therefore, online reputation systems could aid potential customers in making the right decision (buying, renting, booking . . . ) by automatically mining textual reviews and their ratings. This paper presents MTVRep, a movie and TV show reputation system that incorporates fine-grained opinion mining and semantic analysis to generate and visualize reputation toward movies and TV shows. Differently from previous studies on reputation generation that treat the task of sentiment analysis as a binary classification problem (positive, negative), the proposed system identifies the sentiment strength during the phase of sentiment classification by using fine-grained sentiment analysis to separate movie and TV show reviews into five discrete classes: strongly negative, weakly negative, neutral, weakly positive and strongly positive. Besides, it employs embeddings from language models (ELMo) representations to extract semantic relations between reviews. The contribution of this paper is threefold. First, movie and TV show reviews are separated into five groups based on their sentiment orientation. Second, a custom score is computed for each opinion group. Finally, a numerical reputation value is produced toward the target movie or TV show. The efficacy of the proposed system is illustrated by conducting several experiments on a real-world movie and TV show dataset.
Tags Prediction from Movie Plot Synopsis Using Machine LearningIRJET Journal
The document describes a study that aims to predict tags for movies from plot synopses using machine learning. It discusses prior work on tag prediction and movie genre classification from plot summaries. The proposed methodology preprocesses plot synopses, explores the dataset distribution and features, and tests several machine learning models including Naive Bayes, Logistic Regression, and TF-IDF vectorization to predict tags for movies based on plot synopses. The goal is to automatically extract meaningful tags that characterize movies in order to enhance user experience and recommendations.
Software Defect Trend Forecasting In Open Source Projects using A Univariate ...CSCJournals
Our objective in this research is to provide a framework that will allow project managers, business owners, and developers an effective way to forecast the trend in software defects within a software project in real-time. By providing these stakeholders with a mechanism for forecasting defects, they can then provide the necessary resources at the right time in order to remove these defects before they become too much ultimately leading to software failure. In our research, we will not only show general trends in several open-source projects but also show trends in daily, monthly, and yearly activity. Our research shows that we can use this forecasting method up to 6 months out with only an MSE of 0.019. In this paper, we present our technique and methodologies for developing the inputs for the proposed model and the results of testing on seven open source projects. Further, we discuss the prediction models, the performance, and the implementation using the FBProphet framework and the ARIMA model.
Spotify Stream Prediction using Regression ModelsIRJET Journal
This document summarizes a research paper that used regression models to predict the number of streams for songs on Spotify based on song attributes. The researchers collected a Spotify dataset and performed exploratory data analysis to identify influential variables. They used linear regression, random forest, ridge regression, and lasso regression models and found that random forest achieved the highest prediction accuracy of 97.48%. Key song attributes like loudness and energy were found to most impact streams. The research aimed to understand what makes songs popular and to approximately predict streams for new songs based on their attributes.
This document compares models to predict whether a movie will break even financially in the US market. It finds that gradient boosting and neural network models accurately predict break even status over 75% of the time for sample data and over 70% for future data. The most influential factors are IMDB user votes, reviews, and scores, while Facebook likes for the top three actors and overall cast are also important. Further analysis could incorporate additional data like theater receipts and timestamps.
IRJET- Hybrid Recommendation System for MoviesIRJET Journal
This document describes a hybrid recommendation system for movies that combines collaborative and content-based filtering. It uses the MovieLens rating dataset and supplements it with additional data from IMDB, such as movie details. Algorithms like nearest neighbors collaborative filtering and content-based filtering are used to provide personalized movie recommendations to users. The system architecture and design are outlined, including user profiles, movie searching, and success prediction for upcoming movies. An evaluation of the system demonstrates how additional content features can improve recommendation accuracy over collaborative filtering alone.
Image retrieval and re ranking techniques - a surveysipij
There is a huge amount of research work focusing on the searching, retrieval and re-ranking of images in
the image database. The diverse and scattered work in this domain needs to be collected and organized for
easy and quick reference.
Relating to the above context, this paper gives a brief overview of various image retrieval and re-ranking
techniques. Starting with the introduction to existing system the paper proceeds through the core
architecture of image harvesting and retrieval system to the different Re-ranking techniques. These
techniques are discussed in terms of approaches, methodologies and findings and are listed in tabular form
for quick review.
ZERNIKE-ENTROPY IMAGE SIMILARITY MEASURE BASED ON JOINT HISTOGRAM FOR FACE RE...AM Publications
The direction of image similarity for face recognition required a combination of powerful tools and stable in case of any challenges such as different illumination, various environment and complex poses etc. In this paper, we combined very robust measures in image similarity and face recognition which is Zernike moment and information theory in one proposed measure namely Zernike-Entropy Image Similarity Measure (Z-EISM). Z-EISM based on incorporates the concepts of Picard entropy and a modified one dimension version of the two dimensions joint histogram of the two images under test. Four datasets have been used to test, compare, and prove that the proposed Z-EISM has better performance than the existing measures
COMPUTER VISION PERFORMANCE AND IMAGE QUALITY METRICS: A RECIPROCAL RELATION csandit
Computer vision algorithms are essential components of many systems in operation today. Predicting the robustness of such algorithms for different visual distortions is a task which can
be approached with known image quality measures. We evaluate the impact of several image distortions on object segmentation, tracking and detection, and analyze the predictability of this impact given by image statistics, error parameters and image quality metrics. We observe that
existing image quality metrics have shortcomings when predicting the visual quality of virtual or augmented reality scenarios. These shortcomings can be overcome by integrating computer vision approaches into image quality metrics. We thus show that image quality metrics can be
used to predict the success of computer vision approaches, and computer vision can be employed to enhance the prediction capability of image quality metrics – a reciprocal relation.
Computer Vision Performance and Image Quality Metrics : A Reciprocal Relation cscpconf
Computer vision algorithms are essential components of many systems in operation today.
Predicting the robustness of such algorithms for different visual distortions is a task which can
be approached with known image quality measures. We evaluate the impact of several image
distortions on object segmentation, tracking and detection, and analyze the predictability of this
impact given by image statistics, error parameters and image quality metrics. We observe that
existing image quality metrics have shortcomings when predicting the visual quality of virtual
or augmented reality scenarios. These shortcomings can be overcome by integrating computer
vision approaches into image quality metrics. We thus show that image quality metrics can be
used to predict the success of computer vision approaches, and computer vision can be
employed to enhance the prediction capability of image quality metrics – a reciprocal relation.
Similar to Predicting movie success from search (20)
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxSitimaJohn
Ocean Lotus cyber threat actors represent a sophisticated, persistent, and politically motivated group that poses a significant risk to organizations and individuals in the Southeast Asian region. Their continuous evolution and adaptability underscore the need for robust cybersecurity measures and international cooperation to identify and mitigate the threats posed by such advanced persistent threat groups.
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdfflufftailshop
When it comes to unit testing in the .NET ecosystem, developers have a wide range of options available. Among the most popular choices are NUnit, XUnit, and MSTest. These unit testing frameworks provide essential tools and features to help ensure the quality and reliability of code. However, understanding the differences between these frameworks is crucial for selecting the most suitable one for your projects.
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
Your One-Stop Shop for Python Success: Top 10 US Python Development Providersakankshawande
Simplify your search for a reliable Python development partner! This list presents the top 10 trusted US providers offering comprehensive Python development services, ensuring your project's success from conception to completion.
Ivanti’s Patch Tuesday breakdown goes beyond patching your applications and brings you the intelligence and guidance needed to prioritize where to focus your attention first. Catch early analysis on our Ivanti blog, then join industry expert Chris Goettl for the Patch Tuesday Webinar Event. There we’ll do a deep dive into each of the bulletins and give guidance on the risks associated with the newly-identified vulnerabilities.
Building Production Ready Search Pipelines with Spark and MilvusZilliz
Spark is the widely used ETL tool for processing, indexing and ingesting data to serving stack for search. Milvus is the production-ready open-source vector database. In this talk we will show how to use Spark to process unstructured data to extract vector representations, and push the vectors to Milvus vector database for search serving.
Monitoring and Managing Anomaly Detection on OpenShift.pdfTosin Akinosho
Monitoring and Managing Anomaly Detection on OpenShift
Overview
Dive into the world of anomaly detection on edge devices with our comprehensive hands-on tutorial. This SlideShare presentation will guide you through the entire process, from data collection and model training to edge deployment and real-time monitoring. Perfect for those looking to implement robust anomaly detection systems on resource-constrained IoT/edge devices.
Key Topics Covered
1. Introduction to Anomaly Detection
- Understand the fundamentals of anomaly detection and its importance in identifying unusual behavior or failures in systems.
2. Understanding Edge (IoT)
- Learn about edge computing and IoT, and how they enable real-time data processing and decision-making at the source.
3. What is ArgoCD?
- Discover ArgoCD, a declarative, GitOps continuous delivery tool for Kubernetes, and its role in deploying applications on edge devices.
4. Deployment Using ArgoCD for Edge Devices
- Step-by-step guide on deploying anomaly detection models on edge devices using ArgoCD.
5. Introduction to Apache Kafka and S3
- Explore Apache Kafka for real-time data streaming and Amazon S3 for scalable storage solutions.
6. Viewing Kafka Messages in the Data Lake
- Learn how to view and analyze Kafka messages stored in a data lake for better insights.
7. What is Prometheus?
- Get to know Prometheus, an open-source monitoring and alerting toolkit, and its application in monitoring edge devices.
8. Monitoring Application Metrics with Prometheus
- Detailed instructions on setting up Prometheus to monitor the performance and health of your anomaly detection system.
9. What is Camel K?
- Introduction to Camel K, a lightweight integration framework built on Apache Camel, designed for Kubernetes.
10. Configuring Camel K Integrations for Data Pipelines
- Learn how to configure Camel K for seamless data pipeline integrations in your anomaly detection workflow.
11. What is a Jupyter Notebook?
- Overview of Jupyter Notebooks, an open-source web application for creating and sharing documents with live code, equations, visualizations, and narrative text.
12. Jupyter Notebooks with Code Examples
- Hands-on examples and code snippets in Jupyter Notebooks to help you implement and test anomaly detection models.
Digital Marketing Trends in 2024 | Guide for Staying AheadWask
https://www.wask.co/ebooks/digital-marketing-trends-in-2024
Feeling lost in the digital marketing whirlwind of 2024? Technology is changing, consumer habits are evolving, and staying ahead of the curve feels like a never-ending pursuit. This e-book is your compass. Dive into actionable insights to handle the complexities of modern marketing. From hyper-personalization to the power of user-generated content, learn how to build long-term relationships with your audience and unlock the secrets to success in the ever-shifting digital landscape.
Fueling AI with Great Data with Airbyte WebinarZilliz
This talk will focus on how to collect data from a variety of sources, leveraging this data for RAG and other GenAI use cases, and finally charting your course to productionalization.
Programming Foundation Models with DSPy - Meetup SlidesZilliz
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
Driving Business Innovation: Latest Generative AI Advancements & Success StorySafe Software
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
Generating privacy-protected synthetic data using Secludy and MilvusZilliz
During this demo, the founders of Secludy will demonstrate how their system utilizes Milvus to store and manipulate embeddings for generating privacy-protected synthetic data. Their approach not only maintains the confidentiality of the original data but also enhances the utility and scalability of LLMs under privacy constraints. Attendees, including machine learning engineers, data scientists, and data managers, will witness first-hand how Secludy's integration with Milvus empowers organizations to harness the power of LLMs securely and efficiently.
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Predicting movie success from search
1. International Journal of Artificial Intelligence & Applications (IJAIA), Vol. 7, No. 1, January 2016
DOI : 10.5121/ijaia.2016.7101 1
PREDICTING MOVIE SUCCESS FROM SEARCH
QUERY USING SUPPORT VECTOR REGRESSION
METHOD
Chanseung Lee1
and Mina Jung2
1
Sheldon High School, Eugene, OR, USA
2
Electrical Engineering and Computer Science, Syracuse University
Syracuse, NY, USA
ABSTRACT
Query data from search engines can provide many insights about the human behavior. Therefore, massive
data resulting from human interactions may offer a new perspective on the behavior of the market. By
analyzing Google query database for search terms, we present a method of analyzing large numbers of
search queries to predict outcomes such as movie incomes. Our results illustrate the potential of combining
extensive behavioral data sets that offer a better understanding of collective human behavior.
KEYWORDS
Prediction; Support Vector Regression; Linear Regression; Ridge Regression
1. INTRODUCTION
Analyzing search queries from the Internet users can provide extensive societal applications. For
example, Ginsberg et al. [1] estimated the spread of influenza in the United States based on
Google search logs. They processed search queries from Google search logs and generated a
comprehensive model for use in influenza surveillance, with regional and state-level estimates of
influenza-like illness activity in the United States.
Predicting the financial success of a movie is a challenging, open problem. Sharda and Delen [2]
have trained a neural network to process pre-release data, such as quality and popularity
variables, and classified movies into nine categories according to their anticipated income, from
“flop” to “blockbuster”. With test samples, it is evident that the neural network classifies only
36.9% of the movies correctly, while 75.2% of the movies are at most one category away from
the anticipated category.
Joshi et al. [3] have built a multivariate linear regression model that joined meta-data with text
features from pre-release critiques to predict the revenue with a coefficient of determination.
Since predictions based on classic quality factors fail to reach a level of accuracy high enough for
practical applications, the usage of user-generated data to predict the success of a movie becomes
a very tempting approach.
Mestyan et al. [4] built a predictive model for the financial success of movies based on collective
activity data of online users. They show that the popularity of a movie can be predicted before its
2. International Journal of Artificial Intelligence & Applications (IJAIA), Vol. 7, No. 1, January 2016
2
release by measuring and analyzing the activity level of editors and viewers of the corresponding
entry to the movie in Wikipedia.
We try to find out whether a statistical prediction of the box-office success of a movie can be
made using worldwide searchers’ queries along with other information such as rating, the
quantity of theaters the movie was starred in, and the length of the title. Our proposed system
builds an automated method of predicting movie incomes from search queries.
In [5], we used Linear Regression and Ridge Regression to predict movie incomes. However,
since both Linear Regression and Ridge Regression are linear models, they fitted well only in
linearly separable problems. To overcome this linearity problem, we improved the method by
using a non-linear regression method with kernel function. The kernel function expands current
input space into a higher dimension, and automatically finds an optimal model in the expanded
dimension. Specifically, we use Support Vector Regression (SVR) method with Polykernel
function. Our experimental results show that our kernel method SVR improves the performance
of the system.
In the next section we describe the data sources used in the prediction, and the characteristics of
query data and prediction process are described in Section 3. In Section 4, we explain predicting
methods and experimental results. Section 5 concludes this paper and suggests our future
directions.
2. DATA SOURCES
2.1 Data Sources
We collect the movie query data as the main variable from the Google Trends to predict a success
of a movie. In order to improve the prediction, we use three additional information: Income(I),
Rate(R), and number of theaters(T) from Box Office Mojo site [6]. While we collected movie
data released in America from January to March in 2014 in the previous work [5], the number of
movies are now doubled by adding data from April to June in 2014. The data variables are
described in the following.
1) Movie Query: Google Trends [7] provides weekly search query frequencies of movie titles.
Instead of using weekly query data, query results of four weeks are combined into one
variable. From the query data, we extract four variables (M1, M2, M3, M4) where M1
means the sum of queries during 4 weeks before the release date, and M2 means queries
of 5 to 8 weeks before the release date, and so on. Altogether, we are using query data of
total 16 weeks as input. Our first hypothesis is that people who enter queries of movie title
are interested in the movie, and thus likely to contribute to the movie’s income.
Sometimes the words within movie titles are so common (e.g. Highway) that the queries
about the titles of the movies extensively overlap with unrelated queries. In these cases,
other related queries including actors and directors are collected, and the average value of
these query data are used.
2) Rate (R): We use the rating (G, PG, PG-13, R, NR) of a movie.
3) Number of theaters (T): The number of theaters the movie made it to.
4) Number of words in the movie title (W): It is sometimes known that movies with simple
titles succeed. We want to see the effect of the number of words in movie title.
3. International Journal of Artificial Intelligence & Applications (IJAIA), Vol. 7, No. 1, January 2016
3
5) Income of movie (I): This is the target variable we would like to predict. For the income of
the movie, we used the income of the first weekend after the release date. For
computational simplicity, logarithmic value of income is predicted in this paper.
2.2 Prediction Process
The process of entire prediction consists of three parts: Data Collection, Data Preprocessing, and
Prediction .
(a) Data Collection: This process uses eight variables. Among the collected variables, rate,
number of theaters the movie reached, first week income, and release date are collected
from Box Office Mojo website, and entered into the Excel data file. For each movie, we
entered the movie title into Google Trends, and downloaded the query data along with
the rest in a csv form. The contents of these data are described in the following section.
(b) Data Preprocessing: We developed a Java program which reads data from the process of
Data Collection, and processes and stores them in a one single master file that is ready
for multivariable regression.
(c) Movie Prediction: We developed our income prediction system using Python. We
applied several algorithms: Linear Regression, Regularized Linear Regression, and
Support Vector Regression. These algorithms are from sklearn library [8] of Python
language.
3 CHARACTERISTICS OF QUERY DATA
We could find some interesting characteristics of movie query data in terms of frequency trends.
In most movies, as expected, their query frequencies drastically increase near the opening date.
These movies include: Life of a King, Noah, Pompeii, Ride Along, The LEGO Movie, RoboCop,
The Raid 2, Veronica Mars, That Awkward Moment, Gang of Ghosts, Mistaken by Strangers,
Muppets Most Wanted, Son of God, That Awkward Moment, The Monument Man, The Nut Job,
etc. As shown in Figure 1(a), the number of queries about these movies dramatically increases
near the opening date of the movie. In this graph, the x axis is the number of weeks before a
movie was released and the y axis is the frequency of the search queries of the movie.
In other cases, the movies’ titles are so common that their queries do not show significant
increase near opening date. These movies include Gloria, Infliction, The Rocket, Refuge,
Teenage, The French Minister, Visitors, Highway, etc. When people type in these queries, the
queries are not always directed towards the movie. For example, when a Google user enters query
“Teenage”, could mean the movie “Teenage” or teenager itself. For these movies, the queries of
movie title are not a good indicator of movie income. Therefore, we use query data for the names
of main actors and director of the movie. These queries are combined and their average is used as
input. Figure 1(b) shows the frequency trends of using general terms in movie titles.
Queries of some movies are affected by social factors. These movies include Labor Day (at peak
near Labor Day), The Great Flood (at peak when an actual flood strikes), The Lunchbox
(launched in India first), etc. For instance, “Labor Day” might mean the actual Labor Day or
movie. Figure 1(c) shows the patterns of queries about these movies. We used the same approach
used in Figure 1(b) of movies with general/common terms. We used queries about main actor and
director of the movie, and their average was used in the following experiments in Section 4.
4. International Journal of Artificial Intelligence & Applications (IJAIA), Vol. 7, No. 1, January 2016
Figure
4 PREDICTION METHODS AND EXPERIMENTS
We have used Linear Regression and Ridge Regression to predict movie incomes
work [5]. However, since those methods fit well only in linearly separable problems, we apply a
new non-linear regression method with kernel function. Specifically, we use Support Vector
Regression (SVR) method with Polykernel function.
To collect data, we manually typed in movie titles at Google Trends and downloaded query data
in csv form. We could collect as many as 298 movies released from January to June of 2014 from
Google Trends and 8 movies were omitted due to lack of sufficient query data. We used this data
for the following prediction methods.
4.1 Linear Regression Method
Predicting continuous outcome is a regression problem in statistics. Simple Multivariate Linear
Regression [9] is the first method we chose to attempt. Multivariate Linear Regression
straight line among a scatterplot that minimizes residuals, i.e. sum of squared errors given as
follows
where ܫ் and ܫ mean the true income and the predicted income of the movies, respectively. We
look for the following linear hyperplane
I ൌ wୖ·ܴ ݓ்··ܶ
International Journal of Artificial Intelligence & Applications (IJAIA), Vol. 7, No. 1, January 2016
(a) (b)
(c)
Figure 1: Characteristics of movie query frequencies
4 PREDICTION METHODS AND EXPERIMENTS
used Linear Regression and Ridge Regression to predict movie incomes
However, since those methods fit well only in linearly separable problems, we apply a
linear regression method with kernel function. Specifically, we use Support Vector
Regression (SVR) method with Polykernel function.
typed in movie titles at Google Trends and downloaded query data
in csv form. We could collect as many as 298 movies released from January to June of 2014 from
Google Trends and 8 movies were omitted due to lack of sufficient query data. We used this data
for the following prediction methods.
4.1 Linear Regression Method
Predicting continuous outcome is a regression problem in statistics. Simple Multivariate Linear
Regression [9] is the first method we chose to attempt. Multivariate Linear Regression
straight line among a scatterplot that minimizes residuals, i.e. sum of squared errors given as
E ൌ min ∑ ሺܫ் െ ܫሻଶ
(1)
mean the true income and the predicted income of the movies, respectively. We
ar hyperplane which minimizes the above error E.
ݓௐ·ܹ ݓெଵ·1ܯ ݓெଶ·2ܯ ݓெଷ·3ܯ ݓெସ
International Journal of Artificial Intelligence & Applications (IJAIA), Vol. 7, No. 1, January 2016
4
used Linear Regression and Ridge Regression to predict movie incomes in ourearlier
However, since those methods fit well only in linearly separable problems, we apply a
linear regression method with kernel function. Specifically, we use Support Vector
typed in movie titles at Google Trends and downloaded query data
in csv form. We could collect as many as 298 movies released from January to June of 2014 from
Google Trends and 8 movies were omitted due to lack of sufficient query data. We used this data
Predicting continuous outcome is a regression problem in statistics. Simple Multivariate Linear
Regression [9] is the first method we chose to attempt. Multivariate Linear Regression creates a
straight line among a scatterplot that minimizes residuals, i.e. sum of squared errors given as
mean the true income and the predicted income of the movies, respectively. We
ସ·4ܯ ݓ(2)
5. International Journal of Artificial Intelligence & Applications (IJAIA), Vol. 7, No. 1, January 2016
In this formula, R, T, W, M1, M2, M3,
while w means the corresponding weight of each variable. The goal of learning is to correctly
estimate the weight (w’s) of each variable in regression line.
We developed a Phython program that uses Multivariate Linear Regression from sklearn library
[8] of Python. Parameters of Multivariate Linear Regression were set to be the default values. We
have a total of 298 movie data; 70% of them is used a training data and 30% is used for testing
data. Figure 2 shows the relationship between
which was estimated by the input variables. As we can see in Figure 2, the result was promising.
Many prediction points are gathered near/around diagonal line, and its mean square error (MSE)
is 6.08. These results show that we are able
reasonable accuracy.
Figure 2: Result of multivariate linear regression
4.2 Ridge Linear Regression
The next question is whether we can improve the performance of linear regression
predicting movie incomes. We carried out the second experiment using a different linear
regression model, called Regularized Linear Regression (RLR). RLR is known to be more robust
to noise and outliers in data [10
powerful method to disregard noise and solve the problem. RLR is less sensitive to outliers by
minimizing both the sum of error and
E
where w୧ means the coefficient of input
is the square of each coefficient
value of w୧′ݏ are, the higher the value of E is.
Figure 3 shows the reason why Ridge Regression is robust to noise. When outliers (noise) are
present as green dots within the green circle, the purple dashed line is the linear regression line,
which is not correct one. In RLR, by adding a regularization term, RLR is le
outliers and makes a more reasonable line, the red line which is far from the outliers.
We run RLR function in sklearn library [8] with default parameter values. Figure 4 shows the
relationship between true Income
International Journal of Artificial Intelligence & Applications (IJAIA), Vol. 7, No. 1, January 2016
R, T, W, M1, M2, M3, and M4 are the variables that we use for linear regression
the corresponding weight of each variable. The goal of learning is to correctly
) of each variable in regression line.
We developed a Phython program that uses Multivariate Linear Regression from sklearn library
on. Parameters of Multivariate Linear Regression were set to be the default values. We
have a total of 298 movie data; 70% of them is used a training data and 30% is used for testing
data. Figure 2 shows the relationship between true Income and predicted Income of the movies
which was estimated by the input variables. As we can see in Figure 2, the result was promising.
Many prediction points are gathered near/around diagonal line, and its mean square error (MSE)
is 6.08. These results show that we are able to predict movie incomes by using query data with a
Figure 2: Result of multivariate linear regression
4.2 Ridge Linear Regression
The next question is whether we can improve the performance of linear regression
predicting movie incomes. We carried out the second experiment using a different linear
regression model, called Regularized Linear Regression (RLR). RLR is known to be more robust
[10]. When noises in data are expected, regularized regression is a
powerful method to disregard noise and solve the problem. RLR is less sensitive to outliers by
minimizing both the sum of error and regularization term, as given as in the following,
ൌ min∑ ሺܫ் െ ܫሻଶ
ߙ·ݓ
ଶ
ሺ3)
means the coefficient of input variables (R, T, W, M1-M4, etc). The regularization term
is the square of each coefficient w୧ and ߙ signifies the regularization constant. The higher the
r the value of E is.
the reason why Ridge Regression is robust to noise. When outliers (noise) are
present as green dots within the green circle, the purple dashed line is the linear regression line,
which is not correct one. In RLR, by adding a regularization term, RLR is less sensitive to
outliers and makes a more reasonable line, the red line which is far from the outliers.
We run RLR function in sklearn library [8] with default parameter values. Figure 4 shows the
true Income and predicted Income using RLR. Many predictions are
International Journal of Artificial Intelligence & Applications (IJAIA), Vol. 7, No. 1, January 2016
5
we use for linear regression
the corresponding weight of each variable. The goal of learning is to correctly
We developed a Phython program that uses Multivariate Linear Regression from sklearn library
on. Parameters of Multivariate Linear Regression were set to be the default values. We
have a total of 298 movie data; 70% of them is used a training data and 30% is used for testing
of the movies
which was estimated by the input variables. As we can see in Figure 2, the result was promising.
Many prediction points are gathered near/around diagonal line, and its mean square error (MSE)
to predict movie incomes by using query data with a
The next question is whether we can improve the performance of linear regression model in
predicting movie incomes. We carried out the second experiment using a different linear
regression model, called Regularized Linear Regression (RLR). RLR is known to be more robust
ed, regularized regression is a
powerful method to disregard noise and solve the problem. RLR is less sensitive to outliers by
, as given as in the following,
. The regularization term
The higher the
the reason why Ridge Regression is robust to noise. When outliers (noise) are
present as green dots within the green circle, the purple dashed line is the linear regression line,
ss sensitive to
outliers and makes a more reasonable line, the red line which is far from the outliers.
We run RLR function in sklearn library [8] with default parameter values. Figure 4 shows the
g RLR. Many predictions are
6. International Journal of Artificial Intelligence & Applications (IJAIA), Vol. 7, No. 1, January 2016
gathered near the diagonal line, but were a bit more spread than linear regression model. The
result was worse than Simple Linear Regression, and the RLR has a greater error of MSE 7.47.
This result tells us that, in the propose
nor noise. Therefore, the regularized linear regression actually degrades the accuracy of the
prediction system.
Figure 3: Regularized Linear Regression
Figure 4: Results of Regularized
4.3 Non Linearity Problem and Kernel Function
In Section 4.1 and 4.2, we used a multivariate linear regression model and ridge linear regression.
In both of these methods, a straight line is described by a simple equation that calculates movie
income from input variables. The purpose of these linear regression
optimal values for the slope that defines the linear line that comes closest to the data.
However, the biggest drawback of these models is that they can only so
Linear problems are learning problems where the class values can be approximated from the input
by linear hyperplanes. These linear algorithms cannot fit them in a non
environment. We know that vast majority of real worl
and this makes linear regression model obsolete. Like most of other real world problems, the
International Journal of Artificial Intelligence & Applications (IJAIA), Vol. 7, No. 1, January 2016
gathered near the diagonal line, but were a bit more spread than linear regression model. The
result was worse than Simple Linear Regression, and the RLR has a greater error of MSE 7.47.
This result tells us that, in the proposed movie prediction system, most input data are not outliers
nor noise. Therefore, the regularized linear regression actually degrades the accuracy of the
Figure 3: Regularized Linear Regression
Figure 4: Results of Regularized Regression
Non Linearity Problem and Kernel Function
In Section 4.1 and 4.2, we used a multivariate linear regression model and ridge linear regression.
In both of these methods, a straight line is described by a simple equation that calculates movie
income from input variables. The purpose of these linear regression-like model is to find the
optimal values for the slope that defines the linear line that comes closest to the data.
However, the biggest drawback of these models is that they can only solve linear problems.
Linear problems are learning problems where the class values can be approximated from the input
by linear hyperplanes. These linear algorithms cannot fit them in a non-linear problem
environment. We know that vast majority of real world problems belong to non-linear problems,
and this makes linear regression model obsolete. Like most of other real world problems, the
International Journal of Artificial Intelligence & Applications (IJAIA), Vol. 7, No. 1, January 2016
6
gathered near the diagonal line, but were a bit more spread than linear regression model. The
result was worse than Simple Linear Regression, and the RLR has a greater error of MSE 7.47.
d movie prediction system, most input data are not outliers
nor noise. Therefore, the regularized linear regression actually degrades the accuracy of the
In Section 4.1 and 4.2, we used a multivariate linear regression model and ridge linear regression.
In both of these methods, a straight line is described by a simple equation that calculates movie
like model is to find the
optimal values for the slope that defines the linear line that comes closest to the data.
lve linear problems.
Linear problems are learning problems where the class values can be approximated from the input
linear problem
linear problems,
and this makes linear regression model obsolete. Like most of other real world problems, the
7. International Journal of Artificial Intelligence & Applications (IJAIA), Vol. 7, No. 1, January 2016
7
learning problem for movie prediction is not a linear problem and does not necessarily follow a
straight line.
(a) (b)
Figure 5: Non-linearity in input variables
Figure 5 shows the example of non-linearity relationships of the input variables. Figure 5(a)
shows the relationship between number of theaters and movie income. It clearly shows that a
straight line cannot effectively describe the relationship. Figure 5(b) shows even more complex
fitting model between number of queries and movie income. We can also infer that the simple
specific input variable, number of theaters keeps closer relationship with the movie income than
the conventional input variable, number of queries.
In this paper, we adopt a kernel technique to solve this non-linearity problem. The basic idea of
kernel method is to expand original dimension into higher dimension, and find a linear line in the
expanded dimension. Figure 6 shows an example of kernel method where 2D input space is
expanded to 3D feature space. We can see that there is a linear hyperplane in 3D feature space.
Suppose Φ is the mapping from original dimension to expanded dimension. If we transform our
points to feature space, the scalar product form of two original data x1 and x2 looks like this:
<Φ(x1), Φ(x2)>
However, computing the value of Φ(x1) and Φ(x2) is very time-consuming and difficult. The key
insight of kernel method is that there exists a class of functions called kernel functions that can be
used to optimize the computation of this scalar product. Instead of computing Φ(x1) and Φ(x2),
respectively, the whole term can be replaced by a kernel function. A kernel is a function K(x1,
x2) that has the following property
K(x1, x2) =< Φ(x1), Φ(x2)>
for some mapping Φ. In other words, we can evaluate the scalar product of data in the low-
dimensional data space without having to transform to the high-dimensional feature space.
8. International Journal of Artificial Intelligence & Applications (IJAIA), Vol. 7, No. 1, January 2016
4.4 Support Vector Regression Method
Support Vector Regression (SVR) method
we use SVR method as kernel algorithm. Kernel transforms current input dimension into an
expanded dimension as described in Figure 6, and finds an optimal linear hyperplane in the
expanded dimension. Many kernels are pr
sigmoid, etc) and we use polynomial kernel since it provides the importance of each variable as a
result of learning. We used SOM module in Weka [12
of applying SVR method is presented in Figure 7.
SVR provides the best accuracy compared to the previous regression methods. Its MSE is 4.85,
which is a significant improvement than the two other regression methods. Polynomial kernel
generates the important/weight of each in
from SVR with polynomial kernel. As we can see, in predicting movie incomes, the number of
theater is the most important factor and the query of the
important factors. The oldest query (M4) and number of words are least influence factors in the
prediction.
We carried out another experiment for SVR prediction model. The original variable set contains
four query variables (M1, M2, M3, and M4). Among them, we select
(M1, M2) and run the program. The result is shown in Figure 8. The MSE(=4.91) of this model is
very close to that of Figure 7, the original SVR. Therefore, when predicting movie incomes, we
can conclude that two month of recent q
show that query data are very important indicators of movie incomes, and we can predict the
outcomes quite accurately.
Figure 7: Result of support vector regression
International Journal of Artificial Intelligence & Applications (IJAIA), Vol. 7, No. 1, January 2016
Figure6: Kernel Method
Support Vector Regression Method
Support Vector Regression (SVR) method [11] is the most popular kernel regression method, so
we use SVR method as kernel algorithm. Kernel transforms current input dimension into an
expanded dimension as described in Figure 6, and finds an optimal linear hyperplane in the
expanded dimension. Many kernels are proposed in SVR method (e.g. Polynomial, RBF,
sigmoid, etc) and we use polynomial kernel since it provides the importance of each variable as a
. We used SOM module in Weka [12] software to run SVR algorithm. The result
od is presented in Figure 7.
SVR provides the best accuracy compared to the previous regression methods. Its MSE is 4.85,
which is a significant improvement than the two other regression methods. Polynomial kernel
generates the important/weight of each input variable. Table 1 shows the importance of variables
from SVR with polynomial kernel. As we can see, in predicting movie incomes, the number of
theater is the most important factor and the query of the 1st
and 2nd
months are the second most
tors. The oldest query (M4) and number of words are least influence factors in the
We carried out another experiment for SVR prediction model. The original variable set contains
four query variables (M1, M2, M3, and M4). Among them, we select only two most recent query
(M1, M2) and run the program. The result is shown in Figure 8. The MSE(=4.91) of this model is
very close to that of Figure 7, the original SVR. Therefore, when predicting movie incomes, we
can conclude that two month of recent query is enough to do the job. These experiments clearly
show that query data are very important indicators of movie incomes, and we can predict the
: Result of support vector regression
International Journal of Artificial Intelligence & Applications (IJAIA), Vol. 7, No. 1, January 2016
8
popular kernel regression method, so
we use SVR method as kernel algorithm. Kernel transforms current input dimension into an
expanded dimension as described in Figure 6, and finds an optimal linear hyperplane in the
oposed in SVR method (e.g. Polynomial, RBF,
sigmoid, etc) and we use polynomial kernel since it provides the importance of each variable as a
] software to run SVR algorithm. The result
SVR provides the best accuracy compared to the previous regression methods. Its MSE is 4.85,
which is a significant improvement than the two other regression methods. Polynomial kernel
put variable. Table 1 shows the importance of variables
from SVR with polynomial kernel. As we can see, in predicting movie incomes, the number of
months are the second most
tors. The oldest query (M4) and number of words are least influence factors in the
We carried out another experiment for SVR prediction model. The original variable set contains
only two most recent query
(M1, M2) and run the program. The result is shown in Figure 8. The MSE(=4.91) of this model is
very close to that of Figure 7, the original SVR. Therefore, when predicting movie incomes, we
uery is enough to do the job. These experiments clearly
show that query data are very important indicators of movie incomes, and we can predict the
9. International Journal of Artificial Intelligence & Applications (IJAIA), Vol. 7, No. 1, January 2016
9
Table1: Importance of each variable
Attribute Weight
Rate 0.1076
# of Theater 0.5613
# of Words 0.0285
1st
Month query (M1) 0.1503
2nd
Month query (M2) 0.2972
3rd
Month query (M3) 0.0976
4th
Month query (M4) 0.0643
4.5 Discussions
We have applied a number of methods, including linear regression, ridge linear regression, and
support vector regression, to the problem of predicting movie income. Among them, support
vector regression method showed the best performance with MSE=4.85, while MSE of linear
regression and ridge regression are, 6.08 and 7.47, respectively.
Since the problem we attack has the characteristic of non-linearity, as shown in Figure 5, support
vector regression, which can effectively process non-linear problems,showed best prediction
results.
5 CONCLUSIONS
We developed a software program that can predict the future incomes of movies using Google
search query data and some additional information about movies. We found that a movie’s
success (total income) is largely based on the number of theaters and searchers’ queries. It is also
slightly affected by its length of title and rating.
We tried several regression methods including Simple Linear Regression, Ridge Linear
Regression and Support Vector Regression. Among them Support Vector Regression using
Polynomial kernel shows the best performance in predicting movie incomes.
In the future, we will include some other important variables about query and/or movie such as
trend of query, regional data, and so on. By using search query data, we will be able to predict
many things in society. We will apply our prediction system to other interesting domains and
predict various phenomena using query data.
10. International Journal of Artificial Intelligence & Applications (IJAIA), Vol. 7, No. 1, January 2016
10
REFERENCES
[1] Jeremy Ginsberg et al. “Detecting influenza epidemics using search engine query data” Nature,45(9),
2009.
[2] Ramesh Shardaand Dursun Delen. “Predicting box-office success of motion pictures with neural
networks” Expert Systems with Applications 30(2), 2006
[3] Mahesh Joshi, Dipanjan Das, Kevin Gimpel and Noah A. Smith. “Movie reviews and revenues: An
experiment in text regression” In Proceedings of NAACL-HLT 2010, 2010.
[4] Marton Mestyan, Taha Yasseri, and Janos Kertesz. “Early prediction of movie box office success
based on wikipedia activity” PLoS ONE, 8(8), 2013.
[5] Chanseung Lee and Mina Jung “Predicting Movie Incomes Using Search Engine Query
Data”International Conference on Artificial Intelligence and Pattern Recognition (AIPR), 2014
[6] Box office Mojo. http://www.boxofficemojo.com.
[7] Google trends. http://www.google.com/trends.
[8] Scikit-learn: Machine learning in python. http://scikit-learn.org.
[9] Wikipedia. http://en.wikipedia.org/wiki/Linear_regression
[10] Wikipedia https://en.wikipedia.org/wiki/Ridge_regression
[11] H. Drucker, Kaufman Burges, and V. Smola. Support Vector Regression Machines, NIPS, 1996
[12] Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, and Ian H.Witten;
“The WEKA Data Mining Software: An Update” SIGKDD Explorations, Volume 11, Issue 1, 2009