This document analyzes trending videos on YouTube using the Hadoop MapReduce framework. The author collected data on trending YouTube videos in the United States, including video details and user interaction metrics. This data was merged with category information and stored in HDFS. Then MapReduce jobs were run to analyze the data and answer questions about video trends. The results showed that entertainment and music videos were most common. Specific channels and videos were identified as most viewed, disliked, or trending over many days. This analysis can help YouTube users understand popular content and user interests.
IRJET- Hybrid Recommendation System for Movies (IRJET Journal)
This document describes a hybrid recommendation system for movies that combines collaborative and content-based filtering. It uses the MovieLens rating dataset and supplements it with additional data from IMDB, such as movie details. Algorithms like nearest neighbors collaborative filtering and content-based filtering are used to provide personalized movie recommendations to users. The system architecture and design are outlined, including user profiles, movie searching, and success prediction for upcoming movies. An evaluation of the system demonstrates how additional content features can improve recommendation accuracy over collaborative filtering alone.
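A rough sketch of how such a hybrid might combine the two signals, with a collaborative term (similarity-weighted ratings from other users) blended with a content term (genre overlap with movies the user liked). The toy ratings, genre sets, and the 0.7 weighting are illustrative assumptions, not the paper's actual MovieLens/IMDB setup:

```python
import math

# Toy data standing in for MovieLens ratings and IMDB genre metadata.
ratings = {                      # user -> {movie: rating}
    "alice": {"Heat": 5, "Up": 2, "Se7en": 4},
    "bob":   {"Heat": 4, "Se7en": 5, "Cars": 1},
    "carol": {"Up": 5, "Cars": 4, "Heat": 1},
}
genres = {                       # content features from IMDB-like metadata
    "Heat": {"crime", "thriller"}, "Se7en": {"crime", "thriller"},
    "Up": {"animation", "family"}, "Cars": {"animation", "family"},
}

def user_sim(u, v):
    """Cosine similarity over co-rated movies (collaborative part)."""
    common = ratings[u].keys() & ratings[v].keys()
    if not common:
        return 0.0
    dot = sum(ratings[u][m] * ratings[v][m] for m in common)
    nu = math.sqrt(sum(r * r for r in ratings[u].values()))
    nv = math.sqrt(sum(r * r for r in ratings[v].values()))
    return dot / (nu * nv)

def content_sim(m1, m2):
    """Jaccard similarity of genre sets (content-based part)."""
    a, b = genres[m1], genres[m2]
    return len(a & b) / len(a | b)

def hybrid_score(user, movie, alpha=0.7):
    # Collaborative part: similarity-weighted ratings from other users.
    parts = [(user_sim(user, v), r[movie])
             for v, r in ratings.items() if v != user and movie in r]
    den = sum(s for s, _ in parts)
    cf = sum(s * rt for s, rt in parts) / den if den else 0.0
    # Content part: best genre overlap with the user's liked movies, scaled to 0-5.
    liked = [m for m, r in ratings[user].items() if r >= 4]
    cb = 5 * max((content_sim(movie, m) for m in liked), default=0.0)
    return alpha * cf + (1 - alpha) * cb
```

Adding the content term gives the recommender something to say about movies with few co-ratings, which is the accuracy gain the evaluation attributes to the extra content features.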
The power of unstructured data: Recommendation systems (Olga Scrivner)
This document discusses unstructured data and natural language processing techniques. It begins by stating that 80% of data will be unstructured and that natural language is full of ambiguity, using contextual clues and idioms. It then provides examples of common NLP tasks like text mining, recommendation systems, and language challenges. Specific techniques discussed include word embeddings like Word2Vec and GloVe, as well as feature extraction methods and recommendation system types like collaborative filtering. The document concludes by providing an example of using NLP for a job recommendation system, including preprocessing job descriptions and calculating cosine similarity between items.
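The job-recommendation example mentioned above (preprocessed descriptions, TF-IDF weighting, cosine similarity between items) can be sketched in plain Python; the job descriptions and user profile below are invented for illustration:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """TF-IDF vectors as sparse dicts: weight = tf * log(N / df)."""
    tfs = [Counter(doc.lower().split()) for doc in docs]
    df = Counter()                      # document frequency per term
    for tf in tfs:
        df.update(tf.keys())
    n = len(docs)
    return [{t: c * math.log(n / df[t]) for t, c in tf.items()} for tf in tfs]

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

jobs = [
    "python developer machine learning pandas",
    "java backend developer spring microservices",
    "data scientist python statistics machine learning",
]
profile = "machine learning python"    # the user's preprocessed interests
vecs = tfidf_vectors(jobs + [profile])
scores = [(cosine(vecs[-1], v), job) for v, job in zip(vecs, jobs)]
best = max(scores)[1]                  # highest-similarity job wins
```

A production system would add the preprocessing the slides describe (stop-word removal, stemming) and likely dense embeddings such as Word2Vec, but the ranking step is the same cosine comparison.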
The document proposes a multi-tier sentiment analysis system called MSABDP to analyze large-scale social media data more efficiently. MSABDP uses Hadoop for its distributed processing and storage capabilities. It collects Twitter data using Apache Flume and stores it in HDFS. It then applies a multi-tier classification approach combining lexicon-based and machine learning techniques to classify tweets into multiple sentiment classes, reducing complexity compared to single-tier architectures. Evaluation on real Twitter data showed MSABDP improved classification accuracy over single-tier approaches by 7%.
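A minimal sketch of the multi-tier idea: a fast lexicon lookup acts as the first tier, and only tweets it cannot decide are escalated to a machine-learning tier (stubbed here). The word lists and tier logic are illustrative assumptions, not MSABDP's actual design:

```python
# Toy sentiment lexicon; a real system would use a resource like SentiWordNet.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "awful", "sad"}

def lexicon_tier(tweet):
    """Tier 1: cheap lexicon scoring; returns a label or None if undecided."""
    words = tweet.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return None  # ambiguous: escalate to the ML tier

def ml_tier(tweet):
    """Tier 2 stub: a trained classifier would run here on the hard cases."""
    return "neutral"

def classify(tweet):
    return lexicon_tier(tweet) or ml_tier(tweet)
```

Routing the easy cases through the lexicon tier is what reduces complexity relative to a single-tier classifier that must score every tweet.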
Presentation of the Master's thesis defense on 04/13/2020.
Title:
Context-aware Graph Embedding for Session-based News Recommendation
Abstract:
Existing methods on session-based news recommendation mainly focus on extracting features from news articles and sequential user-item interactions, but they usually ignore the semantic-level structural information among news articles and do not explore external knowledge sources. In this paper, we propose a novel Context-Aware Graph Embedding (CAGE) framework for session-based news recommendation, which builds an auxiliary knowledge graph to enrich the semantic meaning of entities involved in articles, and further refines the article embeddings by graph convolutional networks. Experimental results on real-world news datasets demonstrate the effectiveness of our method compared with the state-of-the-art methods on session-based news recommendation.
Adaptive Search Based On User Tags in Social Networking (IOSR Journals)
This document summarizes an article about adaptive search based on user tags in social networking. It discusses using tags that users apply to images in social media sites like Flickr to improve image search and personalize results. It proposes using topic models to identify different meanings of ambiguous tags and a user's interests to display more relevant images. The framework involves reranking images based on aesthetics scores predicted from user comments, and using tag-based and group-based metadata to discover topics and personalize search results. Future work could further analyze community-generated metadata to identify interests and refine search algorithms.
This document summarizes a research paper on video recommendation and ranking in video search engines. The paper proposes a novel video re-ranking framework that improves the performance of semantic video indexing and retrieval by re-assessing the scores of video shots based on their homogeneity and relevance to the video topic. It reviews related work on topics like content-based video recommendation, visual search re-ranking, and using social network data for recommendations. The proposed framework uses keyword expansion, video pool expansion, and visual query expansion to better capture user search intentions. It concludes that re-ranking examples based on predicted likelihood scores from classifiers can improve retrieval performance.
IRJET- YouTube Data Sensitivity and Analysis using Hadoop Framework (IRJET Journal)
This document discusses analyzing YouTube data using the Hadoop framework. It proposes a system to filter and analyze YouTube comment content to remove sensitive data using natural language processing and store the data in Hadoop Distributed File System (HDFS). MapReduce is used to extract key-value pairs from the data and Hadoop provides a scalable platform for analyzing the large-scale YouTube data.
This document describes a project to classify YouTube videos into categories using machine learning models trained on video metadata. The authors analyzed 240,000 YouTube video records containing metadata fields like title, description, view count, etc. They preprocessed the data, engineered new features, and used text analysis and decision tree classifiers to predict the video category with around 67% accuracy based on the metadata. Key steps included text preprocessing, converting categorical fields into binary variables, and extracting additional features from fields like publish date and duration. The goal of categorization is to improve YouTube search, recommendations, and the overall user experience.
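The feature-engineering steps described (extracting the publish hour and duration, converting categorical fields into binary variables) might look roughly like this; the field names and sample record are assumptions, not the authors' actual schema:

```python
import re
from datetime import datetime

def parse_duration(iso):
    """Convert an ISO-8601 duration like 'PT4M13S' to seconds."""
    m = re.fullmatch(r"PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?", iso)
    h, mins, s = (int(g) if g else 0 for g in m.groups())
    return h * 3600 + mins * 60 + s

def make_features(record, known_channels):
    """Flatten one video record into a numeric feature dict for a classifier."""
    published = datetime.fromisoformat(record["publishedAt"])
    feats = {
        "title_len": len(record["title"]),
        "desc_len": len(record["description"]),
        "views": record["viewCount"],
        "publish_hour": published.hour,           # derived from publish date
        "duration_s": parse_duration(record["duration"]),
    }
    # One-hot encode the channel as binary indicator variables.
    for ch in known_channels:
        feats[f"channel={ch}"] = int(record["channel"] == ch)
    return feats

video = {
    "title": "Learn Hadoop in 10 minutes",
    "description": "A quick intro to HDFS and MapReduce.",
    "viewCount": 12345,
    "publishedAt": "2023-05-01T18:30:00",
    "duration": "PT10M5S",
    "channel": "TechTalks",
}
feats = make_features(video, ["TechTalks", "MusicNow"])
```

Rows of such feature dicts, together with text features from the title and description, are what a decision tree classifier would be trained on to predict the category.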
YouTube Title Prediction Using Sentiment Analysis (IRJET Journal)
This document describes a sentiment analysis algorithm that analyzes YouTube comments to predict whether they are positive or negative. It uses linear regression to evaluate comments and categorize videos based on the sentiment analysis results. Graphs are generated to help YouTube creators understand feedback on their videos. The algorithm also generates new video title suggestions for creators by analyzing titles from similar videos on related topics. It uses keyword extraction and a transformer neural network to produce the new titles.
This dashboard analyzes trending videos on YouTube in India to understand factors that influence videos to appear in the trending section. The dashboard collects data from the YouTube API, cleans it, and stores it in a database. Visualizations including scatter plots, line plots, pie charts, bar charts and histograms are generated from the data to show trends like popular publishing hours, videos by day of the week, title formatting, top channels, and title lengths. The dashboard is deployed on Heroku so it is publicly available for creators to analyze trends and optimize their content.
This document describes a social media content analysis project that analyzes data from platforms like Facebook, Instagram, Twitter, and YouTube. The project is a web application that allows business users, influencers, and personal users to view analytics on their social media profiles. It features post scheduling, campaign generation, sentiment analysis of comments, and hashtag suggestion. The system architecture includes modules for analyzing metrics like followers/likes, post scheduling, campaign generation, sentiment analysis using Naive Bayes or SVM, and hashtag suggestion using keywords from images. The project aims to help users monitor their social media growth and enhance their profiles through automated posting and campaign tools.
Analysis and Prediction of Sentiments for Cricket Tweets using Hadoop (IRJET Journal)
This document describes a study that analyzed and predicted sentiments for cricket tweets using Hadoop. The researchers collected tweets related to cricket matches from Twitter and analyzed them using an unsupervised machine learning method to predict the winning and losing teams based on the polarity (positive or negative sentiment) of the tweets. Hadoop was used as it can process large volumes of unstructured social media data in a distributed manner. The study aims to develop an intelligent system to accurately analyze sentiments from cricket tweets and predict match outcomes prior to games being played.
IRJET- Opinion Mining on Pulwama Attack (IRJET Journal)
This document discusses analyzing sentiment from tweets about the 2019 Pulwama attack in India. It proposes collecting tweets using Apache Nifi, storing them in HDFS, preprocessing the data with Hive, and analyzing sentiment using Tableau. The methodology involves fetching tweets via Twitter API, uploading to HDFS with Nifi, structuring the unstructured JSON data with Hive, removing null values with preprocessing, and visualizing sentiment analysis results in Tableau. Tools like Hadoop, MapReduce, HDFS, Hive, Nifi and Tableau are discussed for handling big Twitter data and performing sentiment analysis in order to understand public opinions expressed on social media.
This document provides a summary of a major project on analyzing YouTube trending videos using a database management system. It includes sections on acknowledgements, an abstract, introduction, scope, input design, database design, and coding. The abstract indicates the project involves collecting time-series data on over 8,000 YouTube videos over 9 months to analyze trending videos across their lifecycle and identify characteristics of trending videos and channels. The introduction discusses the importance of understanding trending videos due to their potential to become popular. The scope describes preliminary data analysis and processing of the dataset. The input, database, and coding sections outline the input requirements, proposed database tables to store video and user data, and use of SQL.
MapReduce allows distributed processing of large datasets across clusters of computers. It works by splitting the input data into independent chunks, which are processed by the map function in parallel. The map function produces intermediate key-value pairs, which are grouped by key and aggregated by the reduce function to form the output data. Fault tolerance is achieved through replication of data across nodes and re-execution of failed tasks. This makes MapReduce suitable for efficiently processing very large datasets in a distributed environment.
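The map, shuffle, and reduce phases can be sketched as a single-process word count; in a real cluster the map calls run in parallel on separate nodes and the grouping happens over the network:

```python
from collections import defaultdict
from itertools import chain

def map_fn(chunk):
    """Map: emit an intermediate (word, 1) pair for every word in a chunk."""
    return [(word, 1) for word in chunk.split()]

def shuffle(pairs):
    """Group intermediate pairs by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_fn(key, values):
    """Reduce: aggregate all values emitted for one key."""
    return key, sum(values)

chunks = ["big data big", "data big deal"]   # independent input splits
intermediate = chain.from_iterable(map_fn(c) for c in chunks)
counts = dict(reduce_fn(k, v) for k, v in shuffle(intermediate).items())
# counts == {"big": 3, "data": 2, "deal": 1}
```

Because each chunk is mapped independently, a failed map task can simply be re-run on another node, which is the fault-tolerance property described above.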
This document presents a proposed system called "One Stop Recommendation" that aims to provide movie and television show recommendations for multiple over-the-top (OTT) platforms like Netflix, Amazon Prime Video, and Hotstar. It would create a single dashboard with screens for each OTT platform. Data would be collected from sources like Kaggle and Google Forms. The system would use different recommendation techniques like content-based filtering, collaborative filtering, and cosine similarity to provide unified recommendations across platforms. It aims to help users more easily find content suggestions and gain insights from visualization of the recommendation data.
A Data Science Project using data mining techniques (N-Grams, TF-IDF text analytics, sentiment detection) combined with R and ggplot2 for exploratory data analysis to predict stock market trends based on world news events sourced from Reddit /r/worldnews using Decision Trees and SVM (Support Vector Machines) on KNIME. All experiments were done using public cloud infrastructure, running HIVE queries to prefilter data with HDInsights on Azure.
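The N-gram extraction step of such a text-analytics pipeline is straightforward to sketch (the headline is an invented example, not from the Reddit dataset):

```python
def ngrams(text, n):
    """Extract word-level n-grams from a news headline."""
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

headline = "stocks fall as markets react to news"
bigrams = ngrams(headline, 2)
# first bigram: ("stocks", "fall")
```

These n-gram counts, typically reweighted with TF-IDF, become the feature columns that the Decision Tree and SVM models are trained on.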
Detailed Investigation of Text Classification and Clustering of Twitter Data ... (ijtsrd)
In recent years there has been rapid growth in data. This paper presents a methodology to investigate the text classification of data gathered from Twitter. In this study, sentiment analysis has been done on online comment data, giving a picture of how to discover the demands of the people. Ziya Fatima | Er. Vandana, "Detailed Investigation of Text Classification and Clustering of Twitter Data for Business Analytics", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-5, Issue-2, February 2021. URL: https://www.ijtsrd.com/papers/ijtsrd38527.pdf Paper URL: https://www.ijtsrd.com/engineering/computer-engineering/38527/detailed-investigation-of-text-classification-and-clustering-of-twitter-data-for-business-analytics/ziya-fatima
System For Product Recommendation In E-Commerce Applications (IJERD Editor)
This document summarizes a research paper that proposes a personalized hybrid recommendation system for e-commerce applications that can support massive datasets. The system uses clustering algorithms to build a user preference tree to model user interests. It then uses map-reduce on Hadoop to accelerate the recommendation algorithm using user and product similarity matrices in order to provide recommendations to users in an online mode quickly despite large, unstructured data. The performance of the map-reduce based system is analyzed and shown to have advantages over traditional centralized methods for large datasets.
Improving Service Recommendation Method on Map reduce by User Preferences and... (paperpublications3)
Abstract: Service recommender systems have been shown to be valuable tools for providing appropriate recommendations to users. In the last decade, the number of customers and services and the amount of online information have grown rapidly, yielding a big data analysis problem for service recommender systems. Consequently, traditional service recommender systems often suffer from scalability and inefficiency problems. Most existing service recommender systems present the same ratings and rankings of services to different users without considering diverse users' preferences, and therefore fail to meet users' personalized requirements. In this paper, to address the above challenges, we present a personalized service recommendation list and recommend the most appropriate services to users effectively. Specifically, keywords are used to indicate users' preferences, and a user-based collaborative filtering algorithm is adopted to generate appropriate recommendations.
Keywords: recommender system, user preference, keyword, Big Data, mapreduce, Hadoop.
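A toy sketch of the keyword-preference idea: users are compared by the keyword sets mined from their reviews, and a user-based collaborative filter weights other users' ratings by that similarity. The data, keywords, and hotel names are illustrative, not the paper's implementation:

```python
def jaccard(a, b):
    """Similarity of two users' preference-keyword sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

# Keywords extracted from each user's reviews (toy data).
prefs = {
    "u1": {"wifi", "breakfast", "quiet"},
    "u2": {"wifi", "pool", "breakfast"},
    "u3": {"nightlife", "bar", "pool"},
}
ratings = {"u2": {"Hotel A": 5, "Hotel B": 2}, "u3": {"Hotel A": 1, "Hotel B": 5}}

def recommend(user, services):
    """Rank services by similarity-weighted ratings of like-minded users."""
    scores = {}
    for s in services:
        num = den = 0.0
        for other, rated in ratings.items():
            if other == user or s not in rated:
                continue
            sim = jaccard(prefs[user], prefs[other])
            num += sim * rated[s]
            den += sim
        scores[s] = num / den if den else 0.0
    return sorted(scores, key=scores.get, reverse=True)

ranking = recommend("u1", ["Hotel A", "Hotel B"])
```

Because u1 shares keywords with u2 but not u3, u2's ratings dominate the ranking, which is the personalization the abstract argues single-ranking systems lack. At scale, the pairwise similarity computation is what the paper distributes with MapReduce on Hadoop.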
Title: Improving Service Recommendation Method on Map reduce by User Preferences and Reviews
Authors: Dayanand Bhovi, Mr. Ashwin Kumar
ISSN 2350-1022
International Journal of Recent Research in Mathematics Computer Science and Information Technology
Paper Publications
This document describes a system to predict customer purchase intention from social media posts like tweets. The system was developed using a dataset of 3,200 manually annotated tweets relating to the iPhone X. Various machine learning models were tested on their ability to classify tweets as indicating purchase intention or not. The models were evaluated based on accuracy, precision, recall, and F-measure. The best performing models were logistic regression with a binary document vector, achieving an accuracy of 80.8%, and SVM with a TF document vector, achieving 80.5% accuracy. The system aims to help companies better target advertising to potential customers based on analysis of their social media data.
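The best-performing setup reported (logistic regression over binary document vectors) can be sketched from scratch; the four labeled tweets, vocabulary, and training hyperparameters below are invented stand-ins for the paper's 3,200-tweet dataset:

```python
import math

def binary_vector(text, vocab):
    """Binary document vector: 1 if the vocabulary term occurs, else 0."""
    words = set(text.lower().split())
    return [1.0 if t in words else 0.0 for t in vocab]

# Tiny labeled set: 1 = purchase intention, 0 = none (toy stand-in data).
docs = [
    ("i want to buy the iphone x", 1),
    ("planning to buy one this week", 1),
    ("the iphone x camera looks nice", 0),
    ("screen reflections are annoying", 0),
]
vocab = sorted({w for d, _ in docs for w in d.split()})
X = [binary_vector(d, vocab) for d, _ in docs]
y = [label for _, label in docs]

# Logistic regression fitted with plain stochastic gradient descent.
w = [0.0] * len(vocab)
b = 0.0
for _ in range(500):
    for xi, yi in zip(X, y):
        p = 1 / (1 + math.exp(-(sum(wj * xj for wj, xj in zip(w, xi)) + b)))
        err = p - yi                       # gradient of the log loss
        w = [wj - 0.1 * err * xj for wj, xj in zip(w, xi)]
        b -= 0.1 * err

def predict(text):
    """Probability that a tweet expresses purchase intention."""
    x = binary_vector(text, vocab)
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 / (1 + math.exp(-z))
```

A real evaluation would hold out a test split and report precision, recall, and F-measure alongside accuracy, as the paper does.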
Social Media Market Trender with Dache Manager Using Hadoop and Visualization... (IRJET Journal)
This document proposes using Apache Hadoop and a data-aware cache framework called Dache to analyze large amounts of social media data from Twitter in real-time. The goals are to overcome limitations of existing analytics tools by leveraging Hadoop's ability to handle big data, improve processing speed through Dache caching, and provide visualizations of trends. Data would be grabbed from Twitter using Flume, stored in HDFS, converted to CSV format using MapReduce, analyzed using Dache to optimize Hadoop jobs, and visualized using tools like Tableau. The system aims to efficiently analyze social media trends at low cost using open source tools.
A complete software development process is important to the evaluation of any computer project. These practice guidelines are for those who manage big-data and big-data analytics projects or are responsible for the use of data analytics solutions. They are also intended for business leaders and program leaders who are responsible for developing agency capability in the area of big data and big data analytics.
For those agencies currently not using big data or big data analytics, this document may assist strategic planners, business teams and data analysts to consider the value of big data to the current and future programs.
This document is also of relevance to those in industry, research and academia who can work as partners with government on big data analytics projects.
Technical APS personnel who manage big data and/or do big data analytics are invited to join the Data Analytics Centre of Excellence Community of Practice to share information on technical aspects of big data and big data analytics, including achieving best practice with modeling and related requirements. To join the community, send an email to the Data Analytics Centre of Excellence.
Big Data Handling Technologies, ICCCS 2014 (Love Arora, GNDU)
Big data came into existence when traditional relational database systems were not able to handle the unstructured data (weblogs, videos, photos, social updates, human behaviour) generated today by organisations, social media, and other data-generating sources. Data that is so large in volume, so diverse in variety, or moving with such velocity is called big data. Analyzing big data is a challenging task, as it involves large distributed file systems, which should be fault tolerant, flexible, and scalable. The technologies used by big data applications to handle massive data include Hadoop, MapReduce, Apache Hive, NoSQL, and HPCC. These technologies handle massive amounts of data, from kilobytes and megabytes up to terabytes, petabytes, zettabytes, and yottabytes.
This research paper discusses various technologies for handling big data, along with the advantages and disadvantages of each technology, for addressing the problems at hand in dealing with massive data.
IRJET- Text-based Domain and Image Categorization of Google Search Engine usi... (IRJET Journal)
This document discusses a proposed system for categorizing search engine results using conceptual clustering. The system analyzes the content of search results to extract relevant concepts, then uses a personalized conceptual clustering algorithm to generate a decision tree of query clusters. This tree can be used to identify categories for web pages and provide topically relevant results to users. The system aims to improve on traditional ranked search results by categorizing results based on the conceptual preferences and interests of individual users.
Title: Improving Service Recommendation Method on Map reduce by User Preferences and Reviews
Author: Dayanand Bhovi, Mr. Ashwin Kumar
ISSN 2350-1022
International Journal of Recent Research in Mathematics Computer Science and Information Technology
Paper Publications
This document describes a system to predict customer purchase intention from social media posts like tweets. The system was developed using a dataset of 3,200 manually annotated tweets relating to the iPhone X. Various machine learning models were tested on their ability to classify tweets as indicating purchase intention or not. The models were evaluated based on accuracy, precision, recall, and F-measure. The best performing models were logistic regression with a binary document vector, achieving an accuracy of 80.8%, and SVM with a TF document vector, achieving 80.5% accuracy. The system aims to help companies better target advertising to potential customers based on analysis of their social media data.
Social Media Market Trender with Dache Manager Using Hadoop and Visualization...IRJET Journal
This document proposes using Apache Hadoop and a data-aware cache framework called Dache to analyze large amounts of social media data from Twitter in real-time. The goals are to overcome limitations of existing analytics tools by leveraging Hadoop's ability to handle big data, improve processing speed through Dache caching, and provide visualizations of trends. Data would be grabbed from Twitter using Flume, stored in HDFS, converted to CSV format using MapReduce, analyzed using Dache to optimize Hadoop jobs, and visualized using tools like Tableau. The system aims to efficiently analyze social media trends at low cost using open source tools.
The software development process is complete for computer project analysis, and it is important to the evaluation of the random project. These practice guidelines are for those who manage big-data and big-data analytics projects or are responsible for the use of data analytics solutions. They are also intended for business leaders and program leaders that are responsible for developing agency capability in the area of big data and big data analytics .
For those agencies currently not using big data or big data analytics, this document may assist strategic planners, business teams and data analysts to consider the value of big data to the current and future programs.
This document is also of relevance to those in industry, research and academia who can work as partners with government on big data analytics projects.
Technical APS personnel who manage big data and/or do big data analytics are invited to join the Data Analytics Centre of Excellence Community of Practice to share information of technical aspects of big data and big data analytics, including achieving best practice with modeling and related requirements. To join the community, send an email to the Data Analytics Centre of Excellence
Big Data Handling Technologies ICCCS 2014_Love Arora _GNDU Love Arora
Big data came into existence when the traditional relational database systems were not able to handle the unstructured data (weblogs, videos, photos, social updates, human behaviour) generated today by organisation, social media, or from any other data generating source. Data that is so large in volume, so diverse in variety or moving with such velocity is called Big data. Analyzing Big Data is a challenging task as it involves large distributed file systems which should be fault tolerant, flexible and scalable. The technologies used by big data application to handle the massive data are Hadoop, Map Reduce, Apache Hive, No SQL and HPCC. These technologies handle massive amount of data in MB, PB, YB, ZB, KB, and TB.
In this research paper various technologies for handling big data along with the advantages and disadvantages of each technology for catering the problems in hand to deal the massive data has discussed.
IRJET- Text-based Domain and Image Categorization of Google Search Engine usi...IRJET Journal
This document discusses a proposed system for categorizing search engine results using conceptual clustering. The system analyzes the content of search results to extract relevant concepts, then uses a personalized conceptual clustering algorithm to generate a decision tree of query clusters. This tree can be used to identify categories for web pages and provide topically relevant results to users. The system aims to improve on traditional ranked search results by categorizing results based on the conceptual preferences and interests of individual users.
Learn SQL from basic queries to Advance queriesmanishkhaire30
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
The Building Blocks of QuestDB, a Time Series Databasejavier ramirez
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps just as another data type. However, when performing real-time analytics, timestamps should be first class citizens and we need rich time semantics to get the most out of our data. We also need to deal with ever growing datasets while keeping performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review a history of some of the changes we have gone over the past two years to deal with late and unordered data, non-blocking writes, read-replicas, or faster batch ingestion.
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataKiwi Creative
Harness the power of AI-backed reports, benchmarking and data analysis to predict trends and detect anomalies in your marketing efforts.
Peter Caputa, CEO at Databox, reveals how you can discover the strategies and tools to increase your growth rate (and margins!).
From metrics to track to data habits to pick up, enhance your reporting for powerful insights to improve your B2B tech company's marketing.
- - -
This is the webinar recording from the June 2024 HubSpot User Group (HUG) for B2B Technology USA.
Watch the video recording at https://youtu.be/5vjwGfPN9lw
Sign up for future HUG events at https://events.hubspot.com/b2b-technology-usa/
The Ipsos - AI - Monitor 2024 Report.pdfSocial Samosa
According to Ipsos AI Monitor's 2024 report, 65% Indians said that products and services using AI have profoundly changed their daily life in the past 3-5 years.
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeWalaa Eldin Moustafa
Dynamic policy enforcement is becoming an increasingly important topic in today’s world where data privacy and compliance is a top priority for companies, individuals, and regulators alike. In these slides, we discuss how LinkedIn implements a powerful dynamic policy enforcement engine, called ViewShift, and integrates it within its data lake. We show the query engine architecture and how catalog implementations can automatically route table resolutions to compliance-enforcing SQL views. Such views have a set of very interesting properties: (1) They are auto-generated from declarative data annotations. (2) They respect user-level consent and preferences (3) They are context-aware, encoding a different set of transformations for different use cases (4) They are portable; while the SQL logic is only implemented in one SQL dialect, it is accessible in all engines.
#SQL #Views #Privacy #Compliance #DataLake
End-to-end pipeline agility - Berlin Buzzwords 2024Lars Albertsson
We describe how we achieve high change agility in data engineering by eliminating the fear of breaking downstream data pipelines through end-to-end pipeline testing, and by using schema metaprogramming to safely eliminate boilerplate involved in changes that affect whole pipelines.
A quick poll on agility in changing pipelines from end to end indicated a huge span in capabilities. For the question "How long time does it take for all downstream pipelines to be adapted to an upstream change," the median response was 6 months, but some respondents could do it in less than a day. When quantitative data engineering differences between the best and worst are measured, the span is often 100x-1000x, sometimes even more.
A long time ago, we suffered at Spotify from fear of changing pipelines due to not knowing what the impact might be downstream. We made plans for a technical solution to test pipelines end-to-end to mitigate that fear, but the effort failed for cultural reasons. We eventually solved this challenge, but in a different context. In this presentation we will describe how we test full pipelines effectively by manipulating workflow orchestration, which enables us to make changes in pipelines without fear of breaking downstream.
Making schema changes that affect many jobs also involves a lot of toil and boilerplate. Using schema-on-read mitigates some of it, but has drawbacks since it makes it more difficult to detect errors early. We will describe how we have rejected this tradeoff by applying schema metaprogramming, eliminating boilerplate but keeping the protection of static typing, thereby further improving agility to quickly modify data pipelines without fear.
Analysis of Trending Videos Pattern on YouTube using Hadoop MapReduce
Pooja Kumar
MSc Data Analytics
National College of Ireland
Dublin, Ireland
x18181929@student.ncirl.ie
Abstract— YouTube is one of the most prominent sites for video hosting and sharing. To enhance video content and quality, analysis of user interaction factors and video streaming information is required. Video content can be scrutinized using factors such as the number of views, likes, comments and dislikes; based on these, the most active channels and those of greatest user interest can be identified. This project uses data on user interaction factors and video information to analyze trending patterns. YouTube's large dataset can be managed in Hadoop: data is stored in the Hadoop Distributed File System (HDFS) and processed using the MapReduce framework. The data was analyzed to distinguish content and discover which videos receive a better response, which helps in organizing videos in the future. Further, the analysis helps in identifying the popular categories, the popular channels, the videos with the highest number of views, and the video categories that are most disliked.
Keywords— Hadoop, Hadoop Distributed File System (HDFS), MapReduce, YouTube, Mapper, Reducer.
I. INTRODUCTION
Social networking sites offer a lucrative way to promote and advertise. They provide various opportunities for product publicity, showcasing new movie trailers and images, and corporate promotions [2]. Beyond YouTube's clear entertainment and leisure advantages, it is also being leveraged for professional and other industry gains [3]. Many companies release previews of their new products on YouTube for greater exposure. YouTube lets uploaders receive reviews, likes and shares as feedback from viewers; using this information, future improvements can be made and products can be customized to viewer needs. Each minute, over 500 hours of video are uploaded to YouTube. YouTube has more than 2 billion users, and an average of 1 billion hours of video is watched on it daily. Based on 10,000 reviewers, around 32 million inappropriate videos were removed in 2018. YouTube can be browsed in over 80 languages [7]. In the United States (U.S.), YouTube reaches more young people on mobile than any television network, and around "73% of U.S. adults use YouTube". Figure 1 shows how many internet users in the U.S. use YouTube. Nearly "15% of site traffic on YouTube comes from the U.S.", followed by India with 8.1% and Japan with 4.6%. YouTube will generate $5.5 billion in advertising revenue in the U.S. alone by 2020 [8].
In this paper, Hadoop, a Java-based big data framework, was used for handling the data. The MapReduce programming model is implemented for processing massive amounts of data in parallel, while the Hadoop Distributed File System (HDFS) provides data storage, access, processing and computation with a file system. U.S. YouTube data was analyzed using HDFS and the MapReduce framework to provide a better understanding of viewer interest.
Fig. 1. U.S YouTube Users
Research Question: How can the Hadoop MapReduce framework help to analyze the pattern of trending videos on YouTube and users' interests, so that people can benefit by making proper decisions to advance on trending topics?
The research question is answered by analyzing the following topics using MapReduce.
Q1) How are videos distributed, and which category has the most videos?
Q2) Which channel has the most views?
Q3) Which video is disliked by most users?
Q4) Which channel has the most videos, and to which category do they belong?
Q5) Which video has been trending for the most days?
Q6) How many videos were removed from each category?
Q7) For how many videos in each channel is the rating disabled?
Q8) How many videos in a channel are not associated with a particular category?
Q9) How well are views, likes and comment counts correlated with each other? (Statistical approach using Python)
In this project, trending topics and people's interests are analyzed. This information can be used to benefit people by helping them make proper decisions to advance on trending topics. The analysis helps an uploader learn more about competitors on YouTube and discovers the videos that perform best. Based on the videos uploaded and viewer interaction factors, it detects the most active channels and categories. Using this method, it is easy to find the number of videos in each category, the number of viewers of each channel, and the videos that viewers do not like.
II. RELATED WORK
When communication was in its traditional form, the data transmitted was limited in size. Now, due to the widespread use of the internet, massive amounts of data are possessed by companies and social media platforms. Because this data is too massive to analyze and extract information from directly, analytics is used. Critical data analysis has resulted in advanced analytical intelligence for various data segments and for forecasting future predictions. Huge and complex data cannot be handled by conventionally designed data analysis software, and results cannot be generated for the accumulated data due to its complexity. Thus, [Biju and Mathew, 2017] explained the need for Big Data analytics [6]. Because of the availability of Big Data and low-cost hardware, there was a requirement to process data quickly and cost-effectively [3].
The Hadoop MapReduce framework splits a large dataset into smaller chunks processed in parallel and manages their scheduling. Each piece is mapped to an intermediate value, which is then reduced to produce the result of an analysis. The MapReduce algorithm can be rewritten according to the problem and broken into chunks to be solved in parallel; this is how it addresses large datasets with a distributed solving method [5]. "YARN (Yet Another Resource Negotiator) separates resource management functions from the programming model". MapReduce runs as an application on top of YARN, with the resource management and job scheduling functions of the former JobTracker split into separate daemons. The author explained how Hadoop MapReduce and YARN work [5].
In the existing situation, YouTube Analytics can be used by uploaders to analyze their own channels. This analysis provides overall parameters and is available only for an uploader's own channel; competitors' information is not revealed [3].
The authors [Harikumar, Kapoor and Waghmare, 2019] analyzed YouTube data. A technique was implemented to detect sensitive text in comments and to identify popular content types, which helped both YouTubers and advertising companies to upload videos based on popularity. Here, the data was first collected and the sensitive content in comments was substituted; it was then analyzed for popular content types. Based on this analysis, YouTubers advertise products from which ad revenue is earned [5].
The authors [Dabas, Kaur, Gulati and Tilak, 2019] presented the classification and analysis of YouTube video comments based on a Hadoop application. Comments were summarized using Hive queries, and sentiment analysis was performed on them using Python. The system delivered promising results for queries in terms of execution time [2].
III. METHODOLOGY
This study focuses on generating a pattern of user interaction on YouTube using the Hadoop environment. For this research, two datasets were considered: videos trending on YouTube and the video categories specific to each region, both downloaded from Kaggle [9].
The YouTube dataset is a daily record of trending videos over several months, with the data of each country in a separate file. The U.S. YouTube data file was chosen for the analysis. It contains 40,949 records, with information such as video id, trending date, title, channel, category id, publish time, tags, views, likes, dislikes, comment count, thumbnail link, comments-disabled flag, ratings-disabled flag, a flag for errored or removed videos, and description. The extracted file was in comma-separated values (CSV) format and required little pre-processing, as there were no missing or NaN values. However, all commas within fields were removed, since they act as delimiters, and the date was formatted into the proper format, i.e., dd/mm/yyyy.
The category dataset differs for each region; this information was fetched from the associated JSON file on Kaggle [9]. The JSON file includes information such as etag, category id, snippet-channel id, snippet-title and snippet-assignable. It was converted to structured data, i.e., CSV format, using Python code. There was no need to pre-process this dataset, as it contained all the required information.
Both datasets were merged using Python code. While merging, only the required information was taken into consideration, and the columns not used in the analysis were removed. After merging, there were 40,949 records in the dataset. Figure 2 shows information about the merged dataset. The combined dataset was loaded into HDFS, and the Java MapReduce framework was utilized for all insights in the entire process. After the results were generated, visualization was done. By analyzing this dataset, patterns about trending videos can be drawn that are useful for YouTubers: knowing which kind of category or video gets a better response lets YouTubers upload content accordingly.
Fig. 2. Dataset after merging
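A minimal sketch of this preparation step in Python with pandas, assuming the Kaggle field layout (items → id, snippet.title, snippet.assignable); the function names and column positions are illustrative, not the project's actual code:

```python
import json
import pandas as pd

def categories_to_csv(json_path: str) -> pd.DataFrame:
    """Flatten the Kaggle category JSON into a tabular structure."""
    with open(json_path) as f:
        items = json.load(f)["items"]
    return pd.DataFrame([{"category_id": int(it["id"]),
                          "category": it["snippet"]["title"],
                          "assignable": it["snippet"]["assignable"]}
                         for it in items])

def merge_datasets(videos: pd.DataFrame, categories: pd.DataFrame) -> pd.DataFrame:
    """Strip stray commas from text fields (they clash with the CSV delimiter),
    then left-join the category table on category_id."""
    videos = videos.copy()
    for col in ("title", "tags", "description"):
        if col in videos:
            videos[col] = videos[col].astype(str).str.replace(",", " ")
    return videos.merge(categories, on="category_id", how="left")
```

A left join keeps all 40,949 video records even when a category id has no match, mirroring the merged dataset described above.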
IV. IMPLEMENTATION
An architecture was designed and followed to perform the analysis on the YouTube data. The required processing was done on the dataset, and all the essential records were merged and loaded into HDFS. Using a MapReduce program, the data was read from HDFS, the map and reduce operations were performed, and the generated output was stored back into HDFS. The Java programming language was used to write the code, and the generated output was then visualized. Figure 3 shows the process flow of the data analysis.
Fig. 3. Process Flow of Data Analysis
To perform a MapReduce task, three Java classes are created: a Mapper, a Reducer and a Driver class. In the mapper class, the input is processed and intermediate key-values are generated by splitting the input and recording it in the form of (key, value) pairs. The output from the mapper is fed as input to the reducer class, where, for a wide range of processing needs, data can be aggregated, filtered and combined in several ways to generate the output. The Driver class is responsible for setting up the MapReduce job to run in Hadoop: the job name, the data types for input and output, and the class names of the mapper and reducer are specified there. During Java program compilation, a directory is created in the current directory with the package name mentioned in the Java source file, and all compiled files are placed into it. A JAR file also has to be created. The input was read from HDFS and the generated output was stored back into HDFS; this is common to all the queries.
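The mapper → shuffle → reducer flow described above can be illustrated outside Hadoop as a small in-memory simulation. This is a hedged Python sketch of the programming model, not the paper's Java classes; the function names are illustrative:

```python
from collections import defaultdict

def run_mapreduce(records, mapper, reducer):
    """Minimal in-memory analogue of the Hadoop map -> shuffle -> reduce flow."""
    # Map phase: each input record yields zero or more (key, value) pairs.
    grouped = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            grouped[key].append(value)   # shuffle: group values by key
    # Reduce phase: each key's value list is collapsed into one output value.
    return {key: reducer(key, values) for key, values in grouped.items()}
```

For example, a word count becomes run_mapreduce(lines, lambda line: [(w, 1) for w in line.split()], lambda key, values: sum(values)); each of the tasks below follows this same shape with a different mapper and reducer.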
Q1) The first task is to find the distribution of videos by category. To perform this task, three Java classes were created: categoryMapper, categoryReducer and a driver class. In the mapper class, as seen in Figure 4, a variable is declared that stores each line from the input file. The line is split using the delimiter and the values are stored as an array. The column that holds the category information is read and mapped as the key, and for each key the value 1 is emitted; this is the intermediate key-value output generated by the mapper class. In the reducer class, the mapper output is read as input and, by shuffling and reducing, the final output is generated; Figure 5 shows the reducer Java class. Figure 6 shows the driver class, in which the job name (YouTube), the output data types (Text, IntWritable) and the names of the mapper and reducer classes are specified to run the program. The program was compiled, all the classes were added to a JAR file using the jar command, as shown in Figure 7, and the program was executed. The output was stored in HDFS. This task was performed to learn how the videos are distributed.
Fig. 4. Mapper class
Fig. 5. Reducer class
Fig. 6. Driver class
Fig. 7. JAR file for Task 1
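The per-category counting logic of Q1 can be sketched as plain functions. This is an illustrative Python equivalent of the Java mapper and reducer; the field positions in the sample rows are assumptions, not the real dataset schema:

```python
from collections import defaultdict

CATEGORY_COL = 3  # assumed position of the category field in the merged CSV

def category_mapper(line: str):
    """Emit (category, 1) for one input record, like the categoryMapper class."""
    fields = line.rstrip("\n").split(",")
    yield fields[CATEGORY_COL], 1

def category_reducer(pairs):
    """Sum the emitted 1s per category, like the categoryReducer class."""
    counts = defaultdict(int)
    for category, one in pairs:
        counts[category] += one
    return dict(counts)

# assumed row layout: video_id, trending_date, channel, category, views
rows = ["a1,17.11.14,ChanA,Music,100",
        "b2,17.11.14,ChanB,Entertainment,50",
        "c3,17.11.14,ChanA,Music,70"]
pairs = [kv for row in rows for kv in category_mapper(row)]
counts = category_reducer(pairs)  # {'Music': 2, 'Entertainment': 1}
```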
Q2) The second task is to identify which channel has the most views. In the mapper class, the channel and the number of views for each record are mapped as a key-value pair; in the reducer, the key-value pairs are shuffled and reduced to a total per channel. All the Java classes were added to the JAR file to perform the execution, and after execution the output was stored in HDFS.
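For Q2, the mapper emits (channel, views) rather than (category, 1), and the reducer sums the views per channel. A hedged Python sketch with assumed column positions:

```python
from collections import defaultdict

CHANNEL_COL, VIEWS_COL = 2, 4  # assumed field positions in the merged CSV

def views_mapper(line: str):
    """Emit (channel, views) for one record."""
    fields = line.rstrip("\n").split(",")
    yield fields[CHANNEL_COL], int(fields[VIEWS_COL])

def views_reducer(pairs):
    """Total the views per channel, then rank channels by that total."""
    totals = defaultdict(int)
    for channel, views in pairs:
        totals[channel] += views
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

rows = ["a1,17.11.14,ChanA,Music,100",
        "b2,17.11.14,ChanB,Entertainment,50",
        "c3,17.11.14,ChanA,Music,70"]
ranking = views_reducer(kv for row in rows for kv in views_mapper(row))
# ranking[0] is the channel with the most views: ('ChanA', 170)
```

Q3, Q4, Q6, Q7 and Q8 follow this same pattern with different key and value choices.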
Q3) The third task is to identify the most disliked video. In the mapper, the video ID of each video and its number of dislikes were taken into consideration: the video ID and dislike count form the intermediate key-value pair. This was sent to the reducer to perform the shuffling and reducing, where the number of dislikes given to each video was totalled. From this output, the video that most viewers do not like can be identified.
Q4) The fourth task is to analyze, per category, which channel has the most videos. In the mapper, the category and the channel ID, which here is the name of the channel, were grouped to form the key; for every key a value was set, and this key-value pair was sent to the reducer. In the reducer, the value was counted for each occurrence of the same channel within a category. Thus, the channel with the most videos in each category was fetched as output.
Q5) The fifth task is to identify which video has been trending for the most days on YouTube. For each video there is a record of the publishing date and the trending date; using this information, the number of trending days of a video can be computed in the mapper class. The output from the mapper was sent as input to the reducer and the final reduced output was generated.
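A hedged Python sketch of the Q5 logic, computing days between the publish and trending dates per video as the paper describes; the field positions and the dd/mm/yyyy date format are assumptions:

```python
from collections import defaultdict
from datetime import datetime

def trending_days_mapper(line: str):
    """Emit (video_id, days between publish date and trending date)."""
    video_id, publish, trending = line.rstrip("\n").split(",")[:3]
    fmt = "%d/%m/%Y"  # the format the dataset was normalized to
    days = (datetime.strptime(trending, fmt) - datetime.strptime(publish, fmt)).days
    yield video_id, days

def trending_days_reducer(pairs):
    """Keep the largest trending span seen per video, then pick the maximum."""
    spans = defaultdict(int)
    for video_id, days in pairs:
        spans[video_id] = max(spans[video_id], days)
    return max(spans.items(), key=lambda kv: kv[1])  # longest-trending video

rows = ["a1,01/01/2018,05/01/2018",
        "b2,01/01/2018,11/01/2018"]
top = trending_days_reducer(kv for row in rows for kv in trending_days_mapper(row))
# top is ('b2', 10): the video that trended for the most days
```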
Q6) The sixth task is to check how many videos were removed from each category. In the mapper, the category and the binary flag in the column video_error_or_removed were used. Whenever the flag indicated a removed video, the corresponding category name was set as the key and the value was stored as 1, generating the intermediate key-value pairs. In the reducer, the mapper output was reduced and the final key-value output was generated.
Q7) The seventh task is to identify for how many videos in each channel the rating is disabled. Similar to the previous task, the ratings-disabled field is binary. In the mapper, each time the ratings-disabled value is True, the corresponding channel name is set as the key with value 1; in the reducer class, the value is summed each time a channel name repeats, and the output is generated.
Q8) The eighth task is to find how many videos in a channel are not associated with a category. Here the snippet-assignable column holds a binary value; every time the value is False, the corresponding channel and category are set as the key and the value is set in the mapper class. In the reducer class, every time the channel name and category are the same, the count is increased by 1, and thus the reducer output is generated.
The JAR file information for all the Java programs is shown in Figure 8.
Q9) The final task was to analyze the correlation between views, likes and comment count. This was done using Python. If views, likes and comment counts are highly correlated, this indicates the viewers' interest in a particular video.
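A minimal sketch of this correlation step with pandas, assuming the merged dataset exposes views, likes and comment_count columns (the sample values below are made up for illustration):

```python
import pandas as pd

def interaction_correlation(df: pd.DataFrame) -> pd.DataFrame:
    """Pearson correlation matrix over the three user-interaction columns."""
    return df[["views", "likes", "comment_count"]].corr(method="pearson")

# Hypothetical sample: likes track views almost linearly, comments less so.
sample = pd.DataFrame({
    "views": [100, 200, 300, 400],
    "likes": [10, 22, 30, 41],
    "comment_count": [1, 5, 2, 8],
})
matrix = interaction_correlation(sample)
# matrix.loc["views", "likes"] is close to 1 for this strongly linear sample
```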
V. RESULTS
Q1) As seen in Figure 10, the entertainment and music categories have more videos than all others. The graph shows the distribution of videos across categories, and it is evident that YouTubers upload more videos to the entertainment and music categories. Figure 11 shows the same distribution as a word cloud: the more often a category name appears, the bigger and bolder its text becomes.
Fig. 8. JAR file of all Tasks
Fig. 9. Output for Task1
Fig. 10. Videos in each category
Fig. 11. Distribution of videos
Q2) Figure 12 shows the top 20 channels with the most views; these 20 channels are viewed the most on YouTube, and most of them belong to the music, entertainment and sports categories.
Fig. 12. Top 20 channels with the most views
Q3) Figure 13 shows the video IDs of the 20 most disliked videos. Using a video ID, the channel to which the video belongs can be identified. Here, the video with ID FlsCjmMhFmw belongs to the YouTube Spotlight channel, which is the channel most disliked by viewers.
Fig.13. Most Disliked videos
Q4) The result shown in Figure 14 indicates which channel in each category has the most videos. It is observed that, in the sports category, the ESPN channel has more videos than all other channels.
Fig. 14. Channels with the highest number of videos
Q5) Figure 15 displays the top 10 videos that have been trending for the most days on YouTube. The two longest-trending videos belong to the Sports and News & Politics categories.
Fig. 15. Top 10 videos trending for many days
Q6) Figure 16 shows that the removed videos belong to the Entertainment, Film & Animation and Sports categories; videos belonging to the other categories were not removed.
Fig. 16. Categories in which the most videos were removed
Q7) The graph in Figure 17 gives information about how many videos in each channel have ratings disabled. The greatest number of rating-disabled videos belongs to the How To & Style category.
Fig. 17. Videos with ratings disabled per channel
Q8) From Figure 18 it is observed that videos belonging to the CNET and Bleacher Report channels are not associated with their category, which is Shows; that is, their snippet is not assignable.
Fig. 18. Number of videos whose snippet is not assignable
Q9) From the result in Figure 19, it is observed that views and likes are more highly correlated with each other than with comment count. Viewers' interest can be analyzed on the basis of this correlation matrix: if a video has more views, it has a high chance of being liked by many viewers.
VI. CONCLUSION AND FUTURE ENHANCEMENT
In this paper, an analysis of YouTube data was made using the Hadoop MapReduce framework, and the research question was answered through MapReduce tasks. The analysis found that the entertainment and music categories have the most videos and the greatest number of views, and that shows belonging to the sports and news & politics categories have been trending for many days. From these tasks, the video trending pattern and user interest were analyzed; viewers' interest can be identified based on views, likes and comments. The project results also highlight the advantages of the Hadoop framework, its main disadvantage being the syntactic complexity of Java MapReduce.
In the future, the analysis can be extended using uploaders' information, and sentiment analysis can be performed on video descriptions.
REFERENCES
[1] P. Merla and Y. Liang, "Data Analysis using Hadoop MapReduce Environment", IEEE Conf. on Big Data, 2017 [Online]. Available: IEEE Xplore, https://www.ieee.org/ [Accessed: Apr. 10, 2020].
[2] C. Dabas, P. Kaur, N. Gulati and M. Tilak, "Analysis of Comments on YouTube Videos using Hadoop", Fifth International Conf. on Image Information Processing (ICIIP), 2019.
[3] F. Shaikh, D. Pawaskar, U. Khan and A. Siddiqui, "YouTube Data Analysis using MapReduce on Hadoop", IEEE Conf. on Recent Trends in Electronics, Information & Communication Technology, May 18/19, 2018.
[4] K. Bhatter, S. Gavhane, P. Dhamne, G. Aochar and S. Rabade, "A Review on YouTube Data Analysis using MapReduce on Hadoop", International Journal of Research in Engineering, Science and Management, vol. 2, no. 12, Dec. 2019 [Online]. Available: https://www.ijresm.com/ [Accessed: Apr. 10, 2020].
[5] D. Harikumar, D. Kapoor and S. Waghmare, "YouTube Data Sensitivity and Analysis using Hadoop Framework", International Research Journal of Engineering and Technology, vol. 6, no. 4, 2019 [Online]. Available: https://www.irjet.net/ [Accessed: Apr. 10, 2020].
[6] S. Biju and A. Mathew, "Comparative Analysis of Selected Big Data Analytics Tools", Journal of International Technology and Information Management, vol. 26, no. 2, 2017.
[7] FactSlides.com, "25 Facts about YouTube", Nov. 28, 2019 [Online]. Available: https://www.factslides.com/s-YouTube/ [Accessed: Apr. 10, 2020].
[8] P. Cooper, Hootsuite Social Media Management, "23 YouTube Statistics that Matter to Marketers in 2020", Dec. 17, 2019 [Online]. Available: https://blog.hootsuite.com/youtube-stats-marketers/ [Accessed: Apr. 10, 2020].
[9] M. J, "Trending YouTube Video Statistics", Kaggle.com, 2019 [Online]. Available: https://www.kaggle.com/datasnaek/youtube-new/ [Accessed: Apr. 10, 2020].