Major project documentation
“YouTube trending video analysis DBMS”
is submitted to
Department of Computer Applications,
Submitted To:
Submitted By:
Project Undertaken:
Acknowledgement
The satisfaction that accompanies the successful completion of any task
would be incomplete without the mention of the people whose ceaseless
cooperation made it possible, and whose constant guidance and encouragement
crown all efforts with success. We are grateful to our project guide, Mr. Shakti
Kundu, for the guidance, inspiration and constructive suggestions that helped
us in the preparation of this project.
We are also thankful to our colleagues, with whom we had
fruitful discussions that helped us a lot in giving a final shape to the
program.
ABSTRACT
Unlike popular videos, which would have already achieved
high viewership numbers by the time they are declared
popular, YouTube trending videos represent content that
targets viewers’ attention over a relatively short time, and
has the potential of becoming popular. Despite their
importance and visibility, YouTube trending videos have
not been studied or analyzed thoroughly. In this paper, we
present our findings for measuring, analyzing, and
comparing key aspects of YouTube trending videos. Our
study is based on collecting and monitoring high-resolution
time-series of the viewership and related statistics of more
than 8,000 YouTube videos over an aggregate period of
nine months. Since trending videos are declared as such
just several hours after they are uploaded, we are able to
analyze trending videos’ time-series across critical and
sufficiently-long durations of their lifecycle. In addition,
we analyze the profile of users who upload trending videos,
to potentially identify the role that these users’ profile plays
in getting their uploaded videos trending. Furthermore, we
conduct a directional-relationship analysis among all pairs
of trending videos’ time-series that we have monitored. We
employ Granger Causality (GC) with significance testing to
conduct this analysis. Unlike traditional correlation
measures, our directional-relationship analysis provides a
deeper insight into the viewership pattern of different
categories of trending videos. Our findings include the
following. Trending videos and their channels have clearly
distinct statistical attributes when compared to other
YouTube content that has not been labeled as trending.
Based on the GC measure, the viewership of nearly all
trending videos has some level of directional-relationship
with other trending videos in our dataset. Our results also
reveal a highly asymmetric directional-relationship among
different categories of trending videos. Our directionality
analysis also shows a clear pattern of viewership toward
popular categories, whereas some categories tend to be
isolated with little evidence of transitions among them.
Introduction
YouTube, a user-generated content platform, is one of the largest
and most popular video sharing websites. It serves over four
billion views a day. YouTube provides public statistics
regarding its uploaded videos, most notably the number of
views, which shows the aggregate number of times a video
has been watched up to that point. Naturally, the number
of views for a video indicates the level of popularity of that
video; and it takes a varying amount of time for a video to
become popular (if it becomes popular at all). Meanwhile, some
videos draw attention within a
relatively short time. YouTube also supports a feature
called trending, which represents content that has the
potential of becoming popular in a relatively short time.
Consequently, although trending videos are usually not
popular (yet) when declared as trending by YouTube, they
have the potential of becoming popular (eventually). For
example, some videos are labeled trending while having
only a few hundred views. From another
perspective, through trending videos, YouTube tries to
highlight emerging trends developing within different
viewership communities.
Meanwhile, the general attributes of the viewership of
trending videos have not been studied thoroughly. To the
best of our knowledge, even basic statistics about YouTube
trending videos have not been analyzed or received
adequate attention. Considering that
more than one billion unique users visit YouTube each
month and that they upload 72 hours of video every minute
[26], YouTube is a prime venue for, e.g., brand engagement
or advertising, but it is genuinely difficult and competitive
to get the attention of users. Therefore, when a video
becomes popular, it is exposed to millions of users for free
and has the opportunity of keeping their attention for a
while. Finding these trends is so important that
many different websites have emerged just to pick up
YouTube trends for content owners or advertisers. A better
understanding of YouTube trending videos and their
statistics, and a deeper insight into their lifecycles, can
greatly affect the strategies for marketing, targeted
advertising, recommendation systems and search engines,
as was suggested by prior YouTube measurement studies
[2]. This represents a key motivation for our effort.
Scope
Our aim is to carry out a data science preprocessing analysis working
solely with the US Videos dataset. This step is important for all data
processing exercises, and we wish to emphasize it. Before building theories
from data, we would like to understand key data attributes, such as missing
values, distinct counts, outliers, and time-series trends. This kernel aims
to serve as a tutorial for anyone interested in exploring large datasets. We
focus only on the US videos dataset, which is not too large by big-data
standards (only 23,362 rows by 16 columns as of March 2018). This dataset
contains only YouTube data and no data that are difficult to process and
store, such as video, image, audio, or large text documents. Still, we will
proceed with data preprocessing and exploratory data analysis (EDA) as if
this were a very large dataset, using techniques that could be used in much
more difficult data mining exercises. We employ a number of techniques from
the scikit-learn toolkit to give meaning to the data at hand.
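The attribute checks named above (missing values, distinct counts, outliers) can be sketched roughly as follows. This is a minimal illustration using only the Python standard library rather than scikit-learn; the rows, column names, and helper functions are hypothetical stand-ins, not the real US videos data, and the outlier test is Tukey's 1.5 x IQR rule, chosen here only as one common option.

```python
import statistics

# Hypothetical sample rows standing in for the US videos dataset;
# column names and values are illustrative only.
rows = [
    {"video_id": "a1", "channel_title": "ChannelA", "views": 900, "likes": 30},
    {"video_id": "b2", "channel_title": "ChannelB", "views": 950, "likes": None},
    {"video_id": "c3", "channel_title": "ChannelA", "views": 1000, "likes": 35},
    {"video_id": "d4", "channel_title": "ChannelC", "views": 1100, "likes": 40},
    {"video_id": "e5", "channel_title": "ChannelB", "views": 1200, "likes": 45},
    {"video_id": "f6", "channel_title": "ChannelA", "views": 1300, "likes": 50},
    {"video_id": "g7", "channel_title": "ChannelC", "views": 250000, "likes": 900},
]


def missing_counts(rows):
    """Count missing (None) values per column."""
    return {c: sum(1 for r in rows if r[c] is None) for c in rows[0]}


def distinct_counts(rows):
    """Count distinct non-missing values per column."""
    return {c: len({r[c] for r in rows if r[c] is not None}) for c in rows[0]}


def iqr_outliers(values):
    """Return values lying more than 1.5 * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [v for v in values if v < low or v > high]


missing = missing_counts(rows)
distinct = distinct_counts(rows)
outliers = iqr_outliers([r["views"] for r in rows])

print("missing:", missing)
print("distinct:", distinct)
print("view-count outliers:", outliers)
```

On a dataset of the size described above, the same checks would more likely be run with a dataframe library, but the logic stays the same.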
INPUT DESIGN
Input design is the process of converting user-oriented input to a computer-based format.
Input design is a part of the overall system design and requires very careful attention.
Often the collection of input data is the most expensive part of the system. The main
objectives of input design are:
1. Produce a cost-effective method of input.
2. Achieve the highest possible level of accuracy.
3. Ensure that the input is acceptable to and understood by the staff.
INPUT DATA:
The goal of designing input data is to make entry as easy, logical and free from errors as
possible. Data entry operators need to know the space allocated for each field and the field
sequence, which must match that in the source document. The format in which the data fields
are entered should be given in the input form. Here data entry is online; it makes use of a
processor that accepts commands and data from the operator through a keyboard. The input
required is analyzed by the processor. It is then accepted or rejected.
Input stages include the following processes:
Data Recording
Data Transcription
Data Conversion
Data Verification
Data Control
Data Transmission
Data Correction
One of the aims of the system analyst must be to select data capture methods and devices
that reduce the number of stages, so as to reduce both the chances of error and the cost.
Input types can be characterized as:
External
Internal
Operational
Computerized
Interactive
Input files can exist in document form before being input to the computer. Input design is
rather complex since it involves procedures for capturing data as well as inputting it to
the computer.
Trending
Attributes: Channel_Title, No of videos, Subscribers, Company
Primary key: channel_title is the primary key because it is unique, and
all other information about the entity can be obtained through this single key.
Candidate key: there is no other candidate key, because apart from
channel_title no single attribute can identify a row of the Trending entity.
Foreign key: no foreign key is present in this entity.
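As a rough sketch of what the primary key on channel_title implies, the snippet below declares the entity and shows that a second row with the same channel title is rejected. It uses Python's built-in sqlite3 module with an in-memory database instead of MySQL; the table name `channel`, the column types, and the sample values are assumptions made for illustration.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the project's MySQL instance.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE channel (
        channel_title TEXT PRIMARY KEY,  -- unique identifier of the entity
        no_of_videos  INTEGER,
        subscribers   INTEGER,
        company       TEXT
    )
    """
)

# Hypothetical sample row.
conn.execute("INSERT INTO channel VALUES ('ExampleChannel', 10, 5000, 'ExampleCo')")

# A second row with the same key violates the primary key constraint.
try:
    conn.execute("INSERT INTO channel VALUES ('ExampleChannel', 3, 100, 'OtherCo')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM channel").fetchone()[0]
print(duplicate_rejected, row_count)
```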
Coding
DATABASE DESIGN
There are two choices for defining the database schema: SQL and
NoSQL. We can use a traditional database management system such as
Microsoft SQL Server or MySQL to store the data. Information about
videos and users should be kept in the RDBMS. Other information about
videos, called metadata, should be kept too. We have three main tables
to keep the data. (Note that we consider only the basic properties of
YouTube and leave out the recommendation system.)
User
– UserID (primary key)
– Name (nvarchar)
– Age (Integer)
– Email (nvarchar)
– Address (nvarchar)
– Register Date (DateTime)
– Last Login (DateTime)
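The User attributes listed above could be declared along the following lines. This is only a sketch against SQLite through Python's sqlite3 module, not the project's actual MySQL schema: the snake_case column names are my own, nvarchar is mapped to TEXT, and DateTime values are assumed to be stored as ISO-8601 text because SQLite has no dedicated datetime type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE user (
        user_id       INTEGER PRIMARY KEY,  -- UserID (primary key)
        name          TEXT,                 -- nvarchar in the listing
        age           INTEGER,
        email         TEXT,
        address       TEXT,
        register_date TEXT,                 -- DateTime, stored as ISO-8601 text
        last_login    TEXT                  -- DateTime, stored as ISO-8601 text
    )
    """
)

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = conn.execute("PRAGMA table_info(user)").fetchall()
column_names = [col[1] for col in columns]
primary_key = [col[1] for col in columns if col[5] > 0]
print(column_names, primary_key)
```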
SQL is a language used for managing data in relational databases
that store data in tabular form with labelled rows and columns. We
query data from a relational database with the select statement of
SQL. The select statement is highly versatile and flexible in terms of
data transformation and filtering operations.
In that sense, SQL can be considered as a data analysis tool. The
advantage of using SQL for data transformation and filtering is that
we only retrieve the data we need. It is more practical and efficient
than retrieving all the data and then applying these operations.
In this article, we will use SQL statements and functions to analyze
YouTube trending video statistics. The dataset is available on
Kaggle. I created an SQL table that contains a small part of this
dataset.
Note: I’m using MySQL as the database management system.
Although SQL syntax is mostly the same for all database
management systems, there might be small differences.
The table is called “trending” and it has the following structure.
[Figure: structure of the trending table]
We have the dates a video is published and becomes trending. We
also have the title and channel of the video. The views and likes are
the other two features the dataset contains.
Regarding all these features (i.e. columns) we can do a bunch of
different operations. For instance, a simple one can be finding the
top 5 channels in terms of the number of trending videos.
mysql> select channel_title, count(*) as number_of_videos
-> from trending
-> group by channel_title
-> order by number_of_videos desc
-> limit 5;
+-----------------+------------------+
| channel_title   | number_of_videos |
+-----------------+------------------+
| Washington Post |               28 |
| Netflix         |               28 |
| ESPN            |               27 |
| TED-Ed          |               27 |
| CNN             |               27 |
+-----------------+------------------+
We select the channel title column and count the number of rows.
The “as” keyword is used to assign a new name to the aggregated
columns. The group by clause is used to group the videos (i.e. rows)
based on channels. Finally, we sort the results in descending order
using the order by clause and display the first 5.
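For readers without a MySQL server at hand, the same grouping query can be tried on a few made-up rows. The sketch below assumes SQLite via Python's sqlite3 module and hypothetical channel names; it also adds a secondary sort on channel_title, which the original query does not have, so that channels with equal counts come back in a predictable order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trending (title TEXT, channel_title TEXT)")

# Hypothetical rows: one row per trending video.
conn.executemany(
    "INSERT INTO trending VALUES (?, ?)",
    [
        ("v1", "ChannelA"), ("v2", "ChannelA"), ("v3", "ChannelA"),
        ("v4", "ChannelB"), ("v5", "ChannelB"),
        ("v6", "ChannelC"),
    ],
)

# Count videos per channel and keep the two largest groups.
top_channels = conn.execute(
    """
    SELECT channel_title, COUNT(*) AS number_of_videos
    FROM trending
    GROUP BY channel_title
    ORDER BY number_of_videos DESC, channel_title ASC
    LIMIT 2
    """
).fetchall()
print(top_channels)
```

Without a tie-breaker, which of several channels with the same count makes it under the limit is up to the database.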
The counts are low because I only included the videos
published in January 2018.
We may want to see the title of the most-viewed video.
mysql> select title, views
-> from trending
-> where views = (select max(views) from trending);
The query above contains a nested select statement. It is used with
the where clause to find the desired condition which is the
maximum values in the views column.
The most-viewed video in this table has been watched almost 60
million times.
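The nested select can be tried out on a toy table as well. The following is a sketch under the assumption of an in-memory SQLite database and invented titles and view counts; the inner query computes the maximum once and the outer query returns every row that reaches it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trending (title TEXT, views INTEGER)")

# Hypothetical titles and view counts.
conn.executemany(
    "INSERT INTO trending VALUES (?, ?)",
    [("Video one", 100), ("Video two", 5000), ("Video three", 750)],
)

# The subquery yields the maximum view count; the outer query
# selects the row (or rows) whose views equal that maximum.
most_viewed = conn.execute(
    """
    SELECT title, views
    FROM trending
    WHERE views = (SELECT MAX(views) FROM trending)
    """
).fetchall()
print(most_viewed)
```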
SQL provides many different options for filtering the data. In the
previous example, we found out that the most-viewed video belongs
to Bruno Mars. We can filter the titles to see only the videos that belong
to Bruno Mars.
mysql> select distinct(title)
-> from trending
-> where title like "%Bruno Mars%";
We do not have to provide the exact value for filtering if we use the
like keyword. The “%” wildcard matches any sequence of zero or more
characters, so “%Bruno Mars%” matches any value that contains the
phrase “Bruno Mars”. The distinct keyword is used to remove the duplicates.
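A small sketch of pattern filtering with duplicates removed, again assuming SQLite through Python's sqlite3 module and invented titles ("Artist X" is a placeholder, not data from the table). The same video appears twice to mimic a video that trends on more than one day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trending (title TEXT)")

# Hypothetical titles; the first one is inserted twice on purpose.
conn.executemany(
    "INSERT INTO trending VALUES (?)",
    [
        ("Artist X - Song (Official Video)",),
        ("Artist X - Song (Official Video)",),
        ("Live: Artist X concert",),
        ("Unrelated video",),
    ],
)

# '%' on both sides matches the phrase anywhere in the title;
# DISTINCT collapses the repeated row.
matching_titles = [
    row[0]
    for row in conn.execute(
        """
        SELECT DISTINCT title
        FROM trending
        WHERE title LIKE '%Artist X%'
        ORDER BY title
        """
    )
]
print(matching_titles)
```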
If we are not sure about characters being lower or uppercase, we can
convert all the characters to lower or upper case before filtering.
mysql> select distinct(lower(title))
-> from trending
-> where lower(title) like "%bruno mars%";
The dataset contains the published date of videos and when they
become trending. We can calculate the average time it takes for a
video to become trending.
Before calculating the difference, we need to extract the date part
from the publish time column because it contains both the date and
time.
mysql> select trending_date, publish_time
-> from trending
-> limit 3;
+---------------+---------------------+
| trending_date | publish_time        |
+---------------+---------------------+
| 2018-01-02    | 2018-01-01 15:30:03 |
| 2018-01-02    | 2018-01-01 01:05:59 |
| 2018-01-02    | 2018-01-01 14:21:14 |
+---------------+---------------------+
The date function extracts the date part and the datediff function
calculates the difference. Thus, we can calculate the average
difference as follows:
mysql> select avg(datediff(trending_date, date(publish_time)))
-> as avg_diff
-> from trending;
+----------+
| avg_diff |
+----------+
|   3.9221 |
+----------+
The datediff function takes two dates separated by a comma and
returns the number of days between them (the first minus the second).
It takes 3.92 days on average for a video to become trending.
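The average date difference can be reproduced outside MySQL as a sketch. SQLite has no datediff function, so the snippet below substitutes a difference of julianday values, which is an assumption of this example and not the method used above. The first three rows are the sample rows shown earlier; the fourth is a hypothetical addition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trending (trending_date TEXT, publish_time TEXT)")
conn.executemany(
    "INSERT INTO trending VALUES (?, ?)",
    [
        ("2018-01-02", "2018-01-01 15:30:03"),  # sample row from the text
        ("2018-01-02", "2018-01-01 01:05:59"),  # sample row from the text
        ("2018-01-02", "2018-01-01 14:21:14"),  # sample row from the text
        ("2018-01-06", "2018-01-01 09:00:00"),  # hypothetical extra row
    ],
)

# date() drops the time part; subtracting julianday values gives the
# difference in days, playing the role of MySQL's datediff here.
avg_diff = conn.execute(
    """
    SELECT AVG(julianday(trending_date) - julianday(date(publish_time)))
    FROM trending
    """
).fetchone()[0]
print(avg_diff)
```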
We can also calculate the average difference for videos that are
published in a specific time period. We just need to add a where
clause for filtering.
mysql> select avg(datediff(trending_date, date(publish_time)))
-> as avg_diff
-> from trending
-> where hour(publish_time) > 20;
+----------+
| avg_diff |
+----------+
|   4.4825 |
+----------+
We extract the hour value from publish time and use it in the where
clause for filtering.
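Filtering on the publish hour can be sketched in the same way. SQLite lacks MySQL's hour function, so this example assumes strftime('%H', ...) cast to an integer as a replacement; the two late-evening rows are hypothetical. Note that the condition "> 20" keeps hours 21, 22 and 23 only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trending (trending_date TEXT, publish_time TEXT)")
conn.executemany(
    "INSERT INTO trending VALUES (?, ?)",
    [
        ("2018-01-02", "2018-01-01 15:30:03"),  # sample row from the text
        ("2018-01-04", "2018-01-01 21:45:00"),  # hypothetical late upload
        ("2018-01-06", "2018-01-01 23:05:00"),  # hypothetical late upload
    ],
)

# Extract the hour as an integer and keep only uploads after 20:59.
avg_diff_late = conn.execute(
    """
    SELECT AVG(julianday(trending_date) - julianday(date(publish_time)))
    FROM trending
    WHERE CAST(strftime('%H', publish_time) AS INTEGER) > 20
    """
).fetchone()[0]
print(avg_diff_late)
```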
SQL provides functions for data aggregation which can be
implemented in the select statement. For instance, we can calculate
the average ratio of likes over views of videos published by Netflix.
mysql> select avg(likes / views)
-> from trending
-> where channel_title = "Netflix";
+--------------------+
| avg(likes / views) |
+--------------------+
|         0.01816295 |
+--------------------+
The average value is close to 0.02, so Netflix videos have
a like-to-view ratio of approximately 2 percent.
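One portability detail is worth a sketch: in MySQL the "/" operator returns a decimal result, while in SQLite dividing two integers truncates. The example below, which assumes SQLite via Python's sqlite3 module and hypothetical channel data, shows both the truncated result and the ratio obtained after forcing floating-point division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trending (channel_title TEXT, views INTEGER, likes INTEGER)")

# Hypothetical rows: ChannelA has ratios 0.02 and 0.03.
conn.executemany(
    "INSERT INTO trending VALUES (?, ?, ?)",
    [("ChannelA", 1000, 20), ("ChannelA", 2000, 60), ("ChannelB", 500, 5)],
)

# Integer division in SQLite truncates each ratio to 0.
truncated = conn.execute(
    "SELECT AVG(likes / views) FROM trending WHERE channel_title = 'ChannelA'"
).fetchone()[0]

# Multiplying by 1.0 forces floating-point division.
avg_ratio = conn.execute(
    "SELECT AVG(likes * 1.0 / views) FROM trending WHERE channel_title = 'ChannelA'"
).fetchone()[0]
print(truncated, avg_ratio)
```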
Let’s write a slightly more complicated query and calculate the
average video views of channels that published more than 25 videos.
We will also sort the results in descending order by the averages.
mysql> select channel_title, avg(views) as avg_views,
-> count(title) as number_of_videos
-> from trending
-> group by channel_title
-> having number_of_videos > 25
-> order by avg_views desc;
The retrieved data contains 3 columns. One is the channel title
column and the other two are aggregated columns. We filter the
channels based on the number of videos.
You may have noticed that we used the “having” clause instead of
the “where” clause for filtering. The “having” clause is used for
filtering based on aggregated columns.
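The difference between the two clauses can be seen on a toy table. This sketch assumes SQLite via Python's sqlite3 module and made-up channels; it repeats the aggregate expression inside the having clause instead of the alias, which is a choice made here for portability across database systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trending (title TEXT, channel_title TEXT, views INTEGER)")

# Hypothetical rows: ChannelA has 3 videos, ChannelB 2, ChannelC 1.
conn.executemany(
    "INSERT INTO trending VALUES (?, ?, ?)",
    [
        ("a1", "ChannelA", 100), ("a2", "ChannelA", 200), ("a3", "ChannelA", 300),
        ("b1", "ChannelB", 1000), ("b2", "ChannelB", 3000),
        ("c1", "ChannelC", 5000),
    ],
)

# HAVING filters groups after aggregation, so ChannelC (one video)
# is dropped even though it has the highest view count.
busy_channels = conn.execute(
    """
    SELECT channel_title, AVG(views) AS avg_views, COUNT(title) AS number_of_videos
    FROM trending
    GROUP BY channel_title
    HAVING COUNT(title) > 1
    ORDER BY avg_views DESC
    """
).fetchall()
print(busy_channels)
```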
ER Diagram
Conclusion
We have worked through some examples that analyse YouTube trending
video statistics. The examples demonstrate that SQL can also
be used as a data analysis tool.

 

youtube.docx

  • 1. Major project documentation “YouTube trending video analysis DBMS” is submitted to Department of Computer Applications, Submitted To: Submitted By: Project Undertaken:
  • 2. Acknowledgement
The satisfaction that accompanies the successful completion of any task would be incomplete without the mention of the people whose ceaseless cooperation made it possible, and whose constant guidance and encouragement crown all efforts with success. We are grateful to our project guide, Mr. Shakti Kundu, for the guidance, inspiration and constructive suggestions that helped us in the preparation of this project. We are also thankful to our colleagues, with whom we had fruitful discussions that helped us a lot in giving a final shape to the program.
  • 3. ABSTRACT
Unlike popular videos, which would have already achieved high viewership numbers by the time they are declared popular, YouTube trending videos represent content that targets viewers’ attention over a relatively short time, and has the potential of becoming popular. Despite their importance and visibility, YouTube trending videos have not been studied or analyzed thoroughly. In this paper, we present our findings for measuring, analyzing, and comparing key aspects of YouTube trending videos.
Our study is based on collecting and monitoring high-resolution time-series of the viewership and related statistics of more than 8,000 YouTube videos over an aggregate period of nine months. Since trending videos are declared as such just several hours after they are uploaded, we are able to analyze trending videos’ time-series across critical and sufficiently long durations of their lifecycle. In addition, we analyze the profiles of users who upload trending videos, to potentially identify the role that these profiles play in getting the uploaded videos trending. Furthermore, we conduct a directional-relationship analysis among all pairs of trending videos’ time-series that we have monitored. We employ Granger Causality (GC) with significance testing to conduct this analysis. Unlike traditional correlation measures, our directional-relationship analysis provides a deeper insight into the viewership patterns of different categories of trending videos.
Our findings include the following. Trending videos and their channels have clearly distinct statistical attributes when compared to other YouTube content that has not been labeled as trending. Based on the GC measure, the viewership of nearly all trending videos has some level of directional relationship with other trending videos in our dataset. Our results also reveal a highly asymmetric directional relationship among different categories of trending videos. Our directionality analysis also shows a clear pattern of viewership toward popular categories, whereas some categories tend to be isolated, with little evidence of transitions among them.
  • 4. Introduction
YouTube, a user-generated content platform, is one of the largest and most popular video-sharing websites. It serves over four billion views a day. YouTube provides public statistics regarding its uploaded videos, most notably the number of views, which shows the aggregate number of times a video has been watched up to that point. Naturally, the number of views indicates the level of popularity of a video, and it takes a varying amount of time for a video to become popular (if it becomes popular at all).
YouTube also supports a feature called trending, which represents content that has the potential of becoming popular in a relatively short time. Consequently, although trending videos are usually not yet popular when YouTube declares them trending, they have the potential of becoming popular eventually. For example, some videos are labeled trending with only a few hundred views. From another perspective, through trending videos YouTube tries to highlight emerging trends developing within different viewership communities.
Meanwhile, the general attributes of the viewership of trending videos have not been studied thoroughly. To the best of our knowledge, even basic statistics about YouTube trending videos have not been studied, analyzed, or given adequate attention. Considering that more than one billion unique users visit YouTube each month and that they upload 72 hours of video every minute [26], YouTube is an attractive place for, e.g., brand engagement or advertising, but it is genuinely difficult and competitive to get the attention of users. Therefore, when a video becomes popular, it is exposed to millions of users for free and has the opportunity of keeping their attention for a while. Finding these trends is so important that many different websites have emerged just to pick up
  • 5. YouTube for content owners or advertisers. Better understanding of YouTube trending videos and their statistics, and a deeper insight into their lifecycles, can greatly affect the strategies for marketing, targeted advertising, recommendation systems and search engines, as was suggested by prior YouTube measurement studies [2]. This represents a key motivation for our effort.
Scope
Our aim is to produce a data science preprocessing analysis working solely with the US videos dataset. This step is important for all data processing exercises and we wish to emphasize it. Before building models from data, we need to understand key data attributes, such as missing values, distinct counts, outliers, and time-series trends. This kernel aims to serve as a tutorial for anyone interested in exploring large datasets. We focus only on the US videos dataset, which is not too large by big-data standards (only 23,362 rows by 16 columns as of March 2018). This dataset contains only YouTube data and nothing that is difficult to process and store, such as video, image, audio, or large text documents. Still, we proceed with data preprocessing and exploratory data analysis (EDA) as if this were a very large dataset, using techniques that could be applied in much more difficult data mining exercises. We employ a number of techniques from the scikit-learn toolkit to give meaning to the data at hand.
INPUT DESIGN
Input design is the process of converting user-oriented input to a computer-based format. Input design is a part of overall system design and requires very careful attention. Often the collection of input data is the most expensive part of the system. The main objectives of input design are:
1. Produce a cost-effective method of input.
2. Achieve the highest possible level of accuracy.
3. Ensure that the input is acceptable to and understood by the staff.
  • 6. INPUT DATA:
The goal of designing input data is to make data entry as easy, logical and free from errors as possible. Data entry operators need to know the space allocated for each field and the field sequence, which must match that of the source document. The format in which the data fields are entered should be given in the input form. Here data entry is online: it makes use of a processor that accepts commands and data from the operator through a keyboard. The processor analyzes the input, which is then accepted or rejected.
Input stages include the following processes:
– Data Recording
– Data Transcription
– Data Conversion
– Data Verification
– Data Control
– Data Transmission
– Data Correction
One of the aims of the system analyst must be to select data capture methods and devices that reduce the number of stages, so as to reduce both the chances of errors and the cost.
Input types can be characterized as:
– External
– Internal
– Operational
– Computerized
– Interactive
Input files can exist in document form before being input to the computer. Input design is rather complex, since it involves procedures for capturing data as well as for inputting it to the computer.
Trending
Channel_Title | No of videos | subscribers | Company
  • 7. Primary key: channel_title is the primary key because it is unique, and all information about a row can be obtained through this single key.
Candidate key: No other candidate key, because apart from channel_title no single attribute uniquely identifies a row of the Trending entity.
Foreign key: No foreign key is present in this entity.
Coding
DATABASE DESIGN
There are two choices to define the database schema: SQL and NoSQL. We can use a traditional database management system such as MS SQL or MySQL to keep the data. Information about videos and users should be kept in the RDBMS. Other information about videos, called metadata, should be kept too. This gives three main tables to hold the data. (Note that we consider only the basic properties of YouTube and leave out the recommendation system.)
User
– UserID (primary key)
– Name (nvarchar)
– Age (Integer)
– Email (nvarchar)
– Address (nvarchar)
– Register Date (DateTime)
– Last Login (DateTime)
SQL is a language used for managing data in relational databases, which store data in tabular form with labelled rows and columns. We query data from a relational database with the select statement of SQL. The select statement is highly versatile and flexible in terms of data transformation and filtering operations.
  • 8. In that sense, SQL can be considered a data analysis tool. The advantage of using SQL for data transformation and filtering is that we only retrieve the data we need. This is more practical and efficient than retrieving all the data and then applying these operations.
In this section, we use SQL statements and functions to analyze YouTube trending video statistics. The dataset is available on Kaggle. We created an SQL table that contains a small part of this dataset.
Note: We use MySQL as the database management system. Although SQL syntax is mostly the same across database management systems, there might be small differences.
The table is called “trending” and it has the following structure.
[Figure: structure of the trending table]
We have the dates on which a video is published and becomes trending. We also have the title and channel of the video. The views and likes are the other two features the dataset contains.
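As a rough sketch of what such a table could look like, the Python snippet below builds a small in-memory table with the standard-library sqlite3 module. The column names are the ones used by the queries in this report; the column types, the helper name build_trending_table and the two sample rows are assumptions made for illustration, and SQLite stands in for MySQL here.

```python
import sqlite3

# Hypothetical schema: column names follow the queries in this report,
# the column types are assumptions.
SCHEMA = """
CREATE TABLE trending (
    trending_date TEXT,      -- date the video appeared on the trending list
    publish_time  TEXT,      -- date and time the video was published
    title         TEXT,
    channel_title TEXT,
    views         INTEGER,
    likes         INTEGER
)
"""

# Made-up rows for illustration only; they are not taken from the Kaggle dataset.
SAMPLE_ROWS = [
    ("2018-01-02", "2018-01-01 15:30:03", "Video A", "Channel X", 1000, 50),
    ("2018-01-03", "2018-01-01 01:05:59", "Video B", "Channel Y", 2500, 75),
]


def build_trending_table(rows):
    """Create an in-memory SQLite database holding a small trending table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO trending VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn


conn = build_trending_table(SAMPLE_ROWS)
# PRAGMA table_info returns one row per column; index 1 is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(trending)")]
row_count = conn.execute("SELECT COUNT(*) FROM trending").fetchone()[0]
print(columns, row_count)
```

Storing dates as ISO-8601 text keeps the sketch simple; in MySQL the two date columns would more naturally be DATE and DATETIME.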
  • 9. With these features (i.e. columns) we can perform many different operations. For instance, a simple one is finding the top 5 channels in terms of the number of trending videos.

    mysql> select channel_title, count(*) as number_of_videos
        -> from trending
        -> group by channel_title
        -> order by number_of_videos desc
        -> limit 5;
    +-----------------+------------------+
    | channel_title   | number_of_videos |
    +-----------------+------------------+
    | Washington Post |               28 |
    | Netflix         |               28 |
    | ESPN            |               27 |
    | TED-Ed          |               27 |
    | CNN             |               27 |
    +-----------------+------------------+

We select the channel title column and count the number of rows. The “as” keyword is used to assign a new name to the aggregated column. The group by clause is used to group the videos (i.e. rows) by channel. Finally, we sort the results in descending order using the order by clause and display the first 5. The counts are low because the table only includes videos published in January 2018.
We may want to see the title of the most-viewed video.

    mysql> select title, views
        -> from trending
        -> where views = (select max(views) from trending);
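The two queries above can also be tried without a MySQL server. The sketch below runs the same grouping and nested-select patterns against an in-memory SQLite table through Python's sqlite3 module; the channel names and view counts are made up for illustration, and the limit is reduced to 2 to suit the tiny sample.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trending (title TEXT, channel_title TEXT, views INTEGER)"
)
# Made-up rows for illustration only.
conn.executemany(
    "INSERT INTO trending VALUES (?, ?, ?)",
    [
        ("a1", "Alpha", 100),
        ("a2", "Alpha", 300),
        ("a3", "Alpha", 20),
        ("b1", "Beta", 900),
        ("b2", "Beta", 10),
        ("c1", "Gamma", 50),
    ],
)

# Count rows per channel, sort by the count, keep the top 2 channels.
top_channels = conn.execute(
    """
    SELECT channel_title, COUNT(*) AS number_of_videos
    FROM trending
    GROUP BY channel_title
    ORDER BY number_of_videos DESC
    LIMIT 2
    """
).fetchall()

# Nested select: keep the row(s) whose views equal the overall maximum.
most_viewed = conn.execute(
    """
    SELECT title, views
    FROM trending
    WHERE views = (SELECT MAX(views) FROM trending)
    """
).fetchall()

print(top_channels)
print(most_viewed)
```

If several videos share the maximum view count, the nested select returns all of them, which is why the result is read with fetchall() rather than fetchone().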
  • 10. The query above contains a nested select statement. It is used in the where clause to express the desired condition, which is the maximum value in the views column. The most-viewed video in this table has been watched almost 60 million times.
SQL provides many different options for filtering the data. In the previous example, we found out that the most-viewed video belongs to Bruno Mars. We can filter the titles to see only the videos that belong to Bruno Mars.

    mysql> select distinct(title)
        -> from trending
        -> where title like "%Bruno Mars%";

We do not have to provide the exact value for filtering if we use the like keyword. The “%” wildcard matches any sequence of zero or more characters, so “%Bruno Mars%” matches any value that contains the phrase “Bruno Mars”. The distinct keyword is used to remove duplicates.
If we are not sure whether characters are lowercase or uppercase, we can convert all the characters to lower or upper case before filtering.

    mysql> select distinct(lower(title))
        -> from trending
        -> where lower(title) like "%bruno mars%";
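A minimal sketch of the same pattern filter, again using an in-memory SQLite table from Python rather than MySQL. The titles are made up for illustration. Lowercasing both the column and the pattern makes the match independent of how a given database handles case in like comparisons.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trending (title TEXT)")
# Made-up titles for illustration only; one title appears twice on purpose.
conn.executemany(
    "INSERT INTO trending VALUES (?)",
    [
        ("Bruno Mars - Finesse",),
        ("Bruno Mars - Finesse",),
        ("BRUNO MARS live",),
        ("Some other video",),
    ],
)

# '%' matches any run of characters, so the pattern finds the phrase anywhere
# in the title. The pattern is passed as a parameter instead of being pasted
# into the SQL string.
pattern = "%bruno mars%"
matches = [
    row[0]
    for row in conn.execute(
        """
        SELECT DISTINCT lower(title)
        FROM trending
        WHERE lower(title) LIKE ?
        ORDER BY 1
        """,
        (pattern,),
    )
]
print(matches)
```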
  • 11. The dataset contains the publish date of each video and the date it became trending. We can calculate the average time it takes for a video to become trending. Before calculating the difference, we need to extract the date part from the publish time column because it contains both the date and the time.

    mysql> select trending_date, publish_time
        -> from trending
        -> limit 3;
    +---------------+---------------------+
    | trending_date | publish_time        |
    +---------------+---------------------+
    | 2018-01-02    | 2018-01-01 15:30:03 |
    | 2018-01-02    | 2018-01-01 01:05:59 |
    | 2018-01-02    | 2018-01-01 14:21:14 |
    +---------------+---------------------+

The date function extracts the date part and the datediff function calculates the difference. Thus, we can calculate the average difference as follows:

    mysql> select avg(datediff(trending_date, date(publish_time)))
        -> as avg_diff
        -> from trending;
    +----------+
    | avg_diff |
    +----------+
    |   3.9221 |
    +----------+
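For readers without MySQL at hand, the same calculation can be sketched in SQLite from Python. SQLite has no datediff function, so this sketch substitutes a difference of julianday values; that substitution and the three sample rows are assumptions for illustration, not part of the original analysis.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trending (trending_date TEXT, publish_time TEXT)")
# Made-up rows for illustration only: gaps of 1, 4 and 1 days.
conn.executemany(
    "INSERT INTO trending VALUES (?, ?)",
    [
        ("2018-01-02", "2018-01-01 15:30:03"),
        ("2018-01-05", "2018-01-01 01:05:59"),
        ("2018-01-04", "2018-01-03 14:21:14"),
    ],
)

# date() drops the time part; julianday() turns each date into a day number,
# so the subtraction gives whole days between publishing and trending.
avg_diff = conn.execute(
    """
    SELECT AVG(julianday(trending_date) - julianday(date(publish_time)))
    FROM trending
    """
).fetchone()[0]
print(avg_diff)
```

Dropping the time part first matters: without it, a video published at 15:30 and trending the next day would count as a fraction of a day rather than one day.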
  • 12. The datediff function takes two dates separated by a comma and returns the first minus the second, in days. It takes 3.92 days on average for a video to become trending.
We can also calculate the average difference for videos that are published in a specific time period. We just need to add a where clause for filtering.

    mysql> select avg(datediff(trending_date, date(publish_time))) as avg_diff
        -> from trending
        -> where hour(publish_time) > 20;
    +----------+
    | avg_diff |
    +----------+
    |   4.4825 |
    +----------+

We extract the hour value from publish time and use it in the where clause, so only videos published at 21:00 or later are included.
SQL provides functions for data aggregation which can be used in the select statement. For instance, we can calculate the average ratio of likes to views for videos published by Netflix.

    mysql> select avg(likes / views)
        -> from trending
        -> where channel_title = "Netflix";
    +--------------------+
    | avg(likes / views) |
    +--------------------+
    |         0.01816295 |
    +--------------------+

The average value is close to 0.02, so Netflix videos have a like-to-view ratio of approximately 2 percent.
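The hour filter and the ratio can be sketched the same way in SQLite from Python. Two substitutions are assumptions of this sketch: strftime('%H', ...) replaces MySQL's hour function, and likes is multiplied by 1.0 because SQLite would otherwise perform integer division. The rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE trending (
        channel_title TEXT, publish_time TEXT, trending_date TEXT,
        views INTEGER, likes INTEGER
    )
    """
)
# Made-up rows for illustration only.
conn.executemany(
    "INSERT INTO trending VALUES (?, ?, ?, ?, ?)",
    [
        ("Netflix", "2018-01-01 21:15:00", "2018-01-04", 1000, 20),
        ("Netflix", "2018-01-02 09:00:00", "2018-01-03", 2000, 20),
        ("Other", "2018-01-01 23:00:00", "2018-01-03", 500, 100),
    ],
)

# Average days to trending, restricted to videos published at 21:00 or later.
late_avg_diff = conn.execute(
    """
    SELECT AVG(julianday(trending_date) - julianday(date(publish_time)))
    FROM trending
    WHERE CAST(strftime('%H', publish_time) AS INTEGER) > 20
    """
).fetchone()[0]

# Average like-to-view ratio for one channel; 1.0 forces real-number division.
netflix_ratio = conn.execute(
    """
    SELECT AVG(likes * 1.0 / views)
    FROM trending
    WHERE channel_title = ?
    """,
    ("Netflix",),
).fetchone()[0]

print(late_avg_diff, netflix_ratio)
```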
  • 13. Let’s write a slightly more complicated query and calculate the average video views of channels that published more than 25 videos. We will also sort the results in descending order by the averages.

    mysql> select channel_title, avg(views) as avg_views,
        -> count(title) as number_of_videos
        -> from trending
        -> group by channel_title
        -> having number_of_videos > 25
        -> order by avg_views desc;

The retrieved data contains 3 columns. One is the channel title column and the other two are aggregated columns. We filter the channels based on the number of videos. You may have noticed that we used the “having” clause instead of the “where” clause for filtering. The “having” clause is used for filtering based on aggregated columns.
ER Diagram
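The difference between filtering rows and filtering groups can be checked with a small sketch, again on an in-memory SQLite table from Python. The rows are made up, and the threshold is lowered from 25 to 2 to suit the tiny sample. The aggregate is repeated in the having clause instead of using the alias, which is the more portable spelling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trending (title TEXT, channel_title TEXT, views INTEGER)"
)
# Made-up rows for illustration only.
conn.executemany(
    "INSERT INTO trending VALUES (?, ?, ?)",
    [
        ("a1", "Alpha", 100),
        ("a2", "Alpha", 200),
        ("a3", "Alpha", 300),
        ("b1", "Beta", 900),
        ("b2", "Beta", 600),
        ("b3", "Beta", 300),
        ("c1", "Gamma", 5000),
    ],
)

# HAVING is applied after grouping, so it can test the per-channel count.
# A WHERE clause runs before grouping and cannot see aggregate values.
busy_channels = conn.execute(
    """
    SELECT channel_title, AVG(views) AS avg_views,
           COUNT(title) AS number_of_videos
    FROM trending
    GROUP BY channel_title
    HAVING COUNT(title) > 2
    ORDER BY avg_views DESC
    """
).fetchall()
print(busy_channels)
```

Gamma has the highest average views in the sample but is excluded because it has only one video, which shows the group-level filter at work.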
  • 14.
  • 15. Conclusion
We have worked through several examples that analyse YouTube trending video statistics. The examples demonstrate that SQL can also be used as a data analysis tool.