Predicting News Popularity
CSC 424 Advanced Data Analysis and Regression
Ke Feng
07/03/2019
Introduction
• Mashable.com is a digital media website founded in 2005. It has since become one of the most popular online sources of news and information.
Dataset
• The dataset summarizes articles published by Mashable over a period of two years. It is publicly available from the University of California, Irvine (UCI) Machine Learning Repository.
• The original dataset has 39,644 observations and 61 variables, 58 of which are used as predictors.
• The goal of this analysis is to predict how often a news article is shared on social media networks (its popularity). My response variable is the number of shares on social media networks.
Literature Review
• Ding, C., & He, X. (2004). K-means clustering via principal component analysis. Twenty-first International Conference on Machine Learning - ICML 04. doi:10.1145/1015330.1015408
• Heller, B. (1986). Statistics for experimenters, an introduction to design, data analysis, and model building. Mathematical Modelling, 7(9-12), 1657-1658. doi:10.1016/0270-0255(86)90102-8
• Khuntia, J., Sun, H., & Yim, D. (2016). Sharing News Through Social Networks. International Journal on Media Management, 18(1), 59-74. doi:10.1080/14241277.2016.1185429
• Hate Speech, Online and Social Media. (n.d.). Encyclopedia of Social Media and Politics. doi:10.4135/9781452244723.n252
• Barthel, M. (2017, June 01). Despite subscription surges for largest U.S. newspapers, circulation and revenue fall for industry overall. Retrieved from http://www.pewresearch.org/fact-tank/2017/06/01/circulation-and-revenue-fall-for-newspaper-industry/
• Advertise With Mashable. (n.d.). Retrieved from https://mashable.com/advertise/
• Al-Zwainy, F. M., Abdulmajeed, M. H., & Aljumaily, H. S. (2013). Using Multivariable Linear Regression Technique for Modeling Productivity Construction in Iraq. Open Journal of Civil Engineering, 03(03), 127-135. doi:10.4236/ojce.2013.33015
Exploratory Stage: Clean and Explore the Data
• Check for missing values
• Remove repeated (duplicate) columns
• Check for categorical variables
• Produce descriptive summary statistics and check the data structure again
[Figures: Categorical Variables Detection; No Missing Values; Descriptive Summary of Y-variable]
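The cleaning checks listed on this slide can be sketched in a few lines of NumPy. This is only an illustration under assumptions: the slides do not show the code or tool that was used, the helper `explore` is hypothetical, and the four-row table below is made-up toy data (the column `n_tokens_copy` is invented to demonstrate duplicate detection).

```python
import numpy as np

def explore(X, names):
    """Basic data-quality checks on a numeric table X (rows = articles)."""
    # Number of missing values (NaN) in each column.
    missing = {n: int(np.isnan(X[:, j]).sum()) for j, n in enumerate(names)}
    # A column counts as a duplicate if it exactly equals an earlier column.
    duplicates = [names[j] for j in range(X.shape[1])
                  if any(np.array_equal(X[:, j], X[:, i]) for i in range(j))]
    # Columns holding only 0/1 are treated as categorical indicator variables.
    binary = [n for j, n in enumerate(names)
              if set(np.unique(X[:, j])) <= {0.0, 1.0}]
    return missing, duplicates, binary

# Toy stand-in for the real table; all values are made up.
names = ["n_tokens_content", "data_channel_is_tech", "n_tokens_copy", "shares"]
X = np.array([
    [200.0, 1.0, 200.0,  593.0],
    [500.0, 0.0, 500.0, 1200.0],
    [350.0, 0.0, 350.0,  800.0],
    [120.0, 1.0, 120.0, 2100.0],
])

missing, duplicates, binary = explore(X, names)

# Descriptive summary of the response variable (last column).
y = X[:, -1]
summary = {"min": y.min(), "median": np.median(y), "mean": y.mean(), "max": y.max()}
print(missing, duplicates, binary, summary)
```

On the real data the same three checks would run over all 61 columns before any modeling.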
Techniques
• Multiple Regression and Model Building
• PCA & Factor Analysis
Multiple Regression and Model Building
• Check for multicollinearity
• Split the data into 80% training and 20% testing
• Use the training set to build the model and the testing set to evaluate its predictions
• Model 1 has an R² of 11.9% (too low!)
• Automatic model selection (stepwise and backward)
• Final model
[Figures: Data Partition (80% training + 20% testing); First Model Fitting Result]
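The workflow on this slide (multicollinearity check, 80/20 split, fit on the training set, evaluate on the testing set) might look like the sketch below. It rests on assumptions: the data are synthetic, the variance inflation factor (VIF) is one common multicollinearity diagnostic and not necessarily the one used in the project, and the helpers `fit_ols`, `r_squared`, and `vif` are hypothetical names. Stepwise and backward selection are not shown.

```python
import numpy as np

def add_intercept(X):
    return np.column_stack([np.ones(len(X)), X])

def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    beta, *_ = np.linalg.lstsq(add_intercept(X), y, rcond=None)
    return beta

def r_squared(X, y, beta):
    resid = y - add_intercept(X) @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def vif(X):
    """VIF per column: 1 / (1 - R² of that column regressed on the others)."""
    out = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        beta = fit_ols(others, X[:, j])
        out.append(1.0 / (1.0 - r_squared(others, X[:, j], beta)))
    return np.array(out)

# Synthetic predictors: x3 is almost a copy of x1, so both get a large VIF.
rng = np.random.default_rng(0)
n = 1000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + 0.1 * rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 2.0 * x1 - 1.0 * x2 + rng.normal(scale=3.0, size=n)

vifs = vif(X)
X = X[:, :2]                      # drop the near-duplicate predictor x3

# Shuffle the row indices, then split 80% training / 20% testing.
idx = rng.permutation(n)
cut = int(0.8 * n)
train, test = idx[:cut], idx[cut:]

beta = fit_ols(X[train], y[train])
r2_train = r_squared(X[train], y[train], beta)
r2_test = r_squared(X[test], y[test], beta)
print(vifs, r2_train, r2_test)
```

Reporting R² on the held-out 20% is what makes a low value such as the 11.9% above informative: it reflects predictive power on unseen articles, not just fit to the training rows.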
Result & Business Insights
• The parameter estimates show associations, not causation, between Y and the X-variables. Variables such as data_channel_is_tech and abs_title_subjectivity stand out.
• Insights
  • Categorizing articles in the right channel is important. More tech articles may increase popularity.
  • More subjectivity may increase popularity. Personal views can boost traffic.
[Figure: Model Fitting Results]
Principal Component Analysis (PCA)
• Select components based on the scree plot and the eigenvalues.
[Figure: Scree Plot]
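As a rough illustration of eigenvalue-based component selection, the sketch below computes the eigenvalues of a correlation matrix (the quantities a scree plot displays) and counts those above 1. The eigenvalue-greater-than-1 cutoff (the Kaiser rule) is an assumption here, since the slide does not state the exact threshold used. The data are synthetic and `pca_eigen` is a hypothetical helper.

```python
import numpy as np

def pca_eigen(X):
    """Eigen-decomposition of the correlation matrix, largest eigenvalue first."""
    R = np.corrcoef(X, rowvar=False)
    vals, vecs = np.linalg.eigh(R)        # eigh: R is symmetric
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]

# Synthetic data: six observed variables driven by two hidden factors.
rng = np.random.default_rng(1)
n = 500
f1 = rng.normal(size=n)
f2 = rng.normal(size=n)
X = np.column_stack(
    [f1 + 0.3 * rng.normal(size=n) for _ in range(3)]
    + [f2 + 0.3 * rng.normal(size=n) for _ in range(3)]
)

eigenvalues, eigenvectors = pca_eigen(X)
explained = eigenvalues / eigenvalues.sum()   # share of variance per component
n_keep = int(np.sum(eigenvalues > 1.0))       # assumed Kaiser rule: eigenvalue > 1

for k, (ev, share) in enumerate(zip(eigenvalues, explained), start=1):
    print(f"component {k}: eigenvalue {ev:.2f}, {share:.1%} of variance")
print("components kept:", n_keep)
```

Plotting `eigenvalues` against the component number gives the scree plot; the "elbow" where the curve flattens is the visual counterpart of the cutoff.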
Naming Components
• Factor 1: Length of the article
• Factor 2: Use of keywords
• Factor 3: Number of links
• Factor 4: Published channel
• Factor 5: Whether the title is polarized
• Factor 6: Publication date
* Some factors overlap
[Figure: Loadings (After Rotation)]
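Factors are usually named by looking at which variables load heavily on each factor after rotation. The slide does not say which rotation was applied; the sketch below assumes varimax, a common orthogonal choice, and uses a made-up loading matrix. The function `varimax` is an illustrative implementation, not the project's code.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation; returns (rotated loadings, rotation matrix)."""
    p, k = loadings.shape
    R = np.eye(k)
    crit_old = 0.0
    for _ in range(max_iter):
        L = loadings @ R
        # Gradient of the varimax criterion with respect to the rotation.
        grad = loadings.T @ (L ** 3 - L @ np.diag(np.sum(L ** 2, axis=0)) / p)
        u, s, vt = np.linalg.svd(grad)
        R = u @ vt                         # nearest orthogonal matrix
        crit = s.sum()
        if crit_old != 0.0 and crit < crit_old * (1.0 + tol):
            break
        crit_old = crit
    return loadings @ R, R

# Made-up loadings: a clean two-factor pattern, deliberately blurred by a
# 30-degree rotation so that every variable loads on both factors.
simple = np.array([[0.90, 0.0], [0.80, 0.0], [0.85, 0.0],
                   [0.0, 0.90], [0.0, 0.80], [0.0, 0.70]])
theta = np.deg2rad(30.0)
mix = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
unrotated = simple @ mix

rotated, R = varimax(unrotated)
print(np.round(rotated, 2))
```

Because the rotation is orthogonal, each variable's communality (its row sum of squared loadings) is unchanged; only how that variance is divided among the factors changes, which is what makes the rotated columns easier to label.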
Result & Business Insight
• Keep articles at an appropriate length
• The day of publication matters
• Embed more links to popular articles
• The Tech channel is usually more popular
• Use an appropriate number of keywords
• Write titles with unique words
• Titles should be polarized
Data and News Ethics
• Should we focus solely on news traffic?
• Is there a better way to measure “good news”?
Hate Speech, Online and Social Media. (n.d.). Encyclopedia of Social Media and Politics. doi:10.4135/9781452244723.n252
Future Work
• More work on data cleaning (low R²)
• Try different transformations and model selection methods to see whether they improve my R²
• Try other techniques to look for underlying relationships that I failed to find in the analyses above
• Test a more diverse set of variables
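One transformation that could be tried is a log transform of the response, since share counts tend to be heavily right-skewed. This is a suggestion under assumptions, not something the slides report doing: the share counts below are synthetic (drawn from a log-normal distribution) and `skewness` is a hypothetical helper.

```python
import numpy as np

def skewness(x):
    """Sample skewness: the mean cubed z-score (0 for a symmetric distribution)."""
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 3))

# Synthetic, heavily right-skewed "shares" counts (not the real data).
rng = np.random.default_rng(2)
shares = np.round(rng.lognormal(mean=7.0, sigma=1.0, size=2000))

log_shares = np.log1p(shares)            # log(1 + shares), safe when shares == 0

skew_raw = skewness(shares)
skew_log = skewness(log_shares)
print(f"skewness before: {skew_raw:.2f}, after log1p: {skew_log:.2f}")

# Predictions made on the log scale can be mapped back with expm1.
restored = np.expm1(log_shares)
```

A model fitted to `log_shares` would then be compared with the original model on the same testing set to see whether R² actually improves.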

Editor's Notes

  1. 3 Mashable stories shared per second; 1.7 billion monthly cross-platform content views; 70 million unique content visitors; 48 million social media followers