February 8, 2016
Character Count vs Pageview
I first export the top 100 posts by pageviews from thebacklabel, then count the number of characters in each post title.
I do this because we care about how long our titles look on the webpage, so the number of characters is the more
relevant measure here. From the histogram, most of our titles are 20 to 40 characters long.
[Figure: histogram of the number of characters in page titles]
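The character counting and binning behind the histogram can be sketched in Python as below. This is a minimal sketch, not the code used for this report: the sample titles are hypothetical stand-ins for the exported top-100 posts, and the 10-character bin width is an assumption chosen to match the axis ticks.

```python
from collections import Counter

def title_lengths(titles):
    """Number of characters in each title, counting spaces and punctuation."""
    return [len(t) for t in titles]

def histogram(lengths, bin_width=10):
    """Titles per bin; a 23-character title lands in the bin starting at 20."""
    bins = Counter((n // bin_width) * bin_width for n in lengths)
    return dict(sorted(bins.items()))

# Hypothetical titles standing in for the exported top-100 posts.
titles = [
    "The Best Wines Under $20",
    "A Simple Sangria Recipe",
    "Ten Wines to Drink This Fall",
    "How to Read a Wine Label Without Feeling Lost",
]
print(histogram(title_lengths(titles)))
```

Counting characters with `len` includes spaces, which matches the goal of measuring how much room a title takes on the page.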
We then plot character counts against pageviews. The graph shows a peak, but it is caused by two outliers.
After removing the outliers, there is barely any pattern in the graph.
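One way to make "barely any pattern" concrete is to drop outliers and then compute a correlation coefficient. The report does not say how the two outliers were identified, so this sketch assumes Tukey's IQR rule (pageviews more than 1.5 × IQR beyond the quartiles count as outliers); the data pairs are made up for illustration.

```python
import statistics

def iqr_filter(pairs, k=1.5):
    """Keep (chars, pageviews) pairs whose pageviews lie inside Tukey's fences."""
    views = [v for _, v in pairs]
    q1, _, q3 = statistics.quantiles(views, n=4)  # quartiles (Python 3.8+)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [(c, v) for c, v in pairs if low <= v <= high]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical (character count, pageviews) pairs with two extreme posts.
pairs = [(22, 120), (25, 135), (28, 110), (31, 128), (35, 118),
         (38, 125), (41, 2400), (44, 2100), (30, 122), (27, 131)]

kept = iqr_filter(pairs)
chars, views = zip(*kept)
print(len(pairs) - len(kept), "outliers removed")
print("r =", round(pearson(chars, views), 3))
```

A correlation near zero after filtering would back up the visual impression; with only 100 posts, a couple of extreme values can easily create a peak like the one in the plot.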
Character Count vs Page Entrance
This part follows exactly the same process as the previous part. Again, there is no clear pattern in the graph of
character counts vs. page entrances.
[Figure: number of characters in page title vs. page entrances]
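For page entrances, a rank-based correlation is one way to check for a trend without first deciding which points are outliers. This is a swapped-in technique (Spearman's rank correlation), not what the report used; the sketch below is minimal and the sample numbers are hypothetical.

```python
from statistics import fmean

def ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation computed on ranks, not raw values."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = fmean(rx), fmean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical (character count, page entrances) data. The extreme post only
# contributes its rank, so it cannot dominate the result.
chars = [22, 25, 28, 31, 35, 38, 41]
entrances = [40, 55, 38, 61, 47, 52, 900]
print(round(spearman(chars, entrances), 3))
```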
Word Cloud of Page Title
Since the previous plots show no clear pattern, I turn to how the content of a page title affects
pageviews. As Dale suggested, I focus only on posts with more than 20 pageviews and an average reading
time under 10 minutes. I extract the top 200 of these posts and find the most common words in their titles,
visualized as a word cloud.
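The filtering and word counting behind the word cloud could look like the sketch below. It is not the original code: the field names (`title`, `pageviews`, `avg_minutes`), the stop-word list, and the tokenization (lowercased runs of letters) are all assumptions. A word-cloud tool would then size each word by its count.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the report does not say which words were excluded.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "for", "in", "on", "with", "is"}

def top_title_words(posts, min_views=20, max_minutes=10, top_n=200, k=10):
    """Most common title words among the top_n posts (by pageviews) that pass the filters.

    Each post is a dict with hypothetical keys 'title', 'pageviews',
    and 'avg_minutes' (average reading time in minutes).
    """
    kept = [p for p in posts
            if p["pageviews"] > min_views and p["avg_minutes"] < max_minutes]
    kept.sort(key=lambda p: p["pageviews"], reverse=True)
    counts = Counter()
    for p in kept[:top_n]:
        words = re.findall(r"[a-z']+", p["title"].lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(k)

posts = [
    {"title": "The Best Wine for Pizza", "pageviews": 300, "avg_minutes": 2.5},
    {"title": "Best Sangria Recipe", "pageviews": 150, "avg_minutes": 3.0},
    {"title": "Mulled Wine Recipe", "pageviews": 90, "avg_minutes": 4.0},
    {"title": "A Long Wine History", "pageviews": 500, "avg_minutes": 14.0},  # too long to read
    {"title": "Wine Trivia", "pageviews": 12, "avg_minutes": 1.0},  # too few pageviews
]
print(top_title_words(posts))
```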
Of course, “wine” is the largest word in the plot. :) The word cloud also suggests that people like reading posts
whose titles contain “best” or “recipe(s)”.
Article Length vs Page Reading Time
As Dale suggested, here I only consider articles with an average reading time under 10 minutes. Both plots
show the word count of a post vs. its average reading time. Although the pattern is not very clear, posts with
300 to 600 words tend to have a longer average reading time (excluding the
[Figure: two plots of word count vs. average reading time]
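The observation that posts with 300 to 600 words have longer average reading times can be checked by grouping posts into word-count bins and averaging reading time within each bin. This is a minimal sketch, assuming `(word_count, avg_minutes)` pairs and 300-word bins chosen to match the plot axis; the sample numbers are hypothetical.

```python
from collections import defaultdict
from statistics import fmean

def mean_time_by_bin(posts, bin_width=300, max_minutes=10):
    """Average reading time (minutes) per word-count bin [start, start + bin_width).

    posts: iterable of (word_count, avg_minutes) pairs (hypothetical format).
    Posts with a reading time of max_minutes or more are dropped, mirroring
    the 10-minute cutoff. A 450-word post falls in the (300, 600) bin.
    """
    bins = defaultdict(list)
    for words, minutes in posts:
        if minutes < max_minutes:
            start = (words // bin_width) * bin_width
            bins[(start, start + bin_width)].append(minutes)
    return {b: round(fmean(times), 2) for b, times in sorted(bins.items())}

sample = [(120, 1.0), (250, 2.0), (350, 4.0), (480, 5.0), (590, 4.5),
          (700, 3.0), (850, 2.0), (400, 12.0)]
print(mean_time_by_bin(sample))
```

Binning trades detail for readability: with a noisy scatter plot, per-bin averages make a bump in the middle range easier to see, though the bin edges themselves are a judgment call.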