Text mining
Wednesday, November 25, 2015
TEXT MINING
AU FOOTBALL FACEBOOK PAGE
Sai Praneeth Reddy
Auburn University
Executive Summary
The Auburn athletic department is interested in analyzing the Facebook posts on the AU football
page to get an idea of the topics that people talk about most. They would also
like to know whether public sentiment towards Auburn football is positive or
negative.
In addition, they want to know which players or coaches are discussed most
and whether the comments about them are positive or negative. The athletic department would like to
use these answers to improve their football team.
We use text analytics to answer the above questions and then perform sentiment
analysis to determine whether the general outlook towards the team is positive or negative.
Introduction
To improve the performance of Auburn's football team, the Auburn athletic department is interested in
analyzing the Facebook posts, comments and replies on the AU football page. The athletic department
would like to know the general outlook of the public towards their football team. In addition, the
athletic department wants to know which players are talked about the most and the public's opinion of
these players.
Methodology
One of the most commonly used methodologies for analyzing text is text mining. In our case we
analyze the Facebook posts using SAS Enterprise Miner (E-Miner). We first import the posts, comments
and replies on the AU Facebook page into an Excel file using a web crawler. Once the posts are imported
into an Excel file, it is converted into a SAS-readable format using the File Import node.
We then use the Text Parsing node to remove unwanted words, followed by the Text Filter node, where we group
words that are synonyms and drop certain words that we are not interested in. In the Text Filter
node we can also view text snippets containing a word we are interested in analyzing. The Text Cluster
node groups the terms into clusters, where each cluster represents terms that occur together.
Analysis and Results
DATA PREPARATION
The Facebook posts are imported into an Excel file using web crawling. The Excel file is then
imported into SAS and converted into a SAS-readable format using the File Import node.
TEXT PARSING
All the variables in the input data are set to Rejected except for the post id, whose role is set to
ID, and message, whose role is set to Text. The Text Parsing node enables us to parse the
text and analyze the number of terms and documents by frequency. In our Text Parsing
node we dropped all words except for nouns, proper nouns and adjectives.
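The counting that the parsing step produces can be sketched outside SAS. The following is a minimal Python illustration, not the Text Parsing node itself: it assumes a few hypothetical stand-in posts, uses a simple regular-expression tokenizer, and does not perform the part-of-speech filtering described above. It counts how often each term occurs and in how many documents it appears, which is the information a ZIPF-style ranking is built from.

```python
import re
from collections import Counter

# Hypothetical stand-in posts; the real input is the crawled Facebook data.
posts = [
    "Malzahn needs to fix the offense",
    "The defense looked good under Muschamp",
    "Malzahn and the offense let the defense down",
]

def tokenize(text):
    """Lowercase a post and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())

def term_and_document_counts(documents):
    """Return (term frequency, document frequency) counters."""
    term_freq = Counter()
    doc_freq = Counter()
    for doc in documents:
        tokens = tokenize(doc)
        term_freq.update(tokens)       # every occurrence counts
        doc_freq.update(set(tokens))   # each term counts once per document
    return term_freq, doc_freq

term_freq, doc_freq = term_and_document_counts(posts)

# Rank terms by the number of documents they appear in.
for rank, (term, n_docs) in enumerate(doc_freq.most_common(5), start=1):
    print(rank, term, n_docs, term_freq[term])
```

Restricting the terms to nouns, proper nouns and adjectives, as done in the report, would require a part-of-speech tagger, which this sketch leaves out.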
fig 1
The above ZIPF plot shows that Gus Malzahn is one of the most widely discussed topics, along
with Jeremy Johnson and Will Muschamp.
Some of the most widely discussed players and coaches are as follows:
Table 1
Names              Weight
Malzahn            0.354
Jonathan Wallace   0.534
Rhett Lashlee      0.614
Jeremy Johnson     0.618
Carl Lawson        0.608
Will Muschamp      0.510
Sean White         0.6
fig 2
The above Number of Documents by Weight plot shows that Jeremy Johnson has a relatively
high weight compared to all other players.
Some of the most widely discussed topics / words are:
Table 2
Topics     Number of Documents
Defence    27
Offence    18
good       30
QB         163
Running    12
Receiver   10
TEXT FILTERING
The Text Filter node is used to keep or drop terms that are either too frequent or highly infrequent,
as these terms are not of much use in grouping topics. The node also helps us group
words that are similar to one another (i.e., synonyms).
Using the interactive text filter we can also see in what context people are using someone's
name or a word we are interested in. This helps us understand the sentiment of the people
towards a particular person or topic. Using text filtering it is also possible to see which
words are strongly associated, based on the concept link diagram, which shows relationships
between terms.
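The ideas behind the text filtering step (dropping terms that are too frequent or too rare, merging synonyms, and viewing a term in context) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the behavior of the SAS Text Filter node: the posts, the synonym map, the document-frequency thresholds and the window size are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical synonym map and posts, for illustration only.
SYNONYMS = {"offence": "offense", "defence": "defense", "quarterback": "qb"}

posts = [
    "The offence let the team down again",
    "Our defence played well but the offense struggled",
    "The quarterback needs more time",
    "Start the other QB next week",
]

def normalize(text):
    """Tokenize a post and map each token to its synonym group."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [SYNONYMS.get(tok, tok) for tok in tokens]

def filter_terms(documents, min_docs, max_docs):
    """Keep terms whose document frequency lies in [min_docs, max_docs]."""
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(normalize(doc)))
    return {term for term, n in doc_freq.items() if min_docs <= n <= max_docs}

def snippets(documents, term, window=3):
    """Return the words surrounding each occurrence of a term."""
    results = []
    for doc in documents:
        tokens = normalize(doc)
        for i, tok in enumerate(tokens):
            if tok == term:
                start = max(0, i - window)
                results.append(" ".join(tokens[start:i + window + 1]))
    return results

kept = filter_terms(posts, min_docs=2, max_docs=3)
print(sorted(kept))
print(snippets(posts, "qb", window=2))
```

Reading the snippets for a name or topic is how the sentiment judgments in this report are formed; the code only gathers the context and does not score sentiment.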
Sentiment Analysis:
• Gus Malzahn
Table 3
The above text snippets indicate that many people have a negative perception of
coach Gus Malzahn; many seem to blame him for the defeat.
• Jeremy Johnson
Table 4
From the above text snippets it appears that even though Jeremy Johnson did not have a great year, many people
still seem to trust his abilities. It also appears that people think Auburn's offense is better when Jeremy Johnson is
the quarterback rather than Sean White.
• Sean White
From the above text snippets it appears that there is no clear favorite quarterback,
as opinion is divided on who the starting quarterback should be.
• Offence
Table 6
It appears that many people blame the offense for Auburn's bad performance.
There seems to be a general opinion that the defense is doing better and the offense is letting
the team down.
• Defence
Table 7
From the above text snippets it appears that there is a generally positive outlook on
Auburn's defense. People think that the defense has improved a lot under Muschamp and that it is the
offense that is letting them down.
TEXT CLUSTER
The Text Cluster node groups the terms into clusters, where each cluster represents related
terms that occur together. This is particularly useful because the
largest sector of the circle represents the topic that most
people are talking about.
Table 8
The words defense, explosive and Muschamp are placed in a single cluster, indicating
that people are generally happy with the defense and attribute this improvement in
performance to Will Muschamp.
The words don't, improvement, and Lashlee are used together a lot, indicating that people in
general want the offense and the offense coach Rhett Lashlee to do better.
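The idea that related terms occur together in the same posts can be illustrated with a simple co-occurrence count. The sketch below is not the clustering algorithm used by the Text Cluster node; it is a simpler stand-in that counts how many posts contain each pair of terms. The posts and the stop-word list are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical posts and stop words, for illustration only.
posts = [
    "Defense looked explosive under Muschamp",
    "Muschamp has the defense playing well",
    "Lashlee and the offense need improvement",
]
STOPWORDS = {"the", "and", "has", "under"}

def cooccurrence_counts(documents):
    """Count, for each pair of terms, the number of posts containing both."""
    pair_counts = Counter()
    for doc in documents:
        terms = set(re.findall(r"[a-z']+", doc.lower())) - STOPWORDS
        # Sort so each pair is always stored in the same (alphabetical) order.
        pair_counts.update(combinations(sorted(terms), 2))
    return pair_counts

pairs = cooccurrence_counts(posts)
for (first, second), n_posts in pairs.most_common(3):
    print(first, second, n_posts)
```

Pairs with high counts, such as a coach's name alongside the unit he coaches, are the kind of association that ends up in the same cluster.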
TEXT TOPIC
From the Text Topic node output, we can find the terms that are grouped together and their
cutoffs. The Text Topic node can be refined further by using the Text Cluster node. The Text Topic
node performs cluster analysis to combine words that are interesting to analysts.
Table 9
CONCLUSION
From the analysis of the Facebook posts it appears that people are in general disappointed with
the overall performance of the team. Though they feel that the defense has done better than
last season, it is the offense that let them down.
It also appears that people prefer Jeremy Johnson as the team's quarterback over Sean White.
In addition, the majority of people seem to blame the head coach Gus Malzahn for the
team's failure and think that the defense coach Will Muschamp has done a good job.