Running Head: SOCIAL MEDIA DATA MINING 1
Social Media Data Mining
Action Research
IST 8101
Teresa Rothaar
Table of Contents
List of Tables & Figures
Introduction
Methodology
Literature Review
Proposal
First Iteration – Plan
Task No. 1: Meeting with Dr. Scanlon
Task No. 2: Research Market Demand/Need for Social Media Data Mining
Task No. 3: Preliminary Research on Job Market Related to Social Media Data Mining
Task No. 4: Review & Organize Research Results from Tasks 2 & 3
First Iteration – Action
Task No. 1: Meeting with Dr. Scanlon
Task No. 2: Research Market Demand/Need for Social Media Data Mining
Task No. 3: Preliminary Research on Job Market Related to Social Media Data Mining
Task No. 4: Review & Organize Research Results from Tasks 2 & 3
First Iteration – Observation
Task No. 1: Meeting with Dr. Scanlon
Task No. 2: Research Market Demand/Need for Social Media Data Mining
Task No. 3: Preliminary Research on Job Market Related to Social Media Data Mining
Task No. 4: Review & Organize Research Results from Tasks 2 & 3
First Iteration – Reflection
Task No. 1: Meeting with Dr. Scanlon
Task No. 2: Research Market Demand/Need for Social Media Data Mining
Task No. 3: Preliminary Research on Job Market Related to Social Media Data Mining
Task No. 4: Review & Organize Research Results from Tasks 2 & 3
Second Iteration – Plan
Task No. 1: Compose Query Letter/Message Board Query Post
Task No. 2: Send Query Letter to Appropriate LinkedIn Contacts
Task No. 3: Post Query to Relevant LinkedIn Groups and on Quora
Task No. 4: Answer Respondents and Promote Query on Social Media as Needed
Task No. 5: Review Responses & Take Notes
Second Iteration – Action
Task No. 1: Compose Query Letter/Message Board Query Post
Task No. 2: Send Query Letter to Appropriate LinkedIn Contacts
Task No. 3: Post Query to Appropriate LinkedIn Groups and on Quora
Task No. 4: Answer Respondents and Promote Query on Social Media as Needed
Task No. 5: Review Responses & Take Notes
Second Iteration – Observation
Task No. 1: Compose Query Letter/Message Board Query Post
Task No. 2: Send Query Letter to Appropriate LinkedIn Contacts
Task No. 3: Post Query to Appropriate LinkedIn Groups and on Quora
Task No. 4: Answer Respondents and Promote Query on Social Media as Needed
Task No. 5: Review Responses & Take Notes
Second Iteration – Reflection
Task No. 1: Compose Query Letter/Message Board Query Post
Task No. 2: Send Query Letter to Appropriate LinkedIn Contacts
Task No. 3: Post Query to Appropriate LinkedIn Groups and on Quora
Task No. 4: Answer Respondents and Promote Query on Social Media as Needed
Task No. 5: Review Responses & Take Notes
Third Iteration – Plan
Task No. 1: Perform Research on Skill Sets Needed for Social Media Data Mining
Task No. 2: Review Materials & Select a Tutorial to Work Through
Task No. 3: Install Python & Any Other Required Software on my Computer
Task No. 4: Work Through the Selected Tutorial
Third Iteration – Action
Task No. 1: Perform Research on Skill Sets Needed for Social Media Data Mining
Task No. 2: Review Materials & Select a Tutorial to Work Through
Task No. 3: Install Python & Any Other Required Software on my Computer
Task No. 4: Work Through the Selected Tutorial
Third Iteration – Observation
Task No. 1: Perform Research on Skill Sets Needed for Social Media Data Mining
Task No. 2: Review Materials & Select a Tutorial to Work Through
Task No. 3: Install Python & Any Other Required Software on my Computer
Task No. 4: Work Through the Selected Tutorial
Third Iteration – Reflection
Task No. 1: Perform Research on Skill Sets Needed for Social Media Data Mining
Task No. 2: Review Materials & Select a Tutorial to Work Through
Task No. 3: Install Python & Any Other Required Software on my Computer
Task No. 4: Work Through the Selected Tutorial
Fourth Iteration – Plan
Task No. 1: Review Everything I Have Done So Far and Pinpoint Specific Trouble Spots
Task No. 2: Make a List of Things I Need to Learn
Task No. 3: Construct a Plan for Future Study
Fourth Iteration – Action
Task No. 1: Review Everything I Have Done So Far and Pinpoint Specific Trouble Spots
Task No. 2: Make a List of Things I Need to Learn
Task No. 3: Construct a Plan for Future Study
Fourth Iteration – Observation
Task No. 1: Review Everything I Have Done So Far and Pinpoint Specific Trouble Spots
Task No. 2: Make a List of Things I Need to Learn
Task No. 3: Construct a Plan for Future Study
Fourth Iteration – Reflection
Task No. 1: Review Everything I Have Done So Far and Pinpoint Specific Trouble Spots
Task No. 2: Make a List of Things I Need to Learn
Task No. 3: Construct a Plan for Future Study
Final Reflective Statement
References
List of Tables & Figures
Figure 1. Flow chart illustrating the four iterations for my action research project.
Figure 2. Map of big data job volume by Metropolitan Statistical Area (MSA) using data from
WANTED Analytics. Reprinted from “Where Big Data Jobs Will Be In 2015” in Forbes,
by L. Columbus, 2014. Retrieved on June 15, 2015, from
http://www.forbes.com/sites/louiscolumbus/2014/12/29/where-big-data-jobs-will-be-in-2015.
Copyright 2014 by Louis Columbus. Reprinted with permission.
Figure 3. Screenshot of example 1 in the Mining the Social Web chapter 1 tutorial. I have
blacked out my Twitter credentials because they need to be protected in the same manner as
passwords. The plain text at the bottom is an indication that the code worked and access to
the API was granted by Twitter.
Figure 4. Screenshot of example 2 in the Mining the Social Web chapter 1 tutorial. This code
snippet asks Twitter to retrieve topics that are trending in the U.S. and worldwide and print
them. The text below is truncated for visibility purposes because it is quite long.
Figure 5. Screenshot of example 3 in the Mining the Social Web chapter 1 tutorial. This code
snippet uses JSON to display the results collected in example 2 in a more readable format;
again, the text displayed below is truncated for visibility.
Figure 6. Screenshot of example 12 in the Mining the Social Web chapter 1 tutorial. This code
snippet uses the matplotlib.pyplot Python package to plot a word frequency graph for
selected Tweets. The graph displayed, but I could not figure out why the error message
above it displayed.
Figure 7. Summary of data science training options and the benefits and drawbacks of each.
Reprinted from “How do I become a data scientist?” by M. Stringer, A. Wolf, I. Sirer, D.
Malmgren, and L. Skelly, 2014. Retrieved on August 15, 2015, from
http://datascopeanalytics.com/blog/how-do-i-become-a-data-scientist-an-evaluation-of-3-alternatives.
Copyright 2014 by Mike Stringer, Aaron Wolf, Irmak Sirer, Dean Malmgren,
and Laurie Skelly. Reprinted with permission.
Introduction
In a season one episode of the CBS television series Person of Interest, the two main
characters had the following exchange while discussing an individual who had no social media
footprint:
REESE
I've never understood why people put all their information on those sites. Used to make
our job a lot easier in the C.I.A. (takes another sip from his cup)
FINCH
Of course, that's why I created them.
REESE
You're telling me you invented online social networking, Finch?
Reese stands up. Finch goes to his computer, setting down his doughnut.
FINCH
The Machine needed more information. People's social graph, their associations.
(opens up search on social networking website for "Jordan Hester")
The government had been trying to figure it out for years. Turns out most people were
happy to volunteer it. Business wound up being quite profitable, too (Berg & Beeson,
2012).
People are, indeed, volunteering their information on social media sites, which has
resulted in a never-ending stream of timely, easily accessible market research information for
organizations (Thiel, Kötter, Berthold, Silipo, & Winters, 2012, p. 3). However, the
unfathomable amount of information that is available presents its own problem: how to “[access]
that data and [transform] it into something that is usable and actionable” (Thiel et al., 2012, p. 3). It is
critically important for organizations to make use of this data; the McKinsey Global Institute
reported that retailers who make use of data analysis could increase their profits by as much as
60% (as cited in Greenspan, n.d.). Knowing this, organizations are clamoring for data analysts
and data scientists, but the supply of talent is so short that some organizations have had to turn to
unconventional recruiting methods, including crowdsourcing on analytics competition sites such as
Kaggle (Marr, 2015b).
My action research project will study social media data mining, in particular mining text
data on Twitter. By completing this project, I will build knowledge about social web data mining
and the job market for applicants with social web mining skills, including which programming
languages and technologies are involved and how to obtain these skills. My ultimate goal is to
either get a job or offer social media data mining and analysis services to private-sector
organizations as a consultant.
Methodology
According to Baskerville and Myers (2004), “Action research aims to solve current
practical problems while expanding scientific knowledge” (p. 329). Action research traces its
origins to the social sciences post-WWII. It was developed by Kurt Lewin in 1947 at the
University of Michigan, with the purpose of studying “social psychology within the framework
of field theory” (Baskerville & Myers, 2004, p. 330). Specifically, Lewin used action research to
study the psychological effects of war and incarceration in POW camps on returning WWII vets:
what we would today call post-traumatic stress disorder, or PTSD (Baskerville & Wood-Harper,
1996, p. 236). At the time, almost no research had been done on this disorder; in fact, the
diagnosis of PTSD was not added to the Diagnostic and Statistical Manual of Mental Disorders
(the DSM, the “psychiatrists’ bible”) until 1980 (Friedman, n.d.). Researchers, perplexed by the
widely varying symptoms the veterans under study were displaying, decided to try a different
approach (Baskerville & Wood-Harper, 1996):
Hence, the idea of social action arose. Scientists intervened in each experimental case by
changing some aspect of the patients’ being or surroundings. Since scientist and therapist
were one, the scientists were participants in their own research. The effects of the actions
were recorded and studied. In this manner, a body of knowledge was developed about
successful therapy for the illnesses. (pp. 236-237)
Huang (2010) noted that the term “action research” is often confusing to beginners
because it is not a specific methodology (like the Scrum programming methodology, for
example) but “an umbrella term that represents a ‘family’ of practices” (p. 94) that are focused
on the researcher being an active participant in the research being conducted, as opposed to a
non-participatory, neutral observer:
Action research is an orientation to knowledge creation that arises in a context of practice
and requires researchers to work with practitioners. Unlike conventional social science,
its purpose is not primarily or solely to understand social arrangements, but also to effect
desired change as a path to generating knowledge and empowering stakeholders. We may
therefore say that action research represents a transformative orientation to knowledge
creation in that action researchers seek to take knowledge production beyond the
gatekeeping of professional knowledge makers. (p. 93)
While there are numerous action research methodologies, the key similarity among all of
them is that researchers do not simply sit back, observe the results of an experiment (without
interference, and perhaps without ever interacting with the subjects), and record them, as is
done, for example, in a drug trial; instead, they actively participate in the research with the goal of
bringing about some sort of change for themselves and/or their organizations. If there is no active
participation, there is no “action research.”
Baskerville and Wood-Harper (1996) stated that the five phases of an action research
project are:
1. Diagnosing.
2. Action planning.
3. Action taking.
4. Evaluating.
5. Specifying learning (p. 237).
Similar to agile programming methodologies, these phases are iterative; the researcher
cycles through them throughout the course of the research study (Baskerville & Wood-Harper,
1996, p. 237).
Although the concept of action research is rooted in the social sciences post-WWII,
action research is quite suitable to the information technology/information systems field in the
21st century. Baskerville and Wood-Harper (1996) argue that because information technology “is
a highly applied field” and “almost vocational in nature,” the action research methodology works
well because it is “highly clinical in nature” (p. 235), “[places] IS researchers in a ‘helping-role’
within the organizations that are being studied” (p. 235), and “merges research with praxis” (p.
235).
Action research is a suitable methodology for my project because I wish to, as
Baskerville and Wood-Harper (1996) stated, “merge research with praxis” (p. 235). My goal is not
simply to write a research paper about a particular topic, as I did in IST 8100 last semester, but
to take a hands-on approach in which I can use my hybrid STEM and business education background
and my work experience in marketing to learn about the social media data mining field and how
I can get started in it. My research will be useful to anyone who wishes to enter this
field, either as a consultant or as an employee of an organization.
Literature Review
Gundecha and Liu (2012) define data mining as “a process of discovering useful or
actionable knowledge in large-scale data” (p. 2) that “is an integral part of many related fields
including statistics, machine learning, pattern recognition, database systems, visualization, data
warehouse, and information retrieval” (p. 2). Social media data mining is a new subfield within
the broader category of data mining (p. 2).
Social media is “an exceptionally rich resource that allows [big data] researchers … to
study and understand human behavior and activities in unprecedented ways” (Liu, 2014, para. 2).
Social media network users have unfettered access to “readily available never-ending uncensored
information” (Adedoyin-Olowe, Gaber, & Stahl, 2013, p. 3).
However, social media data is unstructured, dynamic, and filled with “noise,” making it
useless in its raw form (Gundecha & Liu, 2012; Thiel et al., 2012; Liu, 2014; Adedoyin-Olowe et
al., 2013). Spam is also a problem; studies by Yardi et al. and Chu et al. (as cited in Gundecha &
Liu, 2012) found that “spammers generate more data than legitimate users” (p. 4). Liu (2014)
pointed out two challenges in particular: (1) having too much data about people we do not need
more information about, such as celebrities and other famous people, and not enough data about
people we do want to know more about, specifically, the average individual who is a potential
customer for a business; and (2) because of the newness of social media and the aforementioned
lack of structure, difficulties with empirical observation of the data that is available (para. 7).
According to De Lacvivier (2013), Twitter in particular lends itself well to data mining from a
developer’s perspective for the following reasons:
• Twitter's API is well designed and easy to access.
• Twitter data is in a convenient format for analysis.
• Twitter's terms of use for the data are relatively liberal. It is generally accepted
that tweets are public and accessible to anyone, hence the asymmetric following
model that allows access to any account without request for approval. (De
Lacvivier, 2013, “The ultimate data mining platform”)
Additionally, according to Russell (as cited in De Lacvivier, 2013), mining Twitter “doesn’t
require advanced developer or data scientist skills”; Russell feels that the mistaken belief that
it does causes many developers to shy away from data mining (para. 1).
Sentiment analysis is the process of determining whether a snippet of text is conveying
positive or negative emotions (Bifet & Frank, 2010, p. 4). This technique “depends on an
appropriate subjectivity lexicon that understands the relative positive, neutral or negative context
of a word or expression” (Thiel et al., 2012, p. 4). These lexicons are “both language and context
specific” (Thiel et al., 2012, p. 4). Once a lexicon is built, it is used to train an automated
sentiment classifier (Bifet & Frank, 2010, p. 4). However, building a lexicon for sentiment
analysis depends on the existence of quality training and test datasets, and in social media
mining, there usually are no training or test datasets (Liu, 2014, para. 9).
Bifet and Frank (2010) noted that Twitter sentiment analysis is not a simple task: “a tweet
can contain a significant amount of information in very compressed form, and simultaneously
carry positive and negative feelings” (p. 4) and “some tweets may contain sarcasm or irony” (p.
4). However, Twitter users often provide clues regarding what sentiment they are conveying,
such as including smileys and emoticons in their tweets (p. 4). These characters can be added to
the lexicon to improve the learning process of a sentiment classifier (p. 4).
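The role of the lexicon, including emoticon entries, can be illustrated with a minimal Python sketch. This is a simplified lookup-based scorer, not the trained classifier that Bifet and Frank (2010) describe. The lexicon entries, the scores, and the function names (tokenize, sentiment_score, classify) are all hypothetical and were chosen only for illustration.

```python
import re

# Hypothetical toy lexicon: word or emoticon -> polarity score.
# Real subjectivity lexicons are far larger and are language and context specific.
LEXICON = {
    "love": 1, "great": 1, "good": 1,
    "hate": -1, "awful": -1, "bad": -1,
    ":)": 1, ":-)": 1, ":(": -1, ":-(": -1,
}

# Match simple emoticons first, then runs of letters and apostrophes.
TOKEN_PATTERN = re.compile(r"[:;]-?[()]|[a-z']+")


def tokenize(text):
    """Lowercase the text and split it into word and emoticon tokens."""
    return TOKEN_PATTERN.findall(text.lower())


def sentiment_score(text):
    """Sum the lexicon scores of all tokens; unknown tokens count as 0."""
    return sum(LEXICON.get(token, 0) for token in tokenize(text))


def classify(text):
    """Map the summed score to a coarse sentiment label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for tweet in ["I love this movie :)", "Awful service :(", "great cast, bad script"]:
        print(tweet, "->", classify(tweet))
```

The last example carries both a positive and a negative word, so its scores cancel and it is labeled neutral, which reflects the difficulty with mixed-sentiment tweets noted above.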
However, the results of a study performed by Kouloumpis et al. (2011) called into question the
value of adding emoticons to a lexicon (p. 541). The researchers concluded that adding features
specific to microblogging, such as hashtags, emoticons, and abbreviations, to a sentiment
classifier lexicon yielded better results than omitting them (p. 541). However, when
hashtags were included, the value of using emoticons diminished, suggesting that these features
may not be complementary from a data mining perspective (p. 541). The findings also indicated that
part-of-speech analysis techniques that work well on more structured data “may not be useful for
sentiment analysis in the microblogging domain” (p. 541).
Despite these challenges, “Sentiment analysis using text mining can be very
powerful and is a well-established, stand-alone predictive analytic technique” (Thiel et al., 2012,
p. 4). Asur and Huberman (2010) used Twitter text mining and sentiment analysis to predict
movie box office revenues. Their findings illustrated that, prior to a movie’s release, “the rate at
which movie tweets are generated can be used to build a powerful model for predicting movie
box-office revenue” (p. 492). The predictions made by Asur and Huberman were more accurate
than those of “the Hollywood Stock Exchange, the gold standard in the industry” (p. 492).
Further, the researchers found that, after a movie was released, sentiment analysis could be used to
refine their initial predictions (p. 493).
Thiel et al. (2012) performed a research study combining predictive sentiment analysis
using text mining with network analysis, which focuses not on text, but on the relationships
between individuals within social media networks; in other words, who follows whom. By
combining these two techniques, the researchers were able to “position negative and positive
users in context with their relative weight as influencers or followers” (pp. 17-18). The study,
which used publicly available data from Slashdot and the KNIME data analytics program, found
that “participants who are very negative in their sentiment are actually not highly regarded as
thought leaders by the rest of the community,” a result that “goes against the popular marketing
adage that negative users have a very high effect on the community at large” (p. 3). The
researchers stated that they felt this unexpected insight into consumer behavior could not have
been discovered using either predictive analysis or network analysis alone (p. 5).
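The idea of placing sentiment in the context of network position can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the follower edges and sentiment labels are hypothetical placeholders, the follower count (in-degree) is a stand-in for influence, and the study's own tooling and influence measure are not reproduced here. The names FOLLOWS, SENTIMENT, follower_counts, and rank_users are invented for this sketch.

```python
from collections import Counter

# Hypothetical follower edges, written as (follower, followed).
FOLLOWS = [
    ("ann", "bob"), ("cat", "bob"), ("dan", "bob"),
    ("bob", "ann"), ("dan", "cat"),
]

# Hypothetical per-user sentiment labels, e.g. produced by a text classifier.
SENTIMENT = {"ann": "positive", "bob": "positive", "cat": "negative", "dan": "negative"}


def follower_counts(edges):
    """Count how many followers each user has (in-degree of the follow graph)."""
    return Counter(followed for _, followed in edges)


def rank_users(edges, sentiment):
    """Return (user, follower count, sentiment) sorted by follower count, then name."""
    counts = follower_counts(edges)
    users = sorted(sentiment, key=lambda user: (-counts[user], user))
    return [(user, counts[user], sentiment[user]) for user in users]


if __name__ == "__main__":
    for user, followers, label in rank_users(FOLLOWS, SENTIMENT):
        print(f"{user}: {followers} followers, {label}")
```

Listing each user's sentiment next to a network measure is what makes a question such as "are the most negative users also the most followed?" answerable; neither the text data nor the graph alone would show it.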
Social media data mining is a new field filled with nearly unlimited opportunity for
market researchers, but also many challenges. The goals of social media data mining and
analysis are to separate relevant information from noise and transform the relevant data “into
something that is usable and actionable” (Thiel et al., 2012, p. 3). Researchers are attempting to
adapt traditional data mining techniques for use in social media, with varying results
(Kouloumpis et al., 2011). Because of its ease of use, data format, terms of use, and sheer
amount of daily content generation, Twitter is very popular among data researchers, and its full
potential has not yet been tapped (De Lacvivier, 2013). Much research is being done on
sentiment analysis and network analysis, and at least one study suggests that combining the two
methods might yield the most useful results (Thiel et al., 2012).
Proposal
My project deals with learning how to mine the social web for market research data. As I
mentioned in a previous section, it is my goal to, as Baskerville and Wood-Harper (1996) put it,
“merge research with praxis” (p. 235) so that I possess new skills upon its completion. My
preliminary research found that “social media mining” is a brand-new and very wide field; it
would be impossible for me to thoroughly investigate the entire scope over the next 12 weeks.
Therefore, I am going to focus my project on learning about the job market for social media data
mining, what types of jobs are available in this field at the entry level, and which specific skills
(especially technological skills) are required to obtain entry-level work in this field, either as an
employee or a consultant. If possible, I would like to build a couple of “toy” Twitter mining
programs that I could use as the beginning of a portfolio. My external stakeholders are anyone
who is interested in entering the social media data mining field.
In Iteration 1, I will build support for what I want to do. I will use Google to research
the market need for social media data mining and the types of jobs that are
available in the field.
In Iteration 2, I will use LinkedIn to reach out to hiring managers, recruiters, data
analysts, and other appropriate professionals and speak with them regarding the job market for
applicants with social media data mining skills. I will question them about what types of jobs are
available and which skills are needed.
In Iteration 3, I will work through some tutorials for the purpose of determining the
specific skill sets needed to perform social media data mining, so that I can find out which skills
I already have and which skills I need to obtain.
In Iteration 4, I will review what I did in the previous three iterations, examine what went
well and what did not, find out how much more I need to learn, and construct a specific plan of
future study.
A flow chart of my iterations is presented in Figure 1 below:
Figure 1. Flow chart illustrating the four iterations for my action research project.
First Iteration – Plan
In Iteration 1, I must lay the foundation for the remainder of my project and build support
for what I want to do. The scheduled tasks in this iteration are as follows:
Task No. 1: Meeting with Dr. Scanlon
Because I did not do well on the last paper I submitted, I have scheduled a face-to-face
meeting with Dr. Scanlon to determine how I can get my project back on track. I have set aside
one hour for this meeting. The only other person involved will be Dr. Scanlon. My goal is to
answer the following questions:
• How can I modify my project to meet the requirements of the course?
• Once my project is modified, how do I move forward with my next paper and the
remainder of the course?
Task No. 2: Research Market Demand/Need for Social Media Data Mining
In this task, I will collect primary market research data on the market demand/need for
social media data from the point of view of a company or consultant who wishes to offer these
services. The expected duration is two hours a day for three days, or six hours total. To complete
the task, I will need a computer with an Internet connection and appropriate time allocation. No
other people will be involved at this stage. My goal is to answer the following questions:
• Why mine the social web? What is social media data mining good for?
• Who is interested in social media data? Which types of organizations?
Task No. 3: Preliminary Research on Job Market Related to Social Media Data Mining
This is similar to Task No. 2, but instead of collecting data on the market demand for
social media data mining, I will collect preliminary data on the job market for applicants with
social media data mining skills (more detailed data will be collected during the interviews I will
conduct during the next iteration). To collect my data, I will do a Google search on the terms
“social media data mining” and “data mining” paired with the phrases “jobs,” “careers,” and
“career outlook.” The expected duration is two hours a day for three days, or six hours. To
complete the task, I will need a computer with an Internet connection and appropriate time
allocation. No other people will be involved at this stage. My goal is to answer the following
questions:
• What types of jobs exist for applicants who have skills related to mining the social
web?
• Are these jobs located in specific geographic areas? Specific industries?
• What specific skill sets might an applicant need?
Task No. 4: Review & Organize Research Results from Tasks 2 & 3
In this task, I will review and organize the data I collected during Tasks 2 and 3. The
expected duration is two hours a day for three days, or six hours. To complete the task, I will
need a computer with an Internet connection and appropriate time allocation. No other people
will be involved at this stage. My goals during this stage will be to narrow my research focus,
clarify my topic, and organize all of my information, eliminating sources and/or returning to
Tasks 2 and 3 to gather more as needed.
First Iteration – Action
Task No. 1: Meeting with Dr. Scanlon
My meeting with Dr. Scanlon lasted approximately 40 minutes. During the meeting, we
clarified my goals for my project, which are to build a portfolio and either obtain a job or work
as a consultant in the social data mining field. We agreed that social media mining was a good
choice for a topic, and that I simply had to modify my project to be action research-oriented
instead of just a research paper. Dr. Scanlon assisted me with coming up with new iterations to
replace my existing ones, which were not action research-oriented. He clarified the definition of
“stakeholders” as it relates to my project and assisted me with determining my project’s
stakeholders.
Task No. 2: Research Market Demand/Need for Social Media Data Mining
I used Google to perform searches on “social media data mining,” “data mining,” and
“big data.” I estimated that this task would take approximately six hours over a three-day period,
but instead I spent about eight hours on this task.
Task No. 3: Preliminary Research on Job Market Related to Social Media Data Mining
I used Google to perform searches on the terms “social media data mining” and “data
mining” paired with the phrases “jobs,” “careers,” and “career outlook.” The only search engine
I used was Google, and I followed 22 links. My initial estimate was that this task would take
approximately six hours over a three-day period, and the actual duration was about two hours
over a two-day period.
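The pairing of search terms with phrases described above can be sketched in a few lines of Python. This is only an illustration, not a record of how the searches were actually run. Wrapping each part in double quotes for exact-phrase matching is an assumption, and the function name build_queries is hypothetical.

```python
from itertools import product

# Search terms and phrases as listed in the task description.
TERMS = ["social media data mining", "data mining"]
PHRASES = ["jobs", "careers", "career outlook"]


def build_queries(terms, phrases):
    """Pair every term with every phrase.

    Quoting each part is an assumption made for this sketch; the paper
    does not say whether exact-phrase matching was used.
    """
    return [f'"{term}" "{phrase}"' for term, phrase in product(terms, phrases)]


queries = build_queries(TERMS, PHRASES)
for query in queries:
    print(query)
```

Two terms paired with three phrases yield six distinct queries, which may help when logging which searches have already been run.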
Task No. 4: Review & Organize Research Results from Tasks 2 & 3
I anticipated that this task would take six hours over three days, but it took me only about
four hours over two days. The work consisted of reading each of the 22 articles I bookmarked
during my research and discarding the material that did not apply to this project, or, alternatively,
saving it if I felt it might apply to a future iteration.
First Iteration – Observation
Task No. 1: Meeting with Dr. Scanlon
I set aside an hour for the meeting, but it ended up taking only 40 minutes because I was
not as far behind as I had thought. I also came to the meeting prepared, with my materials and
questions, so that no time was wasted. Dr. Scanlon helped me redefine my iterations. The result
was that I left the meeting with all of the information I needed to proceed with my project and
write my first iteration paper, and a clearer understanding of what is expected of me during the
course of this project.
The new iterations are as follows:
1. Support what I want to do – research the market need for social media data
mining, and research what types of jobs are available in the field.
2. Locate and interview hiring managers, recruiters, data analysts, and other
appropriate professionals regarding the job market for applicants with social
media data mining skills; question them about what types of jobs are available
and which skills are needed.
3. Examine the skill sets needed to perform social media data mining; focus on
which specific skills are required.
4. Discuss my results. What happened? How much more do I need to learn? I can
also perform some portfolio-building during this stage by coding test programs to
mine sample Twitter data.
We also defined my stakeholders (other than me): anyone who is interested in a career in
social media data mining, whether as a consultant or an employee. We went over my literature
review, and I discovered that my problem was that I had simply quoted the various pieces of
literature without analyzing them and finding commonalities. Dr. Scanlon agreed with my
statement that I may have used too many sources, and that I would have been better off using
fewer sources but providing a more in-depth explanation and analysis of each, along with
pointing out their commonalities.
Task No. 2: Research Market Demand/Need for Social Media Data Mining
I estimated that this task would take approximately six hours, but it took about eight,
primarily due to finding results that, while not applicable to this project or this particular phase
of the project, I was interested in and bookmarked for further reading; for example, I found many
threads on Quora.com about how to become a data analyst/data scientist and the skills that are
needed to enter the field, as well as a number of tutorials on the Python and R programming
languages. My research revealed that there is much demand for social media data mining and
analysis among organizations in all industries, as documented below.
The Google search results showed that social media data mining is “an emerging
discipline under the umbrella of data mining” (Zafarani, Abbasi, & Liu, 2014, p. 16). It has a
myriad of applications in the public and private sector, as Betancourt (2010) elaborated:
Entities such as airlines, politicians, and even non-profits can use this data for finding
new customers or targeting products to existing ones. Financial services companies such
as banks and lenders are also using the same data mining services for marketing purposes
and to make lending decisions. For instance, certain types of credit products, which fit
your personality, could be marketed specifically to you. (“How Data is Being Used”)
The amount of content being shared on social media sites has skyrocketed in recent years:
Facebook exceeded one billion active users in 2012; approximately 500 million tweets are being
posted daily; and there are an estimated 250 million blogs on the web (McBeath, 2013, para. 1). At
first, many organizations saw social media as a vehicle to broadcast marketing messages to
customers and personally connect them with the company’s brand (Boorman, 2011, para. 1).
However, “many of these messages focus on brands and products” (Salampasis, Paltoglou, &
Giachanou, 2013, p. 87) and every time a user shares content on a social media site, they are
providing information regarding “their preferences, their demographics, their behaviors, and
their relationships” (Ridge, 2014, “Social Data has Scope”). Best of all, the majority of this data
is public and “free for the taking” (McBeath, 2013, “Mining for Sentiment”).
Realizing the marketing potential, many companies are now mining this data and using it to
analyze how customers feel about their products and services, a process known as “sentiment
analysis” (McBeath, 2013, “Mining for Sentiment”). Organizations are also using social media
data to “build dossiers” on current and potential customers (Betancourt, 2010, para. 1) that can
be used to send highly targeted marketing messages (Betancourt, 2010, “How Data is Being
Used”). Other potential uses of social media mining include:
• Recruiting and vetting job candidates: According to a survey by Jobvite (2014),
“73% of recruiters have hired a candidate through social media” (p. 9) and 93%
“will review a candidate’s social profile before making a hiring decision” (p. 10).
• Preventing and fighting crime: Law enforcement is mining social media data to
“detect and counter criminal activities” (McBeath, 2013, “Beyond Sentiment”).
Sometimes, content posted by perpetrators, such as video of a suspect bragging
about a crime, is later used in court (McBeath, 2013, “Beyond Sentiment”).
Federal law enforcement mines the social web to track terrorist chatter and other
threats to national security (McBeath, 2013, “Beyond Sentiment”).
• Epidemiology and public health: Social data is being used “to help identify, track,
and predict disease outbreaks” (McBeath, 2013, “Beyond Sentiment”).
• Emergency and disaster response: Oak Ridge National Laboratory is researching
how social media data can be used to better assess the nature and scope of
disasters, and thus better respond to them (McBeath, 2013, “Beyond Sentiment”).
Supply chain risk managers are mining the same data so they can become aware
of and quickly react to natural and man-made disasters that could impact their
supply pipelines (McBeath, 2013, “Beyond Sentiment”).
However, despite the wide interest in social media data mining, many companies still do
not know how to sort through all of this information and extract actionable business intelligence
(Ridge, 2014, para. 2). There are challenges associated with dealing with such voluminous data
sets and separating irrelevant information or spam from relevant data (Boorman, 2011, “Tapping
Social Media Data”).
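As a rough illustration of the sentiment analysis mentioned above, the following Python sketch scores short posts by counting positive and negative words. It is a deliberately simple word-count approach, not the method used by any of the cited sources. The word lists, the sample posts, and the function names sentiment_score and label are all hypothetical and invented for this example.

```python
import re

# Hypothetical mini-lexicon; a real project would need a much larger,
# validated word list.
POSITIVE = {"love", "great", "excellent", "happy", "good"}
NEGATIVE = {"hate", "terrible", "awful", "angry", "bad"}


def sentiment_score(text):
    """Return (number of positive words) minus (number of negative words)."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def label(text):
    """Map the numeric score to a coarse label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Invented sample posts, for illustration only.
posts = [
    "I love the new phone, great battery life",
    "Terrible customer service, I hate waiting on hold",
    "The store opens at nine",
]
for post in posts:
    print(label(post), "|", post)
```

A word-count approach like this cannot handle sarcasm, negation, or spam, which is one reason the challenges of separating relevant from irrelevant data noted above matter in practice.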
In conclusion, social media data mining is a timely topic, and there is a strong market
demand for data mining and analysis services.
Task No. 3: Preliminary Research on Job Market Related to Social Media Data Mining
I estimated that this task would take approximately six hours, but it ended up taking me
only about two hours, primarily because I stumbled across job market information while
completing the previous task. I was unable to find job market information about “social media
data mining,” but much information was available regarding big data and data mining in general.
My research, which is documented below, indicated that the job market for applicants with data
mining and analysis skills is very strong.
Robert Schnabel, dean of the School of Informatics and Computing at Indiana University,
told Forbes, “We see the data science job market requiring two types of professionals: those with
deep technical skills, and managers and analysts with the knowledge to use the analysis of big
data to make effective decisions” (as quoted in Violino, 2014, “New Data Degree Programs”).
There is a critical shortage of applicants with data science skills: According to Columbus (2014),
“Demand for Computer Systems Analysts with big data expertise increased 89.9%” from
January through December 2014. At the same time, McKinsey Global Institute, in a May 2011
report, predicted that by 2018, the United States would face a shortfall of 140,000 to 190,000
applicants with “deep analytical skills” and 1.5 million managers (as cited in Jain, 2013).
Columbus’s (2014) research, which examined job listings and job market information
collected by WANTED Analytics, found the following:
• The industries with the most job openings in the big data field were “Professional,
Scientific and Technical Services (27.14%), Information Technologies (18.89%),
Manufacturing (12.35%), Retail Trade (9.62%) and Sustainability, Waste
Management & Remediation Services (8.20%).”
• The top geographical areas for jobs were the San Francisco Bay area and the
Washington, D.C. area. However, data jobs can be found in major metropolitan
areas throughout the country, as shown in Figure 2 below.
• The skills that appeared most frequently in big data job ads were Python, Linux,
and SQL, with Python seeing the most growth in demand (nearly 97%) between
2013 and 2014.
• The median salary for big data professionals was found to be $103,000/year, and
sample job titles included “Big Data Solution Architect, Linux Systems and Big
Data Engineer, Big Data Platform Engineer, Lead Software Engineer, [and] Big
Data (Java, Hadoop, SQL).”
Figure 2. Map of big data job volume by Metropolitan Statistical Area (MSA) using data from
WANTED Analytics. Reprinted from “Where Big Data Jobs Will Be In 2015” in Forbes, by L.
Columbus, 2014. Retrieved on June 15, 2015, from
http://www.forbes.com/sites/louiscolumbus/2014/12/29/where-big-data-jobs-will-be-in-2015.
Copyright 2014 by Louis Columbus. Reprinted with permission.
Ide (2014) stated that social media is one of the “hottest sectors for big data growth”
(para. 1), which coincides with the market demand for social media mining and analysis that I
found while performing research in Task 2.
Task No. 4: Review & Organize Research Results from Tasks 2 & 3
Reviewing and organizing the results of my research took only about four hours, two
short of my original estimate. This was because I knew exactly what I was looking for and had
previously performed some research on this topic before beginning this class; I already had a
“data mining” folder in my web browser’s bookmarks bar. Therefore, it did not take me very
long to decide to use, discard, or save for future use any particular source.
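The estimated and actual durations reported for the four tasks in this iteration can be tallied with a short Python sketch. The hour figures come from the observations above (40 minutes is entered as 40/60 of an hour, and "about" figures are treated as exact). The dictionary layout and the function name variance_report are hypothetical choices made for illustration.

```python
# Estimated vs. actual hours for the four first-iteration tasks,
# taken from the figures reported in the text.
TASKS = {
    "Task 1: Meeting": (1.0, 40 / 60),
    "Task 2: Market demand research": (6.0, 8.0),
    "Task 3: Job market research": (6.0, 2.0),
    "Task 4: Review and organize": (6.0, 4.0),
}


def variance_report(tasks):
    """Return {task: actual - estimate} in hours; negative means under estimate."""
    return {name: actual - est for name, (est, actual) in tasks.items()}


report = variance_report(TASKS)
for name, diff in report.items():
    print(f"{name}: {diff:+.2f} h")

total_estimated = sum(est for est, _ in TASKS.values())
total_actual = sum(actual for _, actual in TASKS.values())
print(f"Total: estimated {total_estimated:.2f} h, actual {total_actual:.2f} h")
```

Keeping the estimates and actuals side by side in one structure makes it easy to repeat the same comparison for later iterations.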
First Iteration – Reflection
Task No. 1: Meeting with Dr. Scanlon
This meeting went very well. My only mistake was that I did not ask for help earlier,
before I wrote my last paper; if I had, I would have done much better and been much further
along. I did not have a firm understanding of the action research process or how to put together a
project that met the requirements of action research. I also did not fully understand how to
conduct a literature review. I had never written one before. I suggested to Dr. Scanlon that the
IST program at Wilmington University be modified to include at least one exposure to literature
reviews prior to IST 8101; IST 8100 would be a good class for this. Finally, I made many small
mistakes regarding APA citation style, and I need to ensure that I format my future papers
correctly.
If I have any questions, concerns, or difficulties as I work through my next three
iterations, I will seek help immediately.
Task No. 2: Research Market Demand/Need for Social Media Data Mining
I felt this task went really well, even though it took me two hours longer than expected.
Social media data mining is a very popular current topic, and I had no difficulty finding
information on the market for this type of service. In addition to the sources used during this
iteration, I found many resources that I bookmarked for later use. This reduced the time needed
to work on Task 3 during this iteration, and I anticipate that it will reduce the time needed for research
during future iterations. However, as I mentioned in my reflection for Task 1, I wish I had sought
help regarding my project earlier, thus giving myself more time.
Task No. 3: Preliminary Research on Job Market Related to Social Media Data Mining
Similar to my experience with Task 2, I feel that this task went very well. It did not take
as much time as I’d expected, largely because I came across a lot of job market information
while completing Task 2. The drawback was that I was unable to find job market information
about social media data mining specifically, only big data/data science/data mining in general. I
discarded some of the articles because they did not apply to this project or this iteration. For
example, some of the articles were about the big data job market in India, and I am focusing on
the U.S.; other articles contained duplicative information; and other articles focused on big data
training programs, tutorials, and degree programs as opposed to job market information. During
my next iteration, I am scheduled to seek out and speak with recruiters, hiring managers, and/or
data analysts/scientists, and I am hopeful that they will be able to provide me with more specific
information on social web mining jobs.
Task No. 4: Review & Organize Research Results from Tasks 2 & 3
This task also went very well, and my actual time spent was two hours short of my
original estimate. I am a very organized person, and I have years of experience as a copywriter,
so gathering and organizing research is one of my strengths. One thing I have learned to do
specifically for academic writing is to create a text file containing the reference page citations for
my sources ahead of time; this saves a lot of time when putting together the final paper.
In summary, it was difficult cramming all of the research, organization, and copywriting
into the seven-day period I had between my meeting with Dr. Scanlon and the [extended] due
date for my first iteration paper, especially since I am taking an additional class during the
Summer I block. I am glad I asked for a 48-hour extension to submit my paper, as I ended up
needing it. I am emerging from this iteration with a clear understanding and outline for my
project. I learned to ask for help as soon as I run into difficulty instead of trying to figure
everything out on my own, and I also learned that I need to work on my knowledge of APA style
and ensure my papers are formatted correctly; it is silly to lose points because of style issues.
Second Iteration – Plan
In Iteration 2, I must locate and interview hiring managers, recruiters, data analysts, and
other appropriate professionals regarding the job market for applicants with social media data
mining skills and question them about what types of jobs are available and which skills are
needed. The scheduled tasks in this iteration are as follows:
Task No. 1: Compose Query Letter/Message Board Query Post
I must compose a query letter to be sent to recruiters, data analysts, and other appropriate
professionals in the big data industry, along with slight variations to be posted on appropriate
LinkedIn Groups and on Quora. I have set aside one hour to complete this task. To complete the
task, I will need a computer with an Internet connection and appropriate time allocation. No
other people will be involved at this stage. My goal is to write a questionnaire that is short
enough to not scare off any potential respondents, yet will gather the information I need.
Task No. 2: Send Query Letter to Appropriate LinkedIn Contacts
In this task, I will search my LinkedIn contacts list for IT recruiters and appropriate
professionals in the data analytics field, such as data analysts, and send them the query letter
written during Task 1. I have set aside two hours to complete this task. To complete the task, I
will need a computer with an Internet connection and appropriate time allocation. No other
people will be involved at this stage. My goal is to send out as many letters as possible, as it is
likely that most of them will be ignored or deleted as spam.
Task No. 3: Post Query to Relevant LinkedIn Groups and on Quora
In this task, I will post my query as a message to relevant LinkedIn groups and as a
question on Quora. I have set aside one hour to complete this task. To complete the task, I will
need a computer with an Internet connection and appropriate time allocation. No other people
will be involved at this stage. My goal is to post the query to LinkedIn groups that are related to
data mining and other appropriate groups, such as groups for Temple and Wilmington University
alumni, so that people who are able to answer the questions have a chance to see them. I will also
post the query as a Quora question in an attempt to reach more respondents.
Task No. 4: Answer Respondents and Promote Query on Social Media as Needed
In this task, I will answer any emails I receive and acknowledge/answer any responses to
my message board posts and Quora question. I have set aside eight hours over the next two
weeks to complete this task. To complete the task, I will need a computer with an Internet
connection and appropriate time allocation. The people involved at this stage will be my
respondents and me. My goal is to interact with my respondents, let them know that I appreciate
their taking the time to answer my questions, and obtain follow-up information as needed.
Task No. 5: Review Responses & Take Notes
In this task, I will read and review the responses I received to my emails and message
board posts and take notes. I have set aside four hours to complete this task. To complete the
task, I will need a computer with an Internet connection and appropriate time allocation. No
other people will be involved at this stage. My goal is to compare the responses and look for
similarities so that I can determine which skills I need to focus on during the next iteration of this
project.
Second Iteration – Action
Task No. 1: Compose Query Letter/Message Board Query Post
I set aside one hour for this task, and that is how long it took to complete. Because I have
a lot of copywriting experience, I am very good at estimating how long a writing assignment will
take to complete. As with any copywriting assignment, most of the time was not spent on
actually writing the letter/post—it was not that long—but on considering how to approach
potential respondents so that they would be encouraged to respond to me.
Task No. 2: Send Query Letter to Appropriate LinkedIn Contacts
I set aside two hours for this task, and it took me about an hour and a half to complete. I
have an extensive LinkedIn list with over 500 contacts. I went through the list and searched for
IT recruiters and data professionals, such as data analysts and data scientists. I located
approximately 100 contacts that were either IT recruiters or data professionals, and I sent my
query letter to them.
Task No. 3: Post Query to Appropriate LinkedIn Groups and on Quora
I posted my query as a message to the following LinkedIn groups: Big Data and
Analytics; LI Live - Philly; KDNuggets Analytics; Connect: Professional Women’s Network; IT
Pros - Philadelphia; Python Community; Python Data Science & Machine Learning; Temple
University Alumni; Temple University Young Alumni; R Project for Statistical Computing;
Wilmington University; Wilmington University’s College of Technology; and WomenWorking.
I chose these particular groups because they were either related to big data/data science or
information technology specifically or were general professional networking groups that I knew,
from previous experience, had members who were either IT recruiters or data professionals.
After I finished posting to the LinkedIn groups, I slightly rewrote the copy for the query
so that I could turn it into a Quora question. I posted the question on Quora and shared the Quora
question on Facebook and Twitter, and as a personal LinkedIn update. I also cut and pasted the
query into a Google Doc so I could easily share it on Facebook, Twitter, and as a LinkedIn status
update, in an attempt to attract respondents who do not participate on Quora or in LinkedIn
groups.
I ended up slightly editing the posts the day after I originally made them. The original
post explained that I needed to interview data science professionals for an action research
project, but did not include the actual questions to be answered. The edit included the questions.
I originally set aside one hour to complete this task, but between the original posts and
the slight edits made to the posts afterward, this task took about two hours, over the course of
two days, to complete.
Task No. 4: Answer Respondents and Promote Query on Social Media as Needed
I set aside eight hours for this task, but it ended up taking about 4.5 hours. This task
primarily consisted of responding to individuals who responded to my personal query letters,
LinkedIn group posts, and Quora question, thanking them for their responses, asking follow-up
questions as appropriate, and, in some cases, answering their questions. I also tweeted my Quora
question on three separate days in an effort to attract more respondents.
Task No. 5: Review Responses & Take Notes
I set aside four hours to complete this task, and it ended up taking about that long to
complete. The task consisted of reviewing the numerous responses I received from the LinkedIn
Big Data and Analytics group, one email response, one response through LinkedIn’s messaging
system, and one response to my Quora question, looking for similarities, and taking notes.
Second Iteration – Observation
Task No. 1: Compose Query Letter/Message Board Query Post
I estimated that this task would take approximately one hour, and that is how long it
ended up taking to complete. I ended up with two versions, one query letter that I sent out
through email and LinkedIn’s message system, and one for posting to the message boards.
Version 1, which I sent out through email, read as follows:
Subject: Need to interview data professionals for master's research project
I was wondering if you could help me out.
I am finishing up my MS in MIS at Wilmington University. I am currently working on an
action research project for my capstone class. My research subject is social media data
mining. I need to locate hiring managers, recruiters, data analysts, and other appropriate
professionals regarding the job market for applicants with social media data mining skills
and interview them about what types of jobs are available and which skills are needed.
The interviews can be conducted via email; in fact, that would probably be easiest.
Unfortunately, I have no professional contacts in that field. Do you know anyone who
could help me? Even if the individuals recruit/work in data mining in general, and not
social media mining in particular, that would be fine, too.
Please let me know if you could be of assistance. My email is
troth90208@wildcats.wilmu.edu.
Thanks so much!
Teresa Rothaar
At first, I used this version of the copy on the message boards as well. However, the day
after making my original post, I edited it to include the actual questions I intended to send to
respondents to my query letter, as follows (this version of the copy was also used on Quora):
I am currently working on an action research project for my capstone class for my master
of science in MIS. My research subject is social media data mining. I need to obtain
information regarding the job market for applicants with social media data mining skills
(or even just data mining skills in general). Specifically:
• What are the minimum skills needed to get a job data mining social media or
performing data mining/data science in general (programming languages,
technologies, math knowledge, etc.)?
• What education is needed?
• What types of jobs are available at the entry level (job titles/brief descriptions)?
• Where are the jobs: which industries?
Task No. 2: Send Query Letter to Appropriate LinkedIn Contacts
I set aside two hours for this task, and it took me about an hour and a half to complete.
The task took less time because while I have many LinkedIn contacts, most of them do not work
in data science, information technology, or IT/data science recruiting, and it did not take me as
long to go through the list as I had anticipated. I have over 500 contacts, and I sent letters to
approximately 100 of them. Only two contacts responded to my query. Details of the responses
will be discussed below in the Task No. 5 observation section.
Task No. 3: Post Query to Appropriate LinkedIn Groups and on Quora
I originally set aside one hour to complete this task, but between the original posts and
the edits made to the posts the following day, this task took about two hours, over the course of
two days, to complete.
I had some difficulty with LinkedIn’s spam filters; LinkedIn was flagging my messages
as being “promotional in nature” and moving them to the “promotions” tab, which is located
outside of the main discussion area. I found that by writing out my email address as “troth90208
-at- wildcats.wilmu.edu” instead of “troth90208@wildcats.wilmu.edu,” LinkedIn allowed the
messages to be posted in the main discussion forum.
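The address rewriting described above can be expressed as a small Python helper. This is a minimal sketch: the function name obfuscate_email and the placeholder address are hypothetical, and whether this format satisfies any particular site's filters is an assumption based only on the experience reported here.

```python
def obfuscate_email(address, marker=" -at- "):
    """Replace the "@" in an email address with a spelled-out marker."""
    local, sep, domain = address.partition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not a simple email address: {address!r}")
    return f"{local}{marker}{domain}"


# Placeholder address, not a real account.
print(obfuscate_email("student@example.edu"))
```

A human reader can still reconstruct the address, while simple pattern-based filters that look for the "@" character may no longer treat it as an email link.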
As mentioned in my description for my first task, I ended up editing the post to include
the actual questions in the message body as opposed to requesting that respondents email me. I
did this because I received three responses from individuals who seemed to think that the
interview would be extensive and take up a lot of the respondents’ time. In response to my
posting in the Temple University Alumni LinkedIn group, [REDACTED] sent me a personal
message stating, “The people in this field are very busy. These things take time. Perhaps you
could attend some conferences this summer and get to know some people who would be willing
to be interviewed” (personal communication, June 23, 2015). In the R Project for Statistical
Computing Group, [REDACTED] suggested that I “create an online survey using
SurveyMonkey or something similar, then post the link to LinkedIn,” and [REDACTED] told
me, “You can obtain the information you are looking for simply by analyzing job openings.”
Even after I edited my posts to include the questions, none of my group threads attracted
any responses except for the thread in the Big Data and Analytics Group, which immediately
became very active. I received responses from 12 members, and I engaged in interactive,
back-and-forth discussions with them. Additionally, two individuals who are new to the data science
field, as I am, indicated that they had followed the thread because they, too, wanted to hear the
answers to these questions.
In addition to the LinkedIn groups, I posted my query as a question on Quora. I received
only two responses; one was a junk response from an Internet troll, and the other was a serious
response from an IT professional.
Details of the responses will be discussed below in the Task No. 5 observation section.
Task No. 4: Answer Respondents and Promote Query on Social Media as Needed
This task took only 4.5 hours, even though I had set aside eight hours. Admittedly, I was
not certain how long it was going to take, and in cases when I am unsure of a job’s duration, I
tend to estimate high. This comes from my experience as a freelancer; I have found that clients
would rather have me say that a job will take three days and then deliver it in two, as opposed to
the other way around.
I posted the Quora question on Facebook, Twitter, and as a LinkedIn status update. While
three people shared my Facebook post, and about a dozen people re-tweeted me on Twitter, I
received only one valid response to my Quora question. I suspect that the poor response rate was
due to my audience. The post in the Big Data and Analytics LinkedIn group reached a large
community of data professionals who are eager to share knowledge, while my social media posts
largely reached people who do not work in the field.
I made sure to thank every person who responded on LinkedIn for taking the time to do
so. One respondent indicated that he felt he was making “too many posts” (Oldfield, 2015b), and
I assured him that I was grateful that he was willing to share so much information with me. I also
responded to the individuals who reacted negatively on the Temple University Alumni and R
Project groups. I explained to them that I needed to conduct personal interviews, and not just
Internet research, for an action research project, that the interviews would take no more than 10
to 15 minutes, and that the project was time-sensitive; I could not wait weeks or months for
responses. My explanations did not work—I did not receive any responses in those groups—but
I felt it was important to maintain a professional image.
I also interacted with the participants on the Big Data and Analytics group thread, asking
follow-up questions. In particular, when I received responses that contained very long lists of
required programming languages and technologies, I asked if there existed any truly entry-level
positions for candidates with an IT education but no work experience in the IT field.
Task No. 5: Review Responses & Take Notes
This task took about four hours to complete, which matched my original time estimate. I
read each response thoroughly and took notes, looking for similarities. I received responses from
15 different individuals. Some of the participants on the LinkedIn thread posted to the thread two
or three times. Most of the responses addressed data science jobs in general, not social media
data mining specifically. Each individual’s response is summarized below.
T. Kielinski, the founder and CEO of IT Pros, a Philadelphia IT staffing firm, responded
to me using LinkedIn’s messaging system. In response to my question about the minimum skills
needed to get an entry-level job, he simply listed a number of “free and/or open source tools for
data mining applications,” specifically, RStudio, Tinn-R, Weka, RapidMiner, KNIME, the
Mahout machine learning library, Rattle, CLUTO, fastcluster, arules, ARMiner, TraMineR,
Gephi, Pajek, CFinder, ProM, GeoDa, and CLAVIN. He said that the required education “ranges
from formal training to a master’s degree” and that I could obtain sample job titles and job
descriptions by doing a search on Indeed.com and using the keywords “data mining.” According
to T. Kielinski, the industries hiring data scientists are “Internet, staffing and recruitment,
ecommerce, and health care” (personal communication, June 23, 2015).
B. Benson, a data analytics professional, spoke with me via email (personal
communication, June 23, 2015). He began by writing about the confusion he has encountered
regarding the difference between communications personnel and actual data analysts:
In my experience, there is little understanding of the difference between a digital analyst
and a digital communications director. Much of the conflation occurs between the
communications role of drafting text and the technology role of managing the messaging
tools. Most professional digital directors within my sector, progressive political
campaigns, wear both hats well. Fortunately, as the industry matures, there is better
segmentation between analysis and communications, leading to more specialized roles
for analysts. (personal communication, June 23, 2015)
He wrote that the required education “is generally a bachelor's with some sort of focus in
analysis or messaging” (personal communication, June 23, 2015), and that when he is hiring an
analyst to perform data mining tasks, he looks for “strong statistical skills” and further adds:
Many people with a biotech background have a lot of experience using R, SPSS, or Stata,
though almost everyone ends up using R. Familiarity with relational databases is
important as well, with some variation of SQL being standard. Data visualization is
extremely helpful, and can be generated from most any application. Specialization in
Tableau is attractive. GIS is an added bonus, because higher ups love maps with an
illogical fervor. (personal communication, June 23, 2015)
In Mr. Benson’s sector, politics, the most common entry-level positions are for digital
directors, digital strategists, or social media managers (personal communication, June 23, 2015).
D. Herrera is a software developer who responded to my Quora question. She said that
the required education “is relative to what you're doing. I have a BS in animal science, but I have
learned and taught seven different programming languages and hold several certifications. It's
the languages and understanding of the languages that matter” (Herrera, 2015). She pointed out
that to mine social media, knowledge of the API of the network(s) you wish to mine is most
important; the specific programming language is less important because “most languages can
utilize APIs, so it's a pick your poison situation. Different people will have different
recommendations” (Herrera, 2015). Because I received so many responses, I found the last
statement to be entirely true, although the four primary languages in the data science world are
R, Python, SAS, and SQL (Piatetsky, 2014, para. 3). D. Herrera stated that typical entry-level job
titles would include DBA, solution developer, or solution analyst, but noted that the exact title
would “depend more on the company you look for rather than the developer titles” (Herrera,
2015). She said that she suspected that most industries hiring social media data scientists would
be in the B2C (business to consumer) category, but that B2B (business to business) companies
might be interested in mining LinkedIn (Herrera, 2015).
B. Mathews, an IT professional, was the first participant in the thread I created in the
LinkedIn Big Data and Analytics Group. He stated that an entry-level job applicant would need
to know languages and tools such as R, Pig, Java, and Hadoop, and have a strong math
background, including probability and statistics. Applicants also need a bachelor’s degree,
“curiosity, some knowledge of business, [and] perhaps computer sciences” (Mathews, 2015).
According to this respondent, typical entry-level jobs would include business analyst or data
analyst, and typical industries would be “oil and gas, financial institutions, genome research,
manufacturing, automotive, [and] government” (Mathews, 2015).
A. Burris, a data scientist, stated that entry-level applicants should have a bachelor’s
degree in business administration or finance, advanced knowledge of statistics, and knowledge of
SQL, Minitab, Tableau, and “Microsoft/Apple/Google products for analysis, interpretation, [and]
presentation of results” (Burris, 2015). He further stated that typical industries looking for data
science applicants include “consumer goods and retail, energy and utilities, healthcare,
information services, and travel and industry” and that a business analyst job would be a typical
entry-level role (Burris, 2015).
R. Taneja, an IT professional, stated that the most popular languages and technologies are
R, SAS, Python, Pig, Hive, MATLAB, Scala, Hadoop, and NoSQL, along with communication
skills, business acumen, and, similar to what B. Mathews stated, a curiosity about data
and information (Taneja, 2015). She said that educational requirements
were, at minimum, a bachelor’s degree in a STEM field, and that she has often seen employers
require applicants to have a Ph.D. for data scientist roles (Taneja, 2015). She noted that data
scientist is the most popular job title she has encountered, but that data engineer “is becoming
more popular as organizations look for more wide skillsets” (Taneja, 2015). She stated that
typical industries include healthcare, finance, genome research, automotive, manufacturing, and
digital and mobile industries (Taneja, 2015).
C. Mullins, a DBA and big data consultant, entered the conversation by noting that
although newer data science technologies, such as Hadoop and Pig, are important, many
organizations are still making use of older technologies:
[S]everal recent surveys indicate that a lot of big data/analytics projects are using pre-
existing technologies like SQL, relational databases, transaction data, and even
spreadsheets. In fact, these are all at the top of the list. Now this might be because big
data analytics projects and technologies (e.g., Hadoop, R, Pig, NoSQL, etc.) are newer
and not as pervasive yet among data professionals. But I would not discount the tried and
true (alongside the new). (Mullins, 2015a)
Later in the thread, he noted that there are many potential jobs in the big data market:
There are DBAs (who need to understand in-depth details of the operations of the DBMS
software being used; there may be many different database systems in use); the data
scientist (who needs to understand how to develop models to pull information from the
databases available, as well as to provide guidance on what additional data may be
needed to achieve the required goal); subject matter experts (who understand the business
and its data at a detailed level to provide guidance to data scientists about the specifics of
the business data); developers (who can write code in various languages expertly); ETL
(to move data in and out of databases); and probably more that I am missing. Note, too,
that there will (or should) be overlap between the skillsets of these individual workers.
(Mullins, 2015b)
U. Shah, a big data analyst and IT architect, concurred with C. Mullins regarding the
importance of SQL, and added that “R is great for data analysis; you can even use simple SQL
inside R using packages” (Shah, 2015). He stated that Python is good for writing programs. He
suggested that, to get started, I should “create a free account on Amazon AWS and play with
what you have learned or experiment with different datasets. Cleaning data, transforming data
and visualizing data can teach you many things” (Shah, 2015).
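U. Shah’s remark concerns R; as an analogous illustration in Python, the language used later in this project, the following minimal sketch runs simple SQL from inside a script using the standard-library sqlite3 module. The table name, column names, and rows are invented for illustration and do not come from any respondent or data set.

```python
import sqlite3

# In-memory database holding invented sample rows (not real data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (network TEXT, likes INTEGER)")
conn.executemany(
    "INSERT INTO posts VALUES (?, ?)",
    [("Twitter", 3), ("Twitter", 5), ("LinkedIn", 10)],
)

# Aggregate with plain SQL, then use the result as ordinary Python data.
rows = conn.execute(
    "SELECT network, SUM(likes) FROM posts GROUP BY network ORDER BY network"
).fetchall()
conn.close()

print(rows)
```

A sketch like this only covers structured, tabular data; free-form post text would need additional processing before SQL becomes useful.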
S. Shaw, an IT professional, said that he felt the other respondents quoted above
described the market very well. He added that the “top talent” he works with “have skills that
cross disciplines” (Shaw, 2015). He emphasized the need for a solid foundation in mathematics,
along with technologies such as HDFS (Hadoop Distributed File System), Java, and scripting
languages (Shaw, 2015).
R. Del Rosario, an IT professional, emphasized the importance of defining what type of
data you wish to mine from social media, be it sentiment analysis, product name mentions, or
geographic location based on sentiment analysis or some search term, before diving in, so as to
“get specific data and avoid data overload” (Del Rosario, 2015a). In response to the comments
stressing the importance of SQL, R. Del Rosario stated that SQL “will not work with
unstructured data … Most data in social media is unstructured” (Del Rosario, 2015a). Later in
the thread, in response to a post where I asked about the feasibility of an applicant like me, with
no experience, entering the field, he assured me:
[S]trictly speaking, there are only two major groups of skill when it comes to Big Data.
The first major group are those who self-learned this technology or those who contributed
to the growth of Big Data, and those who have started to adapt to the Big Data
technologies. So the good news is, technically, there are lots of "entry-levels" and there
are lots of levels or categories where you can get in the Big Data bandwagon. (Del
Rosario, 2015b)
He suggested that I take online courses such as those offered by Big Data University,
which he said is free, and that I would be ready for a job “with about six months of intensive
learning” (Del Rosario, 2015b).
C. Gilbert, an IT recruiter, discussed how social media mining is only one subset of the
data science field:
Social media mining should be just one part of the analysis that a data scientist/engineer
/analyst does. The social media needs to be linked to something and combined with other
data and information. It is generally accepted that 80% of the time spent on in depth data
analysis (data science/data mining) is actually spent on data preparation. … For social
media specifically I would expect people to have skills with natural language processing
using deep learning techniques. Python has libraries for this. And so, it is not just Python
skills that are needed but skills in specific types of techniques or libraries with those
languages. Data matching with and without (using tools or packages) is also useful for
linking your sentiment to specific customers. … The capacity to collect and process more
data is continuing to lead to more analysis, and social media, in many cases, is just one
aspect of analyzing an increasingly big picture. (Gilbert, 2015)
He also noted that good storytelling and presentation skills are important, as the
information that is mined will have to be presented and explained to other people, and that it is
helpful for a job applicant to have specific knowledge of whatever industry (telecom, finance,
etc.) he or she wishes to work in (Gilbert, 2015).
K. Lawlor, a data analyst student, said that he concurred with most of what C. Gilbert
said, particularly the importance of industry-specific knowledge: “I have seen many articles, or
projects, which make findings/recommendations which are inaccurate, as they have not
understood the domain in which they are working” (Lawlor, 2015). He also stated that, in his
experience, many employers do not fully understand what they are looking for when they are
seeking to hire a data professional: “I have seen recruiters [post] job adverts with 'data scientist'
as a heading and a description of a database administrator, or something similar” (Lawlor, 2015).
I had several exchanges with T. Oldfield over the course of two days. Mr. Oldfield is a
software architect who recruits developers to write big data analytics software; the talent he
recruits has at least two to three years of development experience and not only “knowledge of
‘data science’ but also the knowledge to realize that dream within computer code” (Oldfield,
2015a). He admitted that his requirements are very stringent and that “[his] expectations are
difficult for most people” (Oldfield, 2015a). Specifically, the applicants he recruits must be
expert coders with a deep understanding not only of coding, but operating systems concepts such
as threading and processing, data storage and security, and data types (Oldfield, 2015a). He
elaborated:
[T]he traditional taught methods of coding are completely useless in this field. This
comes down to someone that can think outside the box and fully understands how
memory management works, preferably with some knowledge of L1/2/3 caching, data
bandwidths on the FSB, threading, locking etc. - microcode implementation, and
hardware design implications. (Oldfield, 2015a)
Because he does not hire entry-level applicants, he at first expressed concern that his
answers would not be relevant to my project (Oldfield, 2015a). However, he also admitted that
most applicants are unable to meet his requirements exactly, and that the most important thing to
him when looking to hire someone is the applicant’s mindset; specifically, whether they can
think critically and ask questions:
[Y]ou need a certain mind set: open mind, ask, ability to turn things on their head and ask
the question differently. Hence my comments "what you were taught about computer
code is wrong for this field." The fact that you asking - and keep asking what do we mean
is a good start. Second - start at the bottom - what part interests you/you have trained in,
start here and branch out as you gain more experience. I have also note that many of the
skill I want to employ are about as likely as "pigs flying", so no worries - if the subject
interests you, you have the right mind set - then you will be good. (Oldfield, 2015b)
He explained that this is because the data science field is still wide-open, and many
discoveries are yet to be made. People working in data science cannot depend on pre-defined
procedures; they must create their own:
Even though it has been going for years, it is still "research" - there are no pre-defined
procedures to get the answer you want - you have to think of the solution to THIS
problem for nearly every new problem. Not always of course - tools are improving.
(Oldfield, 2015b)
Further to my exchange with T. Oldfield, K. Jones stated that he has been “in the
reporting, big data space & Hadoop space for many years” (Jones, 2015a). His comments
elaborated on the confusion that exists in what is a relatively new industry: “There are many
competing ideas and approaches in data science. Most Fortune 500 firms can't figure out a
decisive direction fearing they'll fail with their investments” (Jones, 2015b). He mentioned that
he feels too many Silicon Valley companies are entering the data science field without knowing
what they are doing, and that the “vast majority of them will end up in the graveyard if they are
not fortunate enough to be bought out by a larger, well financed firm” (Jones, 2015b).
A. Gonzalez, an IT professional, was the final participant in the thread. I had expressed
concern about learning to use all of the many tools the previous respondents mentioned, on my
own, from home, and he responded with the following:
In my opinion, I think there is a difference between what skills companies are requiring
for hiring people working in Big Data and what they will probably need. Not only the
companies but people have the same doubts; you can see this discussion also along the
previous posts.
For some people or companies the required skills are based on the tools or the languages,
they talk about Hadoop, Spark, SQL, OpenCync, Python. In my opinion, those are just
nice programming skills, but those are not the skills needed in an area where the future is
so uncertain, where the technology can and will swift so fast. (Gonzalez, 2015)
This was similar to T. Oldfield’s comment about how there is often a gap between the
skills he wants his applicants to have and the skills his applicants actually have.
Second Iteration – Reflection
Task No. 1: Compose Query Letter/Message Board Query Post
This task was a basic copywriting assignment; I have written many letters such as these
over the years, and I did not have much trouble composing this one. Because the letter was
meant to prompt an action from the receiver—in this case, getting the receiver to agree to an
interview—I approached it the way I would a sales letter. This is why I began the letter by
stating, “I was wondering if you could help me out.” I was taught to open sales pitches in this
manner in a sales class I took as part of my MBA. The logic is that while everyone likes to buy
things, no one likes to be sold something, and asking for a prospect’s help is a much better way
of opening a sales letter than launching directly into your pitch.
I used my Wilmington University email address in the letter, as opposed to one of my
non-university addresses, so that the recipients would see that I really was a graduate student,
and not someone trying to build a list of targets to spam.
Task No. 2: Send Query Letter to Appropriate LinkedIn Contacts
As mentioned previously, I sent out approximately 100 query letters and received only
two responses. Part of the problem is that I do not personally know most of the people I am
connected with on LinkedIn, and the people I do know do not work in information technology;
most of them are attorneys or work in marketing and sales. I was, in effect, engaging in cold
calling, and cold calling is a numbers game. I knew that I would have to send out as many letters
as possible to get a very small number of responses.
However, in hindsight, I believe my biggest mistake was to not include the actual
questions in my original query letter. I discovered this mistake when I began posting my query to
message boards, and I received responses from individuals who seemed to think that this was
going to be an extensive interview that would take a lot of time to complete. Had the individuals
I sent the letter to seen that they needed to answer only four questions, my response rate might
have been higher.
Task No. 3: Post Query to Appropriate LinkedIn Groups and on Quora
I posted my query to 13 different groups, but the only valid responses I received were
from the Big Data and Analytics group. As I mentioned previously, I did receive responses in
two other groups, but they were negative, with one individual chiding me that it would take a lot
of time to find respondents, another asking why I could not simply Google to obtain the
information I wanted, and another person telling me that I should build an online market research
questionnaire. It was good that I posted to so many groups, even though most of them returned
no results; like the query letter, this task was about numbers. I needed to get my questions in
front of as many potential respondents as possible.
I was surprised that the Quora question received only one valid answer. There is a very
large data science community on Quora; the Data Science topic has over 56,500 followers
(Quora, n.d.a), and the Data Mining topic has over 65,000 followers (Quora, n.d.b). By its
nature—Quora is a question-and-answer site—Quora attracts people who love to answer
questions and share their knowledge. Each time I tweeted my Quora question, I received
numerous re-tweets, so I am certain the question got sufficient exposure. I am stumped as to why
I received only one valid answer, and it was from a Facebook friend.
Task No. 4: Answer Respondents and Promote Query on Social Media as Needed
I do not feel I could have done a better job promoting my query on social media; as I
mentioned above, my Quora question was retweeted by numerous users, some of whom also
followed me on Twitter. I used appropriate hashtags, such as #DataMining, so that my tweet
could easily be found.
I had a great back-and-forth on the LinkedIn group, though looking back at my posts, my
dismay at the long list of requirements some respondents gave me, and how long it would take to
train myself on all of these things, was evident. Although I do not think I turned anyone off—the
thread was very active—I feel I could have been less emotional.
Task No. 5: Review Responses & Take Notes
I did not expect to get so many responses; it was a lot of information to take in. It was
overwhelming and, at first, frustrating. I was especially dismayed by one respondent’s comment
about me being six months out from being able to get an entry-level job. I do not have six
months to find a job, and I stated this in a response. As I discussed in the observation section,
this prompted some comments about how what employers would like to have and what they can
actually expect to get are often two different things, especially since this field is so new.
I was dismayed because I am used to competing in a very saturated job market with little
demand and a nearly unlimited supply. I have spent a number of years working as a copywriter.
The competition in this field is intense; there are dozens, if not hundreds of applicants for every
open position, and applicants who do not meet 100% of the qualifications stated in an ad are
wasting their time applying. However, in the data science world, there is a shortage of applicants
(Violino, 2014, para. 1). I am not facing as much competition, and potential employers have to
be more flexible with their requirements. I need to adjust my thinking and not become
discouraged so easily.
I also need to network better. The biggest problem I had when I began this iteration is
that I have no real professional network. Most of the professionals I know are either attorneys or
work in marketing and sales, not data science, not even information technology. Spending more
time in relevant LinkedIn groups, especially Big Data and Analytics, would help me get to know
other data professionals and build my network. Who I know might be more important than what
I know.
Third Iteration – Plan
During this iteration, I will determine the specific skill sets needed to perform social
media data mining, then go through a tutorial to help determine which skills I already have and
which skills I need to obtain. The scheduled tasks in this iteration are as follows:
Task No. 1: Perform Research on Skill Sets Needed for Social Media Data Mining
In this task, I will perform Internet research on the specific skill sets, including
knowledge of Python, that are needed for social media data mining. I have set aside six hours to
perform this task. To complete this task, I will need a computer with an Internet connection and
appropriate time allocation. No other people will be involved at this stage. My goals are to
determine which specific skill sets I need for social media data mining and what tutorials, if any,
exist on the Internet. I already own a book on mining social media, and it includes tutorials, but I
want to make sure this book is the best place for me to start. Do I have the requisite skills for this
book? Are there better tutorials online?
Task No. 2: Review Materials & Select a Tutorial to Work Through
In this task, I will review all of the materials I collected in the first task and select a
tutorial to work through. The expected duration is two hours. To complete this task, I will need a
computer with an Internet connection and appropriate time allocation. No other people will be
involved at this stage. My goal is to select a specific tutorial so that I can prepare my computer
and get to work.
Task No. 3: Install Python & Any Other Required Software on my Computer
In this task, I will prepare my computer for the tutorial I selected during the previous task
by installing Python and any other required software. The expected duration is four hours. To
complete this task, I will need a computer with an Internet connection and appropriate time
allocation. No other people will be involved at this stage. My goal is to get my machine ready for
my selected tutorial.
Task No. 4: Work Through the Selected Tutorial
In this task, I will work through my selected tutorial. Hopefully, I can finish the tutorial
before the end of this iteration; if I cannot, I will go as far as I can. To complete this task, I will
need a computer with an Internet connection, appropriate time allocation, and my tutorial
materials. No other people will be involved at this stage. I plan to spend at least eight hours on
this task. I have two goals for this task: to learn as much as I can and to determine which gaps
exist in my knowledge. I feel that working through a tutorial is the best way to accomplish these
goals because computer science is much like mathematics in that you do not learn by reading,
but by solving problems.
Third Iteration – Action
Task No. 1: Perform Research on Skill Sets Needed for Social Media Data Mining
I set aside six hours for this task, and my time estimate was accurate. I began by
performing a Google search on the phrases “social media data mining” and “social media data
mining getting started.” I followed approximately 20 links. Then, I performed a Google search
on the phrases “how to learn Python” and “Python data analysis.” I followed approximately 15
links. Additionally, I reviewed five links that I had previously stored as bookmarks when I came
across them during the two prior iterations, and I reviewed five links I found in email newsletters
that I received from DataScienceCentral.com and AnalyticsVidhya.com.
Task No. 2: Review Materials & Select a Tutorial to Work Through
I estimated that I would need two hours to complete this task, but it took me three hours
to complete. I reviewed all of the links I had bookmarked, re-reviewed the interviews I
conducted in the last iteration, and brainstormed a game plan for the remainder of the iteration. I
decided to use the book Mining the Social Web by Matthew A. Russell, which includes not only
a text, but a GitHub repository of source code and a comprehensive set of tutorials to be
completed using IPython Notebook.
Task No. 3: Install Python & Any Other Required Software on my Computer
I estimated I would need only four hours to complete this task, but it ended up taking me
8.5 hours. First, I installed Python 2.7 and three other software packages that I needed for it to
work on my machine (MacPorts, XCode, and Tkinter). Then, I attempted to install the “virtual
machine” that accompanied my social media mining book; however, I was unable to get the
install to complete. After attempting some fixes and performing some additional research, I
decided to install IPython Notebook on its own so that I could work the tutorial, and I ended up
installing Anaconda, a Python distribution that includes, among other packages, IPython
Notebook.
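As an illustration of how an installation like the one described above might be verified, the following minimal sketch reports the running Python version and lists any packages that cannot be imported. It is not taken from the book or its repository; the helper name missing_packages and the package names in REQUIRED are assumptions chosen for illustration and should be replaced with whatever the selected tutorial actually requires. The code is written to run on both Python 2.7 and Python 3.

```python
import sys

# Package names are assumptions for illustration; adjust the list
# to whatever the chosen tutorial actually requires.
REQUIRED = ["json", "IPython", "numpy"]


def missing_packages(names):
    """Return the names from `names` that fail to import."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing


print("Python version: %s" % sys.version.split()[0])
print("Missing packages: %s" % missing_packages(REQUIRED))
```

An empty list in the second line of output suggests the environment is ready; any name that appears there still needs to be installed.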
Task No. 4: Work Through the Selected Tutorial
I set aside at least eight hours for this task, and I ended up spending 11 hours on it. I
finished the tutorial in chapter 1 of my book, then skipped ahead to chapter 9 and partially
worked through that tutorial.
Third Iteration – Observation
Task No. 1: Perform Research on Skill Sets Needed for Social Media Data Mining
I spent six hours on this task, which was the time I had allotted to it. I researched both
social media data mining skills and Python for data analysis. In addition to performing Google
searches, I also reviewed some articles I received in email newsletters from the Data Science
Central and Analytics Vidhya websites that appeared to be pertinent to this task, either because
they discussed getting started in the data field or learning Python for data analysis. For each
Google search, I used multiple keywords to filter the results so as to see the most relevant
articles. I began by performing a Google search on the phrases “social media data mining” and
“social media data mining getting started.”
Similar to the interviews I conducted during the second iteration, most of the material I
came across talked about data mining in general, not social media mining in particular. For
example, Piatetsky (2013) outlined seven steps to learn data mining and data science: learn
Python, R, and SQL; learn to use data mining and visualization tools such as KNIME, Rapid
Miner, R Graphics, and Tableau; read textbooks on data mining and data science; educate
yourself using webinars, online courses, or perhaps a data science degree; get to work with some
sample data sets; participate in competitions on Kaggle.com; and network with other data
scientists.
Works (2014) noted that 88% of data professionals have at least a master’s degree, with
the most common majors being mathematics and statistics, computer science, and engineering
(“Technical Skills: Analytics”). In addition to proficiency with SAS and R (“Technical Skills:
Analytics”), Works recommended that applicants in this field have strong Python coding skills,
along with proficiency in Hadoop and SQL, and the ability to work with unstructured data
(“Technical Skills: Computer Science”).
Marr (2015b) wrote of the importance of the Python and R programming languages and
pointed out that, when it comes to newer technologies such as Hadoop, Hive, and Pig, many data
professionals are self-taught:
A working knowledge of Python or R–two of the programming languages most
commonly used for analyzing large digital datasets, is also usually expected. The biggest
challenge can be finding candidates with experience in the most cutting edge analytics
applications, such as those involving machine learning. Many people will not have the
opportunity to learn this at school, and experts are often self-taught. (p. 2, para. 5)
For both search terms, the book Mining the Social Web: Data Mining Facebook, Twitter,
LinkedIn, Google+, GitHub, and More by Russell (2014) ranked high in the Google search
results; the Amazon listing for the book was the first result on the first page for the search term
“social media data mining” and the fourth result on the first page for the search term “social
media data mining getting started.” Other references to the book, such as the corresponding
GitHub repository and the author’s website (miningthesocialweb.com) also ranked high, on the
first or second page for each search phrase, along with an interview of the author by De
Lacvivier (2013), in which Russell recommended Python as a starter language for social media
mining because, in his opinion, it works well with the JSON data format, which he noted is used
by many social media networks:
Russell advocates using Python for first social data mining projects because its syntax is
simple and its data structure is compatible with textual data. "Most social media
properties are going to return data to you in JSON format," Russell explained. JSON
(JavaScript Object Notation) is a flexible and intuitive text-based data format often used
in Web environments in order to communicate both simple and complex data structures
over a network. "Python's core data structures are so close to JSON that there's no real
penalty for working with that data. It's very easy to make that request." (para. 3)
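To illustrate Russell's point about how closely Python's core data structures track JSON, the following is a minimal sketch using only the json module from Python's standard library. The payload and its field names (user, screen_name, hashtags, and so on) are hypothetical, invented for illustration; they are not taken from any particular social network's API.

```python
import json

# Hypothetical payload; the field names are assumptions made for illustration
# and do not reproduce any real social network's response format.
raw = """
{
  "user": {"screen_name": "example_user", "followers_count": 120},
  "text": "Learning to mine social data with Python",
  "hashtags": ["python", "datamining"],
  "retweeted": false,
  "place": null
}
"""

# Parse the JSON text into native Python objects.
post = json.loads(raw)

# JSON objects arrive as dicts, arrays as lists, and false/null as False/None,
# so ordinary indexing and built-in functions work with no conversion layer.
screen_name = post["user"]["screen_name"]
hashtag_count = len(post["hashtags"])

# Serializing back to JSON text is equally direct.
round_trip = json.dumps(post, sort_keys=True)

print(type(post).__name__, screen_name, hashtag_count)
```

Because the parsed result is an ordinary dictionary, nested fields can be read with the same indexing syntax used for any other Python data, which appears to be the convenience Russell had in mind.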
Next, I performed Google searches on the phrases “how to learn Python” and “Python
data analysis,” reviewed several links that I had bookmarked when I came across them during
the two prior iterations, and reviewed five links contained in email newsletters from
DataScienceCentral.com and AnalyticsVidhya.com. Across these three sources, I followed
approximately 25 links. Most of them suggested various tutorials, books, and online classes;
much of the information was duplicative, and at least half of the links were little more than
thinly veiled advertisements for paid products (books or courses).
Analytics Vidhya (n.d.) put together a “comprehensive learning path” for learning how to
use Python for data science, which includes instructions on how to set up your machine (the site
recommends using the Anaconda distribution) and links to numerous Python tutorials and online
courses that cover everything from the basics of the language to data visualization and machine
learning.
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
 

Social Media Data Mining

  • 1. Running Head: SOCIAL MEDIA DATA MINING 1 Social Media Data Mining Action Research IST 8101 Teresa Rothaar
  • 2. SOCIAL MEDIA DATA MINING 2 Table of Contents
      List of Tables & Figures ... 5
      Introduction ... 6
      Methodology ... 7
      Literature Review ... 10
      Proposal ... 13
      First Iteration – Plan ... 15
        Task No. 1: Meeting with Dr. Scanlon ... 15
        Task No. 2: Research Market Demand/Need for Social Media Data Mining ... 16
        Task No. 3: Preliminary Research on Job Market Related to Social Media Data Mining ... 16
        Task No. 4: Review & Organize Research Results from Tasks 2 & 3 ... 17
      First Iteration – Action ... 17
        Task No. 1: Meeting with Dr. Scanlon ... 17
        Task No. 2: Research Market Demand/Need for Social Media Data Mining ... 17
        Task No. 3: Preliminary Research on Job Market Related to Social Media Data Mining ... 18
        Task No. 4: Review & Organize Research Results from Tasks 2 & 3 ... 18
      First Iteration – Observation ... 18
        Task No. 1: Meeting with Dr. Scanlon ... 18
        Task No. 2: Research Market Demand/Need for Social Media Data Mining ... 19
        Task No. 3: Preliminary Research on Job Market Related to Social Media Data Mining ... 22
        Task No. 4: Review & Organize Research Results from Tasks 2 & 3 ... 24
      First Iteration – Reflection ... 25
        Task No. 1: Meeting with Dr. Scanlon ... 25
        Task No. 2: Research Market Demand/Need for Social Media Data Mining ... 25
        Task No. 3: Preliminary Research on Job Market Related to Social Media Data Mining ... 26
        Task No. 4: Review & Organize Research Results from Tasks 2 & 3 ... 26
      Second Iteration – Plan ... 27
        Task No. 1: Compose Query Letter/Message Board Query Post ... 27
        Task No. 2: Send Query Letter to Appropriate LinkedIn Contacts ... 27
        Task No. 3: Post Query to Relevant LinkedIn Groups and on Quora ... 28
        Task No. 4: Answer Respondents and Promote Query on Social Media as Needed ... 28
        Task No. 5: Review Responses & Take Notes ... 28
      Second Iteration – Action ... 29
        Task No. 1: Compose Query Letter/Message Board Query Post ... 29
        Task No. 2: Send Query Letter to Appropriate LinkedIn Contacts ... 29
        Task No. 3: Post Query to Appropriate LinkedIn Groups and on Quora ... 29
        Task No. 4: Answer Respondents and Promote Query on Social Media as Needed ... 30
  • 3. SOCIAL MEDIA DATA MINING 3
        Task No. 5: Review Responses & Take Notes ... 30
      Second Iteration – Observation ... 31
        Task No. 1: Compose Query Letter/Message Board Query Post ... 31
        Task No. 2: Send Query Letter to Appropriate LinkedIn Contacts ... 32
        Task No. 3: Post Query to Appropriate LinkedIn Groups and on Quora ... 33
        Task No. 4: Answer Respondents and Promote Query on Social Media as Needed ... 34
        Task No. 5: Review Responses & Take Notes ... 35
      Second Iteration – Reflection ... 44
        Task No. 1: Compose Query Letter/Message Board Query Post ... 44
        Task No. 2: Send Query Letter to Appropriate LinkedIn Contacts ... 44
        Task No. 3: Post Query to Appropriate LinkedIn Groups and on Quora ... 45
        Task No. 4: Answer Respondents and Promote Query on Social Media as Needed ... 46
        Task No. 5: Review Responses & Take Notes ... 46
      Third Iteration – Plan ... 47
        Task No. 1: Perform Research on Skill Sets Needed for Social Media Data Mining ... 47
        Task No. 2: Review Materials & Select a Tutorial to Work Through ... 48
        Task No. 3: Install Python & Any Other Required Software on my Computer ... 48
        Task No. 4: Work Through the Selected Tutorial ... 48
      Third Iteration – Action ... 49
        Task No. 1: Perform Research on Skill Sets Needed for Social Media Data Mining ... 49
        Task No. 2: Review Materials & Select a Tutorial to Work Through ... 49
        Task No. 3: Install Python & Any Other Required Software on my Computer ... 49
        Task No. 4: Work Through the Selected Tutorial ... 50
      Third Iteration – Observation ... 50
        Task No. 1: Perform Research on Skill Sets Needed for Social Media Data Mining ... 50
        Task No. 2: Review Materials & Select a Tutorial to Work Through ... 53
        Task No. 3: Install Python & Any Other Required Software on my Computer ... 54
        Task No. 4: Work Through the Selected Tutorial ... 56
      Third Iteration – Reflection ... 59
        Task No. 1: Perform Research on Skill Sets Needed for Social Media Data Mining ... 59
        Task No. 2: Review Materials & Select a Tutorial to Work Through ... 60
        Task No. 3: Install Python & Any Other Required Software on my Computer ... 60
        Task No. 4: Work Through the Selected Tutorial ... 61
      Fourth Iteration – Plan ... 62
        Task No. 1: Review Everything I Have Done So Far and Pinpoint Specific Trouble Spots ... 62
        Task No. 2: Make a List of Things I Need to Learn ... 62
        Task No. 3: Construct a Plan for Future Study ... 63
      Fourth Iteration – Action ... 63
        Task No. 1: Review Everything I Have Done So Far and Pinpoint Specific Trouble Spots ... 63
        Task No. 2: Make a List of Things I Need to Learn ... 64
        Task No. 3: Construct a Plan for Future Study ... 64
  • 4. SOCIAL MEDIA DATA MINING 4
      Fourth Iteration – Observation ... 64
        Task No. 1: Review Everything I Have Done So Far and Pinpoint Specific Trouble Spots ... 64
        Task No. 2: Make a List of Things I Need to Learn ... 66
        Task No. 3: Construct a Plan for Future Study ... 67
      Fourth Iteration – Reflection ... 73
        Task No. 1: Review Everything I Have Done So Far and Pinpoint Specific Trouble Spots ... 73
        Task No. 2: Make a List of Things I Need to Learn ... 73
        Task No. 3: Construct a Plan for Future Study ... 74
      Final Reflective Statement ... 75
      References ... 78
  • 5. SOCIAL MEDIA DATA MINING 5 List of Tables & Figures
      Figure 1. Flow chart illustrating the four iterations for my action research project. ... 15
      Figure 2. Map of big data job volume by Metropolitan Statistical Area (MSA) using data from WANTED Analytics. Reprinted from “Where Big Data Jobs Will Be In 2015” in Forbes, by L. Columbus, 2014. Retrieved on June 15, 2015, from http://www.forbes.com/sites/louiscolumbus/2014/12/29/where-big-data-jobs-will-be-in-2015. Copyright 2014 by Louis Columbus. Reprinted with permission. ... 24
      Figure 3. Screenshot of example 1 in the Mining the Social Web chapter 1 tutorial. I have blacked out my Twitter credentials because they need to be protected in the same manner as passwords. The plain text at the bottom is an indication that the code worked and access to the API was granted by Twitter. ... 57
      Figure 4. Screenshot of example 2 in the Mining the Social Web chapter 1 tutorial. This code snippet asks Twitter to retrieve topics that are trending in the U.S. and worldwide and print them. The text below is truncated for visibility purposes because it is quite long. ... 57
      Figure 5. Screenshot of example 3 in the Mining the Social Web chapter 1 tutorial. This code snippet uses JSON to display the results collected in example 2 in a more readable format; again, the text displayed below is truncated for visibility. ... 58
      Figure 6. Screenshot of example 12 in the Mining the Social Web chapter 1 tutorial. This code snippet uses the matplotlib.pyplot Python package to plot a word frequency graph for selected Tweets. The graph displayed, but I could not figure out why the error message above it displayed. ... 58
      Figure 7. Summary of data science training options and the benefits and drawbacks of each. Reprinted from “How do I become a data scientist?” by M. Stringer, A. Wolf, I. Sirer, D. Malmgren, and L. Skelly, 2014. Retrieved on August 15, 2015, from http://datascopeanalytics.com/blog/how-do-i-become-a-data-scientist-an-evaluation-of-3-alternatives. Copyright 2014 by Mike Stringer, Aaron Wolf, Irmak Sirer, Dean Malmgren, and Laurie Skelly. Reprinted with permission. ... 69
  • 6. SOCIAL MEDIA DATA MINING 6 Introduction In a season one episode of the CBS television series Person of Interest, the two main characters had the following exchange while discussing an individual who had no social media footprint:
      REESE: I've never understood why people put all their information on those sites. Used to make our job a lot easier in the C.I.A. (takes another sip from his cup)
      FINCH: Of course, that's why I created them.
      REESE: You're telling me you invented online social networking, Finch?
      Reese stands up. Finch goes to his computer, setting down his doughnut.
      FINCH: The Machine needed more information. People's social graph, their associations. (opens up search on social networking website for "Jordan Hester") The government had been trying to figure it out for years. Turns out most people were happy to volunteer it. Business wound up being quite profitable, too (Berg & Beeson, 2012).
    People are, indeed, volunteering their information on social media sites, which has resulted in a never-ending stream of timely, easily accessible market research information for organizations (Thiel, Kötter, Berthold, Silipo, & Winters, 2012, p. 3). However, the unfathomable amount of information that is available presents its own problem: how to “[access] that data and [transform] it into something that is usable and actionable” (Thiel et al., 2012, p. 3). It is
  • 7. SOCIAL MEDIA DATA MINING 7 critically important for organizations to make use of this data; the McKinsey Global Institute reported that retailers who make use of data analysis could increase their profits by as much as 60% (as cited in Greenspan, n.d.). Knowing this, organizations are clamoring for data analysts and data scientists, but the supply of talent is so short that some organizations have had to turn to unconventional recruiting methods, such as crowdsourcing on analytics competition sites such as Kaggle (Marr, 2015b). My action research project will study social media data mining, in particular mining text data on Twitter. By completing this project, I will build knowledge about social web data mining and the job market for applicants with social web mining skills, including which programming languages and technologies are involved and how to obtain these skills. My ultimate goal is to either get a job or offer social media data mining and analysis services to private-sector organizations as a consultant. Methodology According to Baskerville and Myers (2004), “Action research aims to solve current practical problems while expanding scientific knowledge” (p. 329). Action research traces its origins to the social sciences post-WWII. It was developed by Kurt Lewin in 1947 at the University of Michigan, with the purpose of studying “social psychology within the framework of field theory” (Baskerville & Myers, 2004, p. 330). Specifically, Lewin used action research to study the psychological effects of war and incarceration in POW camps on returning WWII vets: what we would today call post-traumatic stress disorder, or PTSD (Baskerville & Wood-Harper, 1996, p. 236). At the time, almost no research had been done on this disorder; in fact, the diagnosis of PTSD was not added to the Diagnostic and Statistical Manual of Mental Disorders (the DSM, the “psychiatrists’ bible”) until 1980 (Friedman, n.d.). Researchers, perplexed by the
  • 8. SOCIAL MEDIA DATA MINING 8 widely varying symptoms the veterans under study were displaying, decided to try a different approach (Baskerville & Wood-Harper, 1996): Hence, the idea of social action arose. Scientists intervened in each experimental case by changing some aspect of the patients’ being or surroundings. Since scientist and therapist were one, the scientists were participants in their own research. The effects of the actions were recorded and studied. In this manner, a body of knowledge was developed about successful therapy for the illnesses. (pp. 236-237) Huang (2010) noted that the term “action research” is often confusing to beginners because it is not a specific methodology (like the Scrum programming methodology, for example) but “an umbrella term that represents a ‘family’ of practices” (p. 94) that are focused on the researcher being an active participant in the research being conducted, as opposed to a non-participatory, neutral observer: Action research is an orientation to knowledge creation that arises in a context of practice and requires researchers to work with practitioners. Unlike conventional social science, its purpose is not primarily or solely to understand social arrangements, but also to effect desired change as a path to generating knowledge and empowering stakeholders. We may therefore say that action research represents a transformative orientation to knowledge creation in that action researchers seek to take knowledge production beyond the gatekeeping of professional knowledge makers. (p. 93) While there are numerous action research methodologies, the key similarity among all of them is that researchers do not simply sit back, observe the results of an experiment (without interference, and perhaps without ever interacting with the subjects), and record them, as is done, for example, in a drug trial, but actively participate in the research with the goal of
  • 9. SOCIAL MEDIA DATA MINING 9 bringing about some sort of change for themselves and/or their organizations. If there is no active participation, there is no “action research.” Baskerville and Wood-Harper (1996) stated that the five phases of an action research project are:
      1. Diagnosing.
      2. Action planning.
      3. Action taking.
      4. Evaluating.
      5. Specifying learning (p. 237).
    Similar to agile programming methodologies, these phases are iterative; the researcher cycles through them throughout the course of the research study (Baskerville & Wood-Harper, 1996, p. 237). Although the concept of action research is rooted in the social sciences post-WWII, action research is well suited to the information technology/information systems field in the 21st century. Baskerville and Wood-Harper (1996) argue that because information technology “is a highly applied field” and “almost vocational in nature,” the action research methodology works well because it is “highly clinical in nature” (p. 235), “[places] IS researchers in a ‘helping-role’ within the organizations that are being studied” (p. 235), and “merges research with praxis” (p. 235). Action research is a suitable methodology for my project because I wish to, as Baskerville and Wood-Harper stated, “merge research with praxis” (p. 235). My goal is not simply to write a research paper about a particular topic, as I did in IST 8100 last semester, but to take a hands-on approach where I can use my hybrid STEM and business education background
  • 10. SOCIAL MEDIA DATA MINING 10 and my work experience in marketing to learn about the social media data mining field and how I can get started in it. My research will be of interest to anyone who is interested in entering this field, either as a consultant or an employee of an organization. Literature Review Gundecha and Liu (2012) define data mining as “a process of discovering useful or actionable knowledge in large-scale data” (p. 2) that “is an integral part of many related fields including statistics, machine learning, pattern recognition, database systems, visualization, data warehouse, and information retrieval” (p. 2). Social media data mining is a new subfield within the broader category of data mining (p. 2). Social media is “an exceptionally rich resource that allows [big data] researchers … to study and understand human behavior and activities in unprecedented ways” (Liu, 2014, para. 2). Social media network users have unfettered access to “readily available never-ending uncensored information” (Adedoyin-Olowe, Gaber, & Stahl, 2013, p. 3). However, social media data is unstructured, dynamic, and filled with “noise,” making it useless in its raw form (Gundecha & Liu, 2012; Thiel et al., 2012; Liu, 2014; Adedoyin-Olowe et al., 2013). Spam is also a problem; studies by Yardi et al. and Chu et al. (as cited in Gundecha & Liu, 2012) found that “spammers generate more data than legitimate users” (p. 4). Liu (2014) pointed out two challenges in particular: (1) having too much data about people we do not need more information about, such as celebrities and other famous people, and not enough data about people we do want to know more about, specifically, the average individual who is a potential customer for a business; and (2) because of the newness of social media and the aforementioned lack of structure, difficulties with empirical observation of the data that is available (para. 7).
  • 11. SOCIAL MEDIA DATA MINING 11 According to De Lacvivier (2013), Twitter, in particular, lends itself well to data mining from a developer’s perspective for the following reasons:
      - Twitter's API is well designed and easy to access.
      - Twitter data is in a convenient format for analysis.
      - Twitter's terms of use for the data are relatively liberal. It is generally accepted that tweets are public and accessible to anyone, hence the asymmetric following model that allows access to any account without request for approval. (De Lacvivier, 2013, “The ultimate data mining platform”)
    Additionally, according to Russell (as cited in De Lacvivier, 2013), mining Twitter “doesn’t require advanced developer or data scientist skills”; Russell feels that the belief that such skills are required causes many developers to shy away from data mining (para. 1). Sentiment analysis is the process of determining whether a snippet of text is conveying positive or negative emotions (Bifet & Frank, 2010, p. 4). This technique “depends on an appropriate subjectivity lexicon that understands the relative positive, neutral or negative context of a word or expression” (Thiel et al., 2014, p. 4). These lexicons are “both language and context specific” (Thiel et al., 2014, p. 4). Once a lexicon is built, it is used to train an automated sentiment classifier (Bifet & Frank, 2010, p. 4). However, building a lexicon for sentiment analysis depends on the existence of quality training and test datasets, and in social media mining, there usually are no training or test datasets (Liu, 2014, para. 9). Bifet and Frank (2010) noted that Twitter sentiment analysis is not a simple task: “a tweet can contain a significant amount of information in very compressed form, and simultaneously carry positive and negative feelings” (p. 4) and “some tweets may contain sarcasm or irony” (p. 4). However, Twitter users often provide clues regarding what sentiment they are conveying,
  • 12. SOCIAL MEDIA DATA MINING 12 such as including smileys and emoticons in their tweets (p. 4). These characters can be added to the lexicon to improve the learning process of a sentiment classifier (p. 4). The results of a study performed by Kouloumpis et al. (2011), however, called into question the value of adding emoticons to a lexicon (p. 541). The study concluded that adding features specific to microblogging, such as hashtags, emoticons, and abbreviations, to a sentiment classifier lexicon yielded better results than omitting them (p. 541). When hashtags were included, though, the value of using emoticons diminished, suggesting that these features may not be complementary from a data mining perspective (p. 541). The findings also indicated that part-of-speech analysis techniques that work well on more structured data “may not be useful for sentiment analysis in the microblogging domain” (p. 541). Despite these challenges, “Sentiment analysis using text mining can be very powerful and is a well-established, stand-alone predictive analytic technique” (Thiel et al., 2014, p. 4). Asur and Huberman (2010) used Twitter text mining and sentiment analysis to predict movie box office revenues. Their findings illustrated that, prior to a movie’s release, “the rate at which movie tweets are generated can be used to build a powerful model for predicting movie box-office revenue” (p. 492). The predictions made by Asur and Huberman were more accurate than those of “the Hollywood Stock Exchange, the gold standard in the industry” (p. 492). Further, the researchers found that, after a movie was released, sentiment analysis could be used to refine their initial predictions (p. 493). Thiel et al. (2014) performed a research study combining predictive sentiment analysis using text mining with network analysis, which focuses not on text, but on the relationships between individuals within social media networks; in other words, who follows whom. 
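As a brief aside, the lexicon idea discussed above can be illustrated with a minimal Python sketch. This is not the method used in any of the cited studies: it is a simple lookup scorer rather than a trained classifier, and the tiny lexicon, the token pattern, and the function names are all hypothetical choices made only for illustration. It shows how emoticons can sit in the same lexicon as words, and how a tweet carrying both positive and negative words can cancel out to a neutral score.

```python
import re

# Hypothetical toy lexicon (an assumption for illustration only):
# each word or emoticon maps to a polarity score. A real subjectivity
# lexicon would be far larger and specific to language and context.
TOY_LEXICON = {
    "love": 1, "great": 1, "good": 1, ":)": 1, ":-)": 1,
    "hate": -1, "awful": -1, "bad": -1, ":(": -1, ":-(": -1,
}

# Match simple emoticons first, then hashtags and plain words.
TOKEN_PATTERN = re.compile(r"[:;]-?[()]|#?\w+")


def tokenize(text):
    """Split a tweet into lowercase word, hashtag, and emoticon tokens."""
    return TOKEN_PATTERN.findall(text.lower())


def score_tweet(text, lexicon=TOY_LEXICON):
    """Sum the polarity of known tokens and label the tweet."""
    total = sum(lexicon.get(token, 0) for token in tokenize(text))
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for tweet in ("I love this movie :)",
                  "Awful plot, bad acting :(",
                  "Great cast but awful ending"):
        print(tweet, "->", score_tweet(tweet))
```

A lookup like this cannot detect sarcasm or irony, which is one reason the literature treats Twitter sentiment analysis as a hard problem.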
By combining these two techniques, the researchers were able to “position negative and positive
  • 13. SOCIAL MEDIA DATA MINING 13 users in context with their relative weight as influencers or followers” (pp. 17-18). The study, which used publicly available data from Slashdot and the KNIME data analytics program, found that “participants who are very negative in their sentiment are actually not highly regarded as thought leaders by the rest of the community,” a result that “goes against the popular marketing adage that negative users have a very high effect on the community at large” (p. 3). The researchers stated that they felt this unexpected insight into consumer behavior could not have been discovered using either predictive analysis or network analysis alone (p. 5). Social media data mining is a new field filled with nearly unlimited opportunity for market researchers, but also many challenges. The goals of social media data mining and analysis are to separate relevant information from noise and transform the relevant data “into something that is usable and actionable” (Thiel et al., 2012, p. 3). Researchers are attempting to adapt traditional data mining techniques for use in social media, with varying results (Kouloumpis et al., 2011). Because of its ease of use, data format, terms of use, and sheer amount of daily content generation, Twitter is very popular among data researchers, and its full potential has not yet been tapped (De Lacvivier, 2013). Much research is being done on sentiment analysis and network analysis, and at least one study suggests that combining the two methods might yield the most useful results (Thiel et al., 2012). Proposal My project deals with learning how to mine the social web for market research data. As I mentioned in a previous section, it is my goal to, as Baskerville and Wood-Harper (1996) put it, “merge research with praxis” (p. 235) so that I possess new skills upon its completion. 
My preliminary research found that “social media mining” is a brand-new and very wide field; it would be impossible for me to thoroughly investigate the entire scope over the next 12 weeks.
  • 14. SOCIAL MEDIA DATA MINING 14 Therefore, I am going to focus my project on learning about the job market for social media data mining, what types of jobs are available in this field at the entry level, and which specific skills (especially technological skills) are required to obtain entry-level work in this field, either as an employee or a consultant. If possible, I would like to build a couple of “toy” Twitter mining programs that I could use as the beginning of a portfolio. My external stakeholders are anyone who is interested in entering the social media data mining field. In Iteration 1, I will build support for what I want to do. I will use Google to research the market need for social media data mining and the types of jobs that are available in the field. In Iteration 2, I will use LinkedIn to reach out to hiring managers, recruiters, data analysts, and other appropriate professionals and speak with them regarding the job market for applicants with social media data mining skills. I will question them about what types of jobs are available and which skills are needed. In Iteration 3, I will work through some tutorials for the purpose of determining the specific skill sets needed to perform social media data mining, so that I can find out which skills I already have and which skills I need to obtain. In Iteration 4, I will review what I did in the previous three iterations, examine what went well and what did not, find out how much more I need to learn, and construct a specific plan of future study. A flow chart of my iterations is presented in Figure 1 below:
Figure 1. Flow chart illustrating the four iterations for my action research project.

First Iteration – Plan

In Iteration 1, I must lay the foundation for the remainder of my project and build support for what I want to do. The scheduled tasks in this iteration are as follows:

Task No. 1: Meeting with Dr. Scanlon

Because I did not do well on the last paper I submitted, I have scheduled a face-to-face meeting with Dr. Scanlon to determine how I can get my project back on track. I have set aside one hour for this meeting. The only other person involved will be Dr. Scanlon. My goal is to answer the following questions:
• How can I modify my project to meet the requirements of the course?
• Once my project is modified, how do I move forward with my next paper and the remainder of the course?

Task No. 2: Research Market Demand/Need for Social Media Data Mining

In this task, I will collect primary market research data on the market demand/need for social media data from the point of view of a company or consultant who wishes to offer these services. The expected duration is two hours a day for three days, or six hours total. To complete the task, I will need a computer with an Internet connection and appropriate time allocation. No other people will be involved at this stage. My goal is to answer the following questions:

• Why mine the social web? What is social media data mining good for?
• Who is interested in social media data? Which types of organizations?

Task No. 3: Preliminary Research on Job Market Related to Social Media Data Mining

This is similar to Task No. 2, but instead of collecting data on the market demand for social media data mining, I will collect preliminary data on the job market for applicants with social media data mining skills (more detailed data will be collected during the interviews I will conduct during the next iteration). To collect my data, I will do a Google search on the terms “social media data mining” and “data mining” paired with the phrases “jobs,” “careers,” and “career outlook.” The expected duration is two hours a day for three days, or six hours. To complete the task, I will need a computer with an Internet connection and appropriate time allocation. No other people will be involved at this stage. My goal is to answer the following questions:

• What types of jobs exist for applicants who have skills related to mining the social web?
• Are these jobs located in specific geographic areas? Specific industries?
• What specific skill sets might an applicant need?

Task No. 4: Review & Organize Research Results from Tasks 2 & 3

In this task, I will review and organize the data I collected during Tasks 2 and 3. The expected duration is two hours a day for three days, or six hours. To complete the task, I will need a computer with an Internet connection and appropriate time allocation. No other people will be involved at this stage. My goals during this stage will be to narrow my research focus, clarify my topic, and organize all of my information, eliminating sources and/or returning to Tasks 2 and 3 to gather more as needed.

First Iteration – Action

Task No. 1: Meeting with Dr. Scanlon

My meeting with Dr. Scanlon lasted approximately 40 minutes. During the meeting, we clarified my goals for my project, which are to build a portfolio and either obtain a job or work as a consultant in the social data mining field. We agreed that social media mining was a good choice for a topic, and that I simply had to modify my project to be action research-oriented instead of just a research paper. Dr. Scanlon assisted me with coming up with new iterations to replace my existing ones, which were not action research-oriented. He clarified the definition of “stakeholders” as it relates to my project and assisted me with determining my project’s stakeholders.

Task No. 2: Research Market Demand/Need for Social Media Data Mining

I used Google to perform searches on “social media data mining,” “data mining,” and “big data.” I estimated that this task would take approximately six hours over a three-day period, but instead I spent about eight hours on this task.
Task No. 3: Preliminary Research on Job Market Related to Social Media Data Mining

I used Google to perform searches on the terms “social media data mining” and “data mining” paired with the phrases “jobs,” “careers,” and “career outlook.” The only search engine I used was Google, and I followed 22 links. My initial estimate was that this task would take approximately six hours over a three-day period, and the actual duration was about two hours over a two-day period.

Task No. 4: Review & Organize Research Results from Tasks 2 & 3

I anticipated that this task would take six hours over three days, but it took me only about four hours over two days. The work consisted of reading each of the 22 articles I bookmarked during my research and discarding the material that did not apply to this project, or, alternatively, saving it if I felt it might apply to a future iteration.

First Iteration – Observation

Task No. 1: Meeting with Dr. Scanlon

I set aside an hour for the meeting, but it ended up taking only 40 minutes because I was not as far behind as I had thought. I also came to the meeting prepared, with my materials and questions, so that no time was wasted. Dr. Scanlon helped me redefine my iterations. The result was that I left the meeting with all of the information I needed to proceed with my project and write my first iteration paper, and a clearer understanding of what is expected of me during the course of this project. The new iterations are as follows:

1. Support what I want to do – research the market need for social media data mining, and research what types of jobs are available in the field.
2. Locate and interview hiring managers, recruiters, data analysts, and other appropriate professionals regarding the job market for applicants with social media data mining skills; question them about what types of jobs are available and which skills are needed.
3. Examine the skill sets needed to perform social media data mining; focus on which specific skills are required.
4. Discuss my results. What happened? How much more do I need to learn? I can also perform some portfolio-building during this stage by coding test programs to mine sample Twitter data.

We also defined my stakeholders (other than me): anyone who is interested in a career in social media data mining, whether as a consultant or an employee. We went over my literature review, and I discovered that my problem was that I had simply quoted the various pieces of literature without analyzing them and finding commonalities. Dr. Scanlon agreed with my statement that I may have used too many sources, and that I would have been better off using fewer sources but providing a more in-depth explanation and analysis of each, along with pointing out their commonalities.

Task No. 2: Research Market Demand/Need for Social Media Data Mining

I estimated that this task would take approximately six hours, but it took about eight, primarily due to finding results that, while not applicable to this project or this particular phase of the project, I was interested in and bookmarked for further reading; for example, I found many threads on Quora.com about how to become a data analyst/data scientist and the skills that are needed to enter the field, as well as a number of tutorials on the Python and R programming
languages. My research revealed that there is much demand for social media data mining and analysis among organizations in all industries, as documented below.

The Google search results showed that social media data mining is “an emerging discipline under the umbrella of data mining” (Zafarani, Abbasi, & Liu, 2014, p. 16). It has a myriad of applications in the public and private sector, as Betancourt (2010) elaborated:

Entities such as airlines, politicians, and even non-profits can use this data for finding new customers or targeting products to existing ones. Financial services companies such as banks and lenders are also using the same data mining services for marketing purposes and to make lending decisions. For instance, certain types of credit products, which fit your personality, could be marketed specifically to you. (“How Data is Being Used”)

The amount of content being shared on social media sites has skyrocketed in recent years: Facebook exceeded one billion active users in 2012; approximately 500 million tweets are being posted daily; and there are an estimated 250 million blogs on the web (McBeath, 2013, para. 1). At first, many organizations saw social media as a vehicle to broadcast marketing messages to customers and personally connect them with the company’s brand (Boorman, 2011, para. 1). However, “many of these messages focus on brands and products” (Salampasis, Paltoglou, & Giachanou, 2013, p. 87) and every time a user shares content on a social media site, they are providing information regarding “their preferences, their demographics, their behaviors, and their relationships” (Ridge, 2014, “Social Data has Scope”). Best of all, the majority of this data is public and “free for the taking” (McBeath, 2013, “Mining for Sentiment”).
Realizing the marketing potential, many companies are now mining this data and using it to analyze how customers feel about their products and services, a process known as “sentiment analysis” (McBeath, 2013, “Mining for Sentiment”). Organizations are also using social media
data to “build dossiers” on current and potential customers (Betancourt, 2010, para. 1) that can be used to send highly targeted marketing messages (Betancourt, 2010, “How Data is Being Used”). Other potential uses of social media mining include:

• Recruiting and vetting job candidates: According to a survey by Jobvite (2014), “73% of recruiters have hired a candidate through social media” (p. 9) and 93% “will review a candidate’s social profile before making a hiring decision” (p. 10).
• Preventing and fighting crime: Law enforcement is mining social media data to “detect and counter criminal activities” (McBeath, 2013, “Beyond Sentiment”). Sometimes, content posted by perpetrators, such as video of a suspect bragging about a crime, is later used in court (McBeath, 2013, “Beyond Sentiment”). Federal law enforcement mines the social web to track terrorist chatter and other threats to national security (McBeath, 2013, “Beyond Sentiment”).
• Epidemiology and public health: Social data is being used “to help identify, track, and predict disease outbreaks” (McBeath, 2013, “Beyond Sentiment”).
• Emergency and disaster response: Oak Ridge National Laboratory is researching how social media data can be used to better assess the nature and scope of disasters, and thus better respond to them (McBeath, 2013, “Beyond Sentiment”). Supply chain risk managers are mining the same data so they can become aware of and quickly react to natural and man-made disasters that could impact their supply pipelines (McBeath, 2013, “Beyond Sentiment”).

However, despite the wide interest in social media data mining, many companies still do not know how to sort through all of this information and extract actionable business intelligence (Ridge, 2014, para. 2). There are challenges associated with dealing with such voluminous data
sets and separating irrelevant information or spam from relevant data (Boorman, 2011, “Tapping Social Media Data”). In conclusion, social media data mining is a timely topic, and there is a strong market demand for data mining and analysis services.

Task No. 3: Preliminary Research on Job Market Related to Social Media Data Mining

I estimated that this task would take approximately six hours, but it ended up taking me only about two hours, primarily because I stumbled across job market information while completing the previous task. I was unable to find job market information about “social media data mining,” but much information was available regarding big data and data mining in general. My research, which is documented below, indicated that the job market for applicants with data mining and analysis skills is very strong.

Robert Schnabel, dean of the School of Informatics and Computing at Indiana University, told Forbes, “We see the data science job market requiring two types of professionals: those with deep technical skills, and managers and analysts with the knowledge to use the analysis of big data to make effective decisions” (as quoted in Violino, 2014, “New Data Degree Programs”). There is a critical shortage of applicants with data science skills: According to Columbus (2014), “Demand for Computer Systems Analysts with big data expertise increased 89.9%” from January through December 2014. At the same time, McKinsey Global Institute, in a May 2011 report, predicted that by 2018, the United States would face a shortfall of 140,000 to 190,000 applicants with “deep analytical skills” and 1.5 million managers (as cited in Jain, 2013).

Columbus’s (2014) research, which examined job listings and job market information collected by WANTED Analytics, found the following:
• The industries with the most job openings in the big data field were “Professional, Scientific and Technical Services (27.14%), Information Technologies (18.89%), Manufacturing (12.35%), Retail Trade (9.62%) and Sustainability, Waste Management & Remediation Services (8.20%).”
• The top geographical areas for jobs were the San Francisco Bay area and the Washington, D.C. area. However, data jobs can be found in major metropolitan areas throughout the country, as shown in Figure 2 below.
• The skills that appeared most frequently in big data job ads were Python, Linux, and SQL, with Python seeing the most growth in demand (nearly 97%) between 2013 and 2014.
• The median salary for big data professionals was found to be $103,000/year, and sample job titles included “Big Data Solution Architect, Linux Systems and Big Data Engineer, Big Data Platform Engineer, Lead Software Engineer, [and] Big Data (Java, Hadoop, SQL).”
Figure 2. Map of big data job volume by Metropolitan Statistical Area (MSA) using data from WANTED Analytics. Reprinted from “Where Big Data Jobs Will Be In 2015” in Forbes, by L. Columbus, 2014. Retrieved on June 15, 2015, from http://www.forbes.com/sites/louiscolumbus/2014/12/29/where-big-data-jobs-will-be-in-2015. Copyright 2014 by Louis Columbus. Reprinted with permission.

Ide (2014) stated that social media is one of the “hottest sectors for big data growth” (para. 1), which coincides with the market demand for social media mining and analysis that I found while performing research in Task 2.

Task No. 4: Review & Organize Research Results from Tasks 2 & 3

Reviewing and organizing the results of my research took only about four hours, two short of my original estimate. This was because I knew exactly what I was looking for and had previously performed some research on this topic before beginning this class; I already had a
“data mining” folder in my web browser’s bookmarks bar. Therefore, it did not take me very long to decide to use, discard, or save for future use any particular source.

First Iteration – Reflection

Task No. 1: Meeting with Dr. Scanlon

This meeting went very well. My only mistake was that I did not ask for help earlier, before I wrote my last paper; if I had, I would have done much better and been much further along. I did not have a firm understanding of the action research process or how to put together a project that met the requirements of action research. I also did not fully understand how to conduct a literature review. I had never written one before. I suggested to Dr. Scanlon that the IST program at Wilmington University be modified to include at least one exposure to literature reviews prior to IST 8101; IST 8100 would be a good class for this. Finally, I made many small mistakes regarding APA citation style, and I need to ensure that I format my future papers correctly. If I have any questions, concerns, or difficulties as I work through my next three iterations, I will seek help immediately.

Task No. 2: Research Market Demand/Need for Social Media Data Mining

I felt this task went really well, even though it took me two hours longer than expected. Social media data mining is a very popular current topic, and I had no difficulty finding information on the market for this type of service. In addition to the sources used during this iteration, I found many resources that I bookmarked for later use. This reduced the time needed to work on Task 3 during this iteration, and I anticipate that it will reduce the time needed for research during future iterations. However, as I mentioned in my reflection for Task 1, I wish I had sought help regarding my project earlier, thus giving myself more time.
Task No. 3: Preliminary Research on Job Market Related to Social Media Data Mining

Similar to my experience with Task 2, I feel that this task went very well. It did not take as much time as I’d expected, largely because I came across a lot of job market information while completing Task 2. The drawback was that I was unable to find job market information about social media data mining specifically, only big data/data science/data mining in general. I discarded some of the articles because they did not apply to this project or this iteration. For example, some of the articles were about the big data job market in India, and I am focusing on the U.S.; other articles contained duplicative information; and other articles focused on big data training programs, tutorials, and degree programs as opposed to job market information. During my next iteration, I am scheduled to seek out and speak with recruiters, hiring managers, and/or data analysts/scientists, and I am hopeful that they will be able to provide me with more specific information on social web mining jobs.

Task No. 4: Review & Organize Research Results from Tasks 2 & 3

This task also went very well, and my actual time spent was two hours short of my original estimate. I am a very organized person, and I have years of experience as a copywriter, so gathering and organizing research is one of my strengths. One thing I have learned to do specifically for academic writing is to create a text file containing the reference page citations for my sources ahead of time; this saves a lot of time when putting together the final paper.

In summary, it was difficult cramming all of the research, organization, and copywriting into the seven-day period I had between my meeting with Dr. Scanlon and the [extended] due date for my first iteration paper, especially since I am taking an additional class during the Summer I block.
I am glad I asked for a 48-hour extension to submit my paper, as I ended up needing it. I am emerging from this iteration with a clear understanding and outline for my
project. I learned to ask for help as soon as I run into difficulty instead of trying to figure everything out on my own, and I also learned that I need to work on my knowledge of APA style and ensure my papers are formatted correctly; it is silly to lose points because of style issues.

Second Iteration – Plan

In Iteration 2, I must locate and interview hiring managers, recruiters, data analysts, and other appropriate professionals regarding the job market for applicants with social media data mining skills and question them about what types of jobs are available and which skills are needed. The scheduled tasks in this iteration are as follows:

Task No. 1: Compose Query Letter/Message Board Query Post

I must compose a query letter to be sent to recruiters, data analysts, and other appropriate professionals in the big data industry, along with slight variations to be posted on appropriate LinkedIn Groups and on Quora. I have set aside one hour to complete this task. To complete the task, I will need a computer with an Internet connection and appropriate time allocation. No other people will be involved at this stage. My goal is to write a questionnaire that is short enough to not scare off any potential respondents, yet will gather the information I need.

Task No. 2: Send Query Letter to Appropriate LinkedIn Contacts

In this task, I will search my LinkedIn contacts list for IT recruiters and appropriate professionals in the data analytics field, such as data analysts, and send them the query letter written during Task 1. I have set aside two hours to complete this task. To complete the task, I will need a computer with an Internet connection and appropriate time allocation. No other people will be involved at this stage. My goal is to send out as many letters as possible, as it is likely that most of them will be ignored or deleted as spam.
Task No. 3: Post Query to Relevant LinkedIn Groups and on Quora

In this task, I will post my query as a message to relevant LinkedIn groups and as a question on Quora. I have set aside one hour to complete this task. To complete the task, I will need a computer with an Internet connection and appropriate time allocation. No other people will be involved at this stage. My goal is to post the query to LinkedIn groups that are related to data mining and other appropriate groups, such as groups for Temple and Wilmington University alumni, so that people who are able to answer the questions have a chance to see them. I will also post the query as a Quora question in an attempt to reach more respondents.

Task No. 4: Answer Respondents and Promote Query on Social Media as Needed

In this task, I will answer any emails I receive and acknowledge/answer any responses to my message board posts and Quora question. I have set aside eight hours over the next two weeks to complete this task. To complete the task, I will need a computer with an Internet connection and appropriate time allocation. The people involved at this stage will be my respondents and me. My goal is to interact with my respondents, let them know that I appreciate their taking the time to answer my questions, and obtain follow-up information as needed.

Task No. 5: Review Responses & Take Notes

In this task, I will read and review the responses I received to my emails and message board posts and take notes. I have set aside four hours to complete this task. To complete the task, I will need a computer with an Internet connection and appropriate time allocation. No other people will be involved at this stage. My goal is to compare the responses and look for similarities so that I can determine which skills I need to focus on during the next iteration of this project.
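The comparison planned for Task No. 5 (looking for skills that several respondents mention) could be sketched as a simple tally. This is only a minimal illustration, assuming the notes have already been reduced to a set of skills per respondent; the respondent labels, the skill lists, and the function names are hypothetical placeholders, not actual responses or part of the project.

```python
from collections import Counter

# Hypothetical notes: one set of skills per respondent.
# These are placeholders for illustration, not real interview data.
NOTES = {
    "respondent_a": {"python", "sql", "statistics"},
    "respondent_b": {"r", "sql", "statistics"},
    "respondent_c": {"python", "sql", "hadoop"},
}


def tally_skills(notes):
    """Count how many respondents mentioned each skill."""
    counts = Counter()
    for skills in notes.values():
        counts.update(skills)  # a set, so each respondent counts once per skill
    return counts


def common_skills(notes, min_share=0.5):
    """Return (skill, count) pairs mentioned by at least min_share of
    respondents, most frequent first, ties broken alphabetically."""
    counts = tally_skills(notes)
    threshold = min_share * len(notes)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(skill, n) for skill, n in ranked if n >= threshold]


if __name__ == "__main__":
    for skill, n in common_skills(NOTES):
        print(f"{skill}: mentioned by {n} of {len(NOTES)} respondents")
```

Storing each respondent's skills as a set keeps someone who posts several times from being counted more than once for the same skill.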
Second Iteration – Action

Task No. 1: Compose Query Letter/Message Board Query Post

I set aside one hour for this task, and that is how long it took to complete. Because I have a lot of copywriting experience, I am very good at estimating how long a writing assignment will take to complete. As with any copywriting assignment, most of the time was not spent on actually writing the letter/post (it was not that long) but on considering how to approach potential respondents so that they would be encouraged to respond to me.

Task No. 2: Send Query Letter to Appropriate LinkedIn Contacts

I set aside two hours for this task, and it took me about an hour and a half to complete. I have an extensive LinkedIn list with over 500 contacts. I went through the list and searched for IT recruiters and data professionals, such as data analysts and data scientists. I located approximately 100 contacts that were either IT recruiters or data professionals, and I sent my query letter to them.

Task No. 3: Post Query to Appropriate LinkedIn Groups and on Quora

I posted my query as a message to the following LinkedIn groups: Big Data and Analytics; LI Live - Philly; KDNuggets Analytics; Connect: Professional Women’s Network; IT Pros - Philadelphia; Python Community; Python Data Science & Machine Learning; Temple University Alumni; Temple University Young Alumni; R Project for Statistical Computing; Wilmington University; Wilmington University’s College of Technology; and WomenWorking. I chose these particular groups because they were either related to big data/data science or information technology specifically or were general professional networking groups that I knew, from previous experience, had members who were either IT recruiters or data professionals.
After I finished posting to the LinkedIn groups, I slightly rewrote the copy for the query so that I could turn it into a Quora question. I posted the question on Quora and shared the Quora question on Facebook and Twitter, and as a personal LinkedIn update. I also cut and pasted the query into a Google Doc so I could easily share it on Facebook, Twitter, and as a LinkedIn status update, in an attempt to attract respondents who do not participate on Quora or in LinkedIn groups.

I ended up slightly editing the posts the day after I originally made them. The original post explained that I needed to interview data science professionals for an action research project, but did not include the actual questions to be answered. The edit included the questions. I originally set aside one hour to complete this task, but between the original posts and the slight edits made to the posts afterward, this task took about two hours, over the course of two days, to complete.

Task No. 4: Answer Respondents and Promote Query on Social Media as Needed

I set aside eight hours for this task, but it ended up taking about 4.5 hours. This task primarily consisted of responding to individuals who responded to my personal query letters, LinkedIn group posts, and Quora question, thanking them for their responses, asking follow-up questions as appropriate, and, in some cases, answering their questions. I also tweeted my Quora question on three separate days in an effort to attract more respondents.

Task No. 5: Review Responses & Take Notes

I set aside four hours to complete this task, and it ended up taking about that long to complete. The task consisted of reviewing the numerous responses I received from the LinkedIn Big Data and Analytics group, one email response, one response through LinkedIn’s messaging system, and one response to my Quora question, looking for similarities, and taking notes.
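The planned and actual durations reported for the five tasks in this iteration can be totaled with a short script. This is only an illustrative sketch: the hour values are the approximate figures stated in the task descriptions above, and the shortened task labels and function names are chosen here for illustration.

```python
# (task label, estimated hours, actual hours), using the approximate
# figures reported for the second iteration. Labels are shortened here.
TASKS = [
    ("Task 1: compose query", 1.0, 1.0),
    ("Task 2: send query letters", 2.0, 1.5),
    ("Task 3: post to groups and Quora", 1.0, 2.0),
    ("Task 4: answer respondents", 8.0, 4.5),
    ("Task 5: review responses", 4.0, 4.0),
]


def totals(tasks):
    """Return (total estimated hours, total actual hours)."""
    estimated = sum(est for _, est, _ in tasks)
    actual = sum(act for _, _, act in tasks)
    return estimated, actual


def overruns(tasks):
    """Return the labels of tasks whose actual time exceeded the estimate."""
    return [name for name, est, act in tasks if act > est]


if __name__ == "__main__":
    estimated, actual = totals(TASKS)
    print(f"Estimated: {estimated} h, actual: {actual} h")
    print("Over estimate:", overruns(TASKS))
```

With these figures the iteration was planned at 16 hours and took about 13, and only Task 3 ran over its estimate.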
Second Iteration – Observation

Task No. 1: Compose Query Letter/Message Board Query Post

I estimated that this task would take approximately one hour, and that is how long it ended up taking to complete. I ended up with two versions, one query letter that I sent out through email and LinkedIn’s message system, and one for posting to the message boards. Version 1, which I sent out through email, read as follows:

Subject: Need to interview data professionals for master's research project

I was wondering if you could help me out. I am finishing up my MS in MIS at Wilmington University. I am currently working on an action research project for my capstone class. My research subject is social media data mining. I need to locate hiring managers, recruiters, data analysts, and other appropriate professionals regarding the job market for applicants with social media data mining skills and interview them about what types of jobs are available and which skills are needed. The interviews can be conducted via email; in fact, that would probably be easiest.

Unfortunately, I have no professional contacts in that field. Do you know anyone who could help me? Even if the individuals recruit/work in data mining in general, and not social media mining in particular, that would be fine, too.

Please let me know if you could be of assistance. My email is troth90208@wildcats.wilmu.edu.

Thanks so much!
Teresa Rothaar
At first, I used this version of the copy on the message boards as well. However, the day after making my original post, I edited it to include the actual questions I intended to send to respondents to my query letter, as follows (this version of the copy was also used on Quora):

I am currently working on an action research project for my capstone class for my master of science in MIS. My research subject is social media data mining. I need to obtain information regarding the job market for applicants with social media data mining skills (or even just data mining skills in general). Specifically:

• What are the minimum skills needed to get a job data mining social media or performing data mining/data science in general (programming languages, technologies, math knowledge, etc.)?
• What education is needed?
• What types of jobs are available at the entry level (job titles/brief descriptions)?
• Where are the jobs: which industries?

Task No. 2: Send Query Letter to Appropriate LinkedIn Contacts

I set aside two hours for this task, and it took me about an hour and a half to complete. The task took less time because while I have many LinkedIn contacts, most of them do not work in data science, information technology, or IT/data science recruiting, and it did not take me as long to go through the list as I had anticipated. I have over 500 contacts, and I sent letters to approximately 100 of them. Only two contacts responded to my query. Details of the responses will be discussed below in the Task No. 5 observation section.
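The response rate implied by these figures can be worked out directly. The sketch below is illustrative only; it uses the approximate counts given above (about 100 letters sent, two replies), and the function and constant names are hypothetical.

```python
def response_rate(responses, sent):
    """Return the share of sent queries that received a reply."""
    if sent <= 0:
        raise ValueError("sent must be a positive number")
    return responses / sent


# Approximate figures from the text: about 100 letters sent, 2 replies.
LETTERS_SENT = 100
REPLIES = 2

if __name__ == "__main__":
    rate = response_rate(REPLIES, LETTERS_SENT)
    print(f"Response rate: {rate:.1%}")
```

Two replies to roughly 100 letters is a response rate of about 2%.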
Task No. 3: Post Query to Appropriate LinkedIn Groups and on Quora

I originally set aside one hour to complete this task, but between the original posts and the edits made to the posts the following day, this task took about two hours, over the course of two days, to complete. I had some difficulty with LinkedIn’s spam filters; LinkedIn was flagging my messages as being “promotional in nature” and moving them to the “promotions” tab, which is located outside of the main discussion area. I found that by writing out my email address as “troth90208 -at- wildcats.wilmu.edu” instead of “troth90208@wildcats.wilmu.edu,” LinkedIn allowed the messages to be posted in the main discussion forum.

As mentioned in my description for my first task, I ended up editing the post to include the actual questions in the message body as opposed to requesting that respondents email me. I did this because I received three responses from individuals who seemed to think that the interview would be extensive and take up a lot of the respondents’ time. In response to my posting in the Temple University Alumni LinkedIn group, [REDACTED] sent me a personal message stating, “The people in this field are very busy. These things take time. Perhaps you could attend some conferences this summer and get to know some people who would be willing to be interviewed” (personal communication, June 23, 2015). In the R Project for Statistical Computing Group, [REDACTED] suggested that I “create an online survey using SurveyMonkey or something similar, then post the link to LinkedIn,” and [REDACTED] told me, “You can obtain the information you are looking for simply by analyzing job openings.”

Even after I edited my posts to include the questions, none of my group threads attracted any responses except for the thread in the Big Data and Analytics Group, which immediately became very active. I received responses from 12 members, and I engaged in interactive, back-and-forth discussions with them. Additionally, two individuals who are new to the data science field, as I am, indicated that they had followed the thread because they, too, wanted to hear the answers to these questions.

In addition to the LinkedIn groups, I posted my query as a question on Quora. I received only two responses; one was a junk response from an Internet troll, and the other was a serious response from an IT professional. Details of the responses will be discussed below in the Task No. 5 observation section.

Task No. 4: Answer Respondents and Promote Query on Social Media as Needed

This task took only 4.5 hours, even though I had set aside eight hours. Admittedly, I was not certain how long it was going to take, and in cases when I am unsure of a job’s duration, I tend to estimate high. This comes from my experience as a freelancer; I have found that clients would rather have me say that a job will take three days and I get it to them in two, as opposed to the other way around.

I posted the Quora question on Facebook, Twitter, and as a LinkedIn status update. While three people shared my Facebook post, and about a dozen people re-tweeted me on Twitter, I received only one valid response to my Quora question. I suspect that the poor response rate was due to my audience. The post in the Big Data and Analytics LinkedIn group reached a large community of data professionals who are eager to share knowledge, while my social media posts largely reached people who do not work in the field.

I made sure to thank every person who responded on LinkedIn for taking the time to do so. One respondent indicated that he felt he was making “too many posts” (Oldfield, 2015b), and I assured him that I was grateful that he was willing to share so much information with me. I also responded to the individuals who reacted negatively on the Temple University Alumni and R
Project groups. I explained to them that I needed to conduct personal interviews, and not just Internet research, for an action research project, that the interviews would take no more than 10 to 15 minutes, and that the project was time-sensitive; I could not wait weeks or months for responses. My explanations did not work (I did not receive any responses in those groups), but I felt it was important to maintain a professional image.

I also interacted with the participants on the Big Data and Analytics group thread, asking follow-up questions. In particular, when I received responses that contained very long lists of required programming languages and technologies, I asked if there existed any truly entry-level positions for candidates with an IT education but no work experience in the IT field.

Task No. 5: Review Responses & Take Notes

This task took about four hours to complete, which matched my original time estimate. I read each response thoroughly and took notes, looking for similarities. I received responses from 15 different individuals. Some of the participants on the LinkedIn thread posted to the thread two or three times. Most of the responses addressed data science jobs in general, not social media data mining specifically. Each individual’s response is summarized below.

T. Kielinski, the founder and CEO of IT Pros, a Philadelphia IT staffing firm, responded to me using LinkedIn’s messaging system. In response to my question about the minimum skills needed to get an entry-level job, he simply listed a number of “free and/or open source tools for data mining applications,” specifically, RStudio, Tinn-R, Weka, RapidMiner, KNIME, the Mahout machine learning library, Rattle, CLUTO, fastcluster, arules, ARMiner, TraMineR, Gephi, Pajek, CFinder, ProM, GeoDa, and CLAVIN.
He said that the required education “ranges from formal training to a master’s degree” and that I could obtain sample job titles and job descriptions by doing a search on Indeed.com and using the keywords “data mining.” According
to T. Kielinski, the industries hiring data scientists are “Internet, staffing and recruitment, ecommerce, and health care” (personal communication, June 23, 2015).

B. Benson, a data analytics professional, corresponded with me via email (personal communication, June 23, 2015). He began by writing about the confusion he has encountered regarding the difference between communications personnel and actual data analysts:

In my experience, there is little understanding of the difference between a digital analyst and a digital communications director. Much of the conflation occurs between the communications role of drafting text and the technology role of managing the messaging tools. Most professional digital directors within my sector, progressive political campaigns, wear both hats well. Fortunately, as the industry matures, there is better segmentation between analysis and communications, leading to more specialized roles for analysts. (personal communication, June 23, 2015)

He wrote that the required education “is generally a bachelor's with some sort of focus in analysis or messaging” (personal communication, June 23, 2015), and that when he is hiring an analyst to perform data mining tasks, he looks for “strong statistical skills.” He further added:

Many people with a biotech background have a lot of experience using R, SPSS, or Stata, though almost everyone ends up using R. Familiarity with relational databases is important as well, with some variation of SQL being standard. Data visualization is extremely helpful, and can be generated from most any application. Specialization in Tableau is attractive. GIS is an added bonus, because higher ups love maps with an illogical fervor. (personal communication, June 23, 2015)

In Mr. Benson’s sector, politics, the most common entry-level positions are for digital directors, digital strategists, or social media managers (personal communication, June 23, 2015).
D. Herrera is a software developer who responded to my Quora question. She said that the required education “is relative to what you're doing. I have a BS in animal science, but I have learned and taught seven different programming languages and hold several certifications. It's the languages and understanding of the languages that matter” (Herrera, 2015). She pointed out that to mine social media, knowledge of the API of the network(s) you wish to mine is most important; the specific programming language is less important because “most languages can utilize APIs, so it's a pick your poison situation. Different people will have different recommendations” (Herrera, 2015). Given how much the recommendations I received varied from one respondent to the next, I found the last statement to be entirely true, although the four primary languages in the data science world are R, Python, SAS, and SQL (Piatetsky, 2014, para. 3). D. Herrera stated that typical entry-level job titles would include DBA, solution developer, or solution analyst, but noted that the exact title would “depend more on the company you look for rather than the developer titles” (Herrera, 2015). She said that she suspected that most industries hiring social media data scientists would be in the B2C (business to consumer) category, but that B2B (business to business) companies might be interested in mining LinkedIn (Herrera, 2015).

B. Mathews, an IT professional, was the first participant in the thread I created in the LinkedIn Big Data and Analytics Group. He stated that an entry-level job applicant would need to know languages and tools such as R, Pig, Java, and Hadoop, and have a strong math background, including probability and statistics. Applicants also need a bachelor’s degree, “curiosity, some knowledge of business, [and] perhaps computer sciences” (Mathews, 2015).
According to this respondent, typical entry-level jobs would include business analyst or data analyst, and typical industries would be “oil and gas, financial institutions, genome research, manufacturing, automotive, [and] government” (Mathews, 2015).
A. Burris, a data scientist, stated that entry-level applicants should have a bachelor’s degree in business administration or finance, advanced knowledge of statistics, and knowledge of SQL, Minitab, Tableau, and “Microsoft/Apple/Google products for analysis, interpretation, [and] presentation of results” (Burris, 2015). He further stated that typical industries looking for data science applicants include “consumer goods and retail, energy and utilities, healthcare, information services, and travel and industry” and that a business analyst job would be a typical entry-level role (Burris, 2015).

R. Taneja, an IT professional, stated that the most popular languages and technologies are R, SAS, Python, Pig, Hive, MATLAB, Scala, Hadoop, and NoSQL, and that applicants also need communication skills, business acumen, and, similar to what B. Mathews stated, a curiosity about data and information (Taneja, 2015). She said that educational requirements were, at minimum, a bachelor’s degree in a STEM field, and that she has often seen employers require applicants to have a Ph.D. for data scientist roles (Taneja, 2015). She noted that data scientist is the most popular job title she has encountered, but that data engineer “is becoming more popular as organizations look for more wide skillsets” (Taneja, 2015). She stated that typical industries include healthcare, finance, genome research, automotive, manufacturing, and digital and mobile industries (Taneja, 2015).

C. Mullins, a DBA and big data consultant, entered the conversation by noting that although newer data science technologies, such as Hadoop and Pig, are important, many organizations are still making use of older technologies:

[S]everal recent surveys indicate that a lot of big data/analytics projects are using pre-existing technologies like SQL, relational databases, transaction data, and even spreadsheets. In fact, these are all at the top of the list.
Now this might be because big
data analytics projects and technologies (e.g., Hadoop, R, Pig, NoSQL, etc.) are newer and not as pervasive yet among data professionals. But I would not discount the tried and true (alongside the new). (Mullins, 2015a)

Later in the thread, he noted that there are many potential jobs in the big data market:

There are DBAs (who need to understand in-depth details of the operations of the DBMS software being used; there may be many different database systems in use); the data scientist (who needs to understand how to develop models to pull information from the databases available, as well as to provide guidance on what additional data may be needed to achieve the required goal); subject matter experts (who understand the business and its data at a detailed level to provide guidance to data scientists about the specifics of the business data); developers (who can write code in various languages expertly); ETL (to move data in and out of databases); and probably more that I am missing. Note, too, that there will (or should) be overlap between the skillsets of these individual workers. (Mullins, 2015b)

U. Shah, a big data analyst and IT architect, concurred with C. Mullins regarding the importance of SQL, and added that “R is great for data analysis; you can even use simple SQL inside R using packages” (Shah, 2015). He stated that Python is good for writing programs. He suggested that, to get started, I should “create a free account on Amazon AWS and play with what you have learned or experiment with different datasets. Cleaning data, transforming data and visualizing data can teach you many things” (Shah, 2015).

S. Shaw, an IT professional, said that he felt the other respondents quoted above described the market very well. He added that the “top talent” he works with “have skills that cross disciplines” (Shaw, 2015). He emphasized the need for a solid foundation in mathematics,
along with technologies such as HDFS (Hadoop Distributed File System), Java, and scripting languages (Shaw, 2015).

R. Del Rosario, an IT professional, emphasized the importance of defining what type of data you wish to mine from social media, be it sentiment analysis, product name mentions, or geographic location based on sentiment analysis or some search term, before diving in, so as to “get specific data and avoid data overload” (Del Rosario, 2015a). In response to the comments stressing the importance of SQL, R. Del Rosario stated that SQL “will not work with unstructured data … Most data in social media is unstructured” (Del Rosario, 2015a). Later in the thread, in response to a post where I asked about the feasibility of an applicant like me, with no experience, entering the field, he assured me:

[S]trictly speaking, there are only two major groups of skill when it comes to Big Data. The first major group are those who self-learned this technology or those who contributed to the growth of Big Data, and those who have started to adapt to the Big Data technologies. So the good news is, technically, there are lots of "entry-levels" and there are lots of levels or categories where you can get in the Big Data bandwagon. (Del Rosario, 2015b)

He suggested that I take online courses such as those offered by Big Data University, which he said is free, and that I would be ready for a job “with about six months of intensive learning” (Del Rosario, 2015b).

C. Gilbert, an IT recruiter, discussed how social media mining is only one subset of the data science field:

Social media mining should be just one part of the analysis that a data scientist/engineer/analyst does. The social media needs to be linked to something and combined with other
data and information. It is generally accepted that 80% of the time spent on in depth data analysis (data science/data mining) is actually spent on data preparation. … For social media specifically I would expect people to have skills with natural language processing using deep learning techniques. Python has libraries for this. And so, it is not just Python skills that are needed but skills in specific types of techniques or libraries with those languages. Data matching with and without (using tools or packages) is also useful for linking your sentiment to specific customers. … The capacity to collect and process more data is continuing to lead to more analysis, and social media, in many cases, is just one aspect of analyzing an increasingly big picture. (Gilbert, 2015)

He also noted that good storytelling and presentation skills are important, as the information that is mined will have to be presented and explained to other people, and that it is helpful for a job applicant to have specific knowledge of whatever industry (telecom, finance, etc.) he or she wishes to work in (Gilbert, 2015).

K. Lawlor, a data analyst student, said that he concurred with most of what C. Gilbert said, particularly the importance of industry-specific knowledge: “I have seen many articles, or projects, which make findings/recommendations which are inaccurate, as they have not understood the domain in which they are working” (Lawlor, 2015). He also stated that, in his experience, many employers do not fully understand what they are looking for when they are seeking to hire a data professional: “I have seen recruiters [post] job adverts with 'data scientist' as a heading and a description of a database administrator, or something similar” (Lawlor, 2015).

I had several exchanges with T. Oldfield over the course of two days. Mr. 
Oldfield is a software architect who recruits developers to write big data analytics software; the talent he recruits has at least two to three years of development experience and not only “knowledge of
‘data science’ but also the knowledge to realize that dream within computer code” (Oldfield, 2015a). He admitted that his requirements are very stringent and that “[his] expectations are difficult for most people” (Oldfield, 2015a). Specifically, the applicants he recruits must be expert coders with a deep understanding not only of coding but also of operating systems concepts such as threading and processing, data storage and security, and data types (Oldfield, 2015a). He elaborated:

[T]he traditional taught methods of coding are completely useless in this field. This comes down to someone that can think outside the box and fully understands how memory management works, preferably with some knowledge of L1/2/3 caching, data bandwidths on the FSB, threading, locking etc. - microcode implementation, and hardware design implications. (Oldfield, 2015a)

Because he does not hire entry-level applicants, he at first expressed concern that his answers would not be relevant to my project (Oldfield, 2015a). However, he also admitted that most applicants are unable to meet his requirements exactly, and that the most important thing to him when looking to hire someone is the applicant’s mindset; specifically, whether they can think critically and ask questions:

[Y]ou need a certain mind set: open mind, ask, ability to turn things on their head and ask the question differently. Hence my comments "what you were taught about computer code is wrong for this field." The fact that you asking - and keep asking what do we mean is a good start. Second - start at the bottom - what part interests you/you have trained in, start here and branch out as you gain more experience. I have also note that many of the skill I want to employ are about as likely as "pigs flying", so no worries - if the subject interests you, you have the right mind set - then you will be good. (Oldfield, 2015b)
He explained that this is because the data science field is still wide open, and many discoveries are yet to be made. People working in data science cannot depend on pre-defined procedures; they must create their own:

Even though it has been going for years, it is still "research" - there are no pre-defined procedures to get the answer you want - you have to think of the solution to THIS problem for nearly every new problem. Not always of course - tools are improving. (Oldfield, 2015b)

Further to my exchange with T. Oldfield, K. Jones stated that he has been “in the reporting, big data space & Hadoop space for many years” (Jones, 2015a). His comments elaborated on the confusion that exists in what is a relatively new industry: “There are many competing ideas and approaches in data science. Most Fortune 500 firms can't figure out a decisive direction fearing they'll fail with their investments” (Jones, 2015b). He mentioned that he feels too many Silicon Valley companies are entering the data science field without knowing what they are doing, and that the “vast majority of them will end up in the graveyard if they are not fortunate enough to be bought out by a larger, well financed firm” (Jones, 2015b).

A. Gonzalez, an IT professional, was the final participant in the thread. I had expressed concern about learning to use all of the many tools the previous respondents mentioned, on my own, from home, and he responded with the following:

In my opinion, I think there is a difference between what skills companies are requiring for hiring people working in Big Data and what they will probably need. Not only the companies but people have the same doubts; you can see this discussion also along the previous posts.
For some people or companies the required skills are based on the tools or the languages, they talk about Hadoop, Spark, SQL, OpenCync, Python. In my opinion, those are just nice programming skills, but those are not the skills needed in an area where the future is so uncertain, where the technology can and will swift so fast. (Gonzalez, 2015)

This was similar to T. Oldfield’s comment about how there is often a gap between the skills he wants his applicants to have and the skills his applicants actually have.

Second Iteration – Reflection

Task No. 1: Compose Query Letter/Message Board Query Post

This task was a basic copywriting assignment; I have written many letters like this one over the years, and I did not have much trouble composing it. Because the letter was meant to prompt an action from the receiver (in this case, agreeing to an interview), I approached it the way I would a sales letter. This is why I began the letter by stating, “I was wondering if you could help me out.” I was taught to open sales pitches in this manner in a sales class I took as part of my MBA. The logic is that while everyone likes to buy things, no one likes to be sold something, and asking for a prospect’s help is a much better way of opening a sales letter than launching directly into your pitch.

I used my Wilmington University email address in the letter, as opposed to one of my non-university addresses, so that the recipients would see that I really was a graduate student, and not someone trying to build a list of targets to spam.

Task No. 2: Send Query Letter to Appropriate LinkedIn Contacts

As mentioned previously, I sent out approximately 100 query letters and received only two responses. Part of the problem is that I do not personally know most of the people I am connected with on LinkedIn, and the people I do know do not work in information technology;
most of them are attorneys or work in marketing and sales. I was, in effect, engaging in cold calling, and cold calling is a numbers game. I knew that I would have to send out as many letters as possible to get a very small number of responses.

However, in hindsight, I believe my biggest mistake was to not include the actual questions in my original query letter. I discovered this mistake when I began posting my query to message boards, and I received responses from individuals who seemed to think that this was going to be an extensive interview that would take a lot of time to complete. Had the individuals I sent the letter to seen that they needed to answer only four questions, my response rate might have been higher.

Task No. 3: Post Query to Appropriate LinkedIn Groups and on Quora

I posted my query to 13 different groups, but the only valid responses I received were from the Big Data and Analytics group. As I mentioned previously, I did receive responses in two other groups, but they were negative, with one individual chiding me that it would take a lot of time to find respondents, another asking why I could not simply Google to obtain the information I wanted, and a third telling me that I should build an online market research questionnaire. It was good that I posted to so many groups, even though most of them returned no results; like the query letter, this task was about numbers. I needed to get my questions in front of as many potential respondents as possible.

I was surprised that the Quora question received only one valid answer. There is a very large data science community on Quora; the Data Science topic has over 56,500 followers (Quora, n.d.a), and the Data Mining topic has over 65,000 followers (Quora, n.d.b). As a question-and-answer site, Quora attracts people who love to answer questions and share their knowledge. Each time I tweeted my Quora question, I received
numerous retweets, so I am certain the question got sufficient exposure. I am stumped as to why I received only one valid answer, and it was from a Facebook friend.

Task No. 4: Answer Respondents and Promote Query on Social Media as Needed

I do not feel I could have done a better job promoting my query on social media; as I mentioned above, my Quora question was retweeted by numerous users, some of whom also followed me on Twitter. I used appropriate hashtags, such as #DataMining, so that my tweet could easily be found.

I had a great back-and-forth on the LinkedIn group, though looking back at my posts, my dismay at the long list of requirements some respondents gave me, and how long it would take to train myself on all of these things, was evident. Although I do not think I turned anyone off (the thread was very active), I feel I could have been less emotional.

Task No. 5: Review Responses & Take Notes

I did not expect to get so many responses; it was a lot of information to take in. It was overwhelming and, at first, frustrating. I was especially dismayed by one respondent’s comment about me being six months out from being able to get an entry-level job. I do not have six months to find a job, and I stated this in a response. As I discussed in the observation section, this prompted some comments about how what employers would like to have and what they can actually expect to get are often two different things, especially since this field is so new.

My dismay arose because I am used to competing in a very saturated job market with little demand and a nearly unlimited supply. I have spent a number of years working as a copywriter. The competition in this field is intense; there are dozens, if not hundreds, of applicants for every open position, and applicants who do not meet 100% of the qualifications stated in an ad are wasting their time applying. However, in the data science world, there is a shortage of applicants
(Violino, 2014, para. 1). I am not facing as much competition, and potential employers have to be more flexible with their requirements. I need to adjust my thinking and not become discouraged so easily.

I also need to network better. The biggest problem I had when I began this iteration is that I have no real professional network. Most of the professionals I know are either attorneys or work in marketing and sales, not data science, not even information technology. Spending more time in relevant LinkedIn groups, especially Big Data and Analytics, would help me get to know other data professionals and build my network. Who I know might be more important than what I know.

Third Iteration – Plan

During this iteration, I will determine the specific skill sets needed to perform social media data mining, then go through a tutorial to help determine which skills I already have and which skills I need to obtain. The scheduled tasks in this iteration are as follows:

Task No. 1: Perform Research on Skill Sets Needed for Social Media Data Mining

In this task, I will perform Internet research on the specific skill sets, including knowledge of Python, that are needed for social media data mining. I have set aside six hours to perform this task. To complete this task, I will need a computer with an Internet connection and appropriate time allocation. No other people will be involved at this stage.

My goals are to determine which specific skill sets I need for social media data mining and what tutorials, if any, exist on the Internet. I already own a book on mining social media, and it includes tutorials, but I want to make sure this book is the best place for me to start. Do I have the requisite skills for this book? Are there better tutorials online?
Task No. 2: Review Materials & Select a Tutorial to Work Through

In this task, I will review all of the materials I collected in the first task and select a tutorial to work through. The expected duration is two hours. To complete this task, I will need a computer with an Internet connection and appropriate time allocation. No other people will be involved at this stage. My goal is to select a specific tutorial so that I can prepare my computer and get to work.

Task No. 3: Install Python & Any Other Required Software on my Computer

In this task, I will prepare my computer for the tutorial I selected during the previous task by installing Python and any other required software. The expected duration is four hours. To complete this task, I will need a computer with an Internet connection and appropriate time allocation. No other people will be involved at this stage. My goal is to get my machine ready for my selected tutorial.

Task No. 4: Work Through the Selected Tutorial

In this task, I will work through my selected tutorial. Ideally, I will finish the tutorial before the end of this iteration; if I cannot, I will go as far as I can. To complete this task, I will need a computer with an Internet connection, appropriate time allocation, and my tutorial materials. No other people will be involved at this stage. I plan to spend at least eight hours on this task.

I have two goals for this task: to learn as much as I can and to determine which gaps exist in my knowledge. I feel that working through a tutorial is the best way to accomplish these goals because computer science is much like mathematics in that you do not learn by reading, but by solving problems.
Third Iteration – Action

Task No. 1: Perform Research on Skill Sets Needed for Social Media Data Mining

I set aside six hours for this task, and my time estimate was accurate. I began by performing a Google search on the phrases “social media data mining” and “social media data mining getting started.” I followed approximately 20 links. Then, I performed a Google search on the phrases “how to learn Python” and “Python data analysis.” I followed approximately 15 links. Additionally, I reviewed five links that I had previously stored as bookmarks when I came across them during the two prior iterations, and I reviewed five links I found in email newsletters that I received from DataScienceCentral.com and AnalyticsVidhya.com.

Task No. 2: Review Materials & Select a Tutorial to Work Through

I estimated that I would need two hours to complete this task, but it took me three hours. I reviewed all of the links I had bookmarked, re-reviewed the interviews I conducted in the last iteration, and brainstormed a game plan for the remainder of the iteration. I decided to use the book Mining the Social Web by Matthew A. Russell, which includes not only a text, but a GitHub repository of source code and a comprehensive set of tutorials to be completed using IPython Notebook.

Task No. 3: Install Python & Any Other Required Software on my Computer

I estimated I would need only four hours to complete this task, but it ended up taking me 8.5 hours. First, I installed Python 2.7 and three other software packages that I needed for it to work on my machine (MacPorts, Xcode, and Tkinter). Then, I attempted to install the “virtual machine” that accompanied my social media mining book; however, I was unable to get the install to complete. After attempting some fixes and performing some additional research, I decided to install IPython Notebook on its own so that I could work the tutorial, and I ended up
installing Anaconda, a Python distribution that includes, among other packages, IPython Notebook.

Task No. 4: Work Through the Selected Tutorial

I set aside at least eight hours for this task, and I ended up spending 11 hours on it. I finished the tutorial in chapter 1 of my book, then skipped ahead to chapter 9 and partially worked through that tutorial.

Third Iteration – Observation

Task No. 1: Perform Research on Skill Sets Needed for Social Media Data Mining

I spent six hours on this task, which was the time I had allotted to it. I researched both social media data mining skills and Python for data analysis. In addition to performing Google searches, I also reviewed some articles I received in email newsletters from the Data Science Central and Analytics Vidhya websites that appeared to be pertinent to this task, either because they discussed getting started in the data field or learning Python for data analysis. For each Google search, I used multiple keywords to filter the results so as to see the most relevant articles.

I began by performing a Google search on the phrases “social media data mining” and “social media data mining getting started.” Similar to the interviews I conducted during the second iteration, most of the material I came across talked about data mining in general, not social media mining in particular. For example, Piatetsky (2013) outlined seven steps to learn data mining and data science: learn Python, R, and SQL; learn to use data mining and visualization tools such as KNIME, RapidMiner, R Graphics, and Tableau; read textbooks on data mining and data science; educate yourself using webinars, online courses, or perhaps a data science degree; get to work with some
sample data sets; participate in competitions on Kaggle.com; and network with other data scientists.

Works (2014) noted that 88% of data professionals have at least a master’s degree, with the most common majors being mathematics and statistics, computer science, and engineering (“Technical Skills: Analytics”). In addition to proficiency with SAS and R (“Technical Skills: Analytics”), Works recommended that applicants in this field have strong Python coding skills, along with proficiency in Hadoop and SQL, and the ability to work with unstructured data (“Technical Skills: Computer Science”).

Marr (2015b) wrote of the importance of the Python and R programming languages and pointed out that, when it comes to newer technologies such as Hadoop, Hive, and Pig, many data professionals are self-taught:

A working knowledge of Python or R–two of the programming languages most commonly used for analyzing large digital datasets, is also usually expected. The biggest challenge can be finding candidates with experience in the most cutting edge analytics applications, such as those involving machine learning. Many people will not have the opportunity to learn this at school, and experts are often self-taught. (p. 2, para. 5)

For both search terms, the book Mining the social web: Data mining Facebook, Twitter, LinkedIn, Google+, GitHub, and more by Russell (2014) ranked high in the Google search results; the Amazon listing for the book was the first result on the first page for the search term “social media data mining” and the fourth result on the first page for the search term “social media data mining getting started.” Other references to the book, such as the corresponding GitHub repository and the author’s website (miningthesocialweb.com) also ranked high, on the first or second page for each search phrase, along with an interview of the author by De
Lacvivier (2013), where he recommended Python as a starter language for social media mining because, in his opinion, it works well with the JSON data format, which he noted is used by many social media networks:

Russell advocates using Python for first social data mining projects because its syntax is simple and its data structure is compatible with textual data. "Most social media properties are going to return data to you in JSON format," Russell explained. JSON (JavaScript Object Notation) is a flexible and intuitive text-based data format often used in Web environments in order to communicate both simple and complex data structures over a network. "Python's core data structures are so close to JSON that there's no real penalty for working with that data. It's very easy to make that request." (para. 3)

Next, I performed Google searches on the phrases “how to learn Python” and “Python data analysis,” reviewed several links that I had previously stored as bookmarks when I came across them during the two prior iterations, and reviewed five links contained in email newsletters from DataScienceCentral.com and AnalyticsVidhya.com. Between these three sources, I followed approximately 25 links. Most of them suggested various tutorials, books, and online classes; there was a lot of duplicative information, and at least half were little more than thinly veiled advertisements for paid products (books or courses). Analytics Vidhya (n.d.) put together a “comprehensive learning path” for learning how to use Python for data science, which includes instructions on how to set up your machine (the site recommends using the Anaconda distribution) and links to numerous Python tutorials and online courses that cover everything from the basics of the language to data visualization and machine learning.
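
Russell's remark that Python's core data structures are close to JSON can be illustrated with a minimal sketch. The payload below is a made-up example: the field names are assumptions chosen for illustration and are not taken from any particular social network's API, and `extract_hashtags` is a hypothetical helper. The sketch uses only Python's standard-library `json` module to show that a parsed JSON document arrives as ordinary dictionaries, lists, strings, numbers, and booleans.

```python
import json

# Hypothetical payload; the field names are illustrative assumptions,
# not the schema of any real social media API.
raw = """
{
  "user": {"screen_name": "example_user", "followers_count": 120},
  "text": "Learning #DataMining with Python",
  "entities": {"hashtags": [{"text": "DataMining"}]},
  "retweeted": false
}
"""

# json.loads maps JSON objects to dicts, arrays to lists,
# and true/false/null to True/False/None.
post = json.loads(raw)


def extract_hashtags(post):
    """Return the hashtag strings from a parsed post (hypothetical helper)."""
    # .get with a default keeps the lookup safe when a key is missing.
    return [tag["text"] for tag in post.get("entities", {}).get("hashtags", [])]


hashtags = extract_hashtags(post)

# Serializing and parsing again yields an equal structure, which is the
# sense in which working with JSON carries little overhead in Python.
round_trip = json.loads(json.dumps(post))

print(type(post).__name__, hashtags, post["user"]["followers_count"])
```

Because the parsed result is a plain dictionary, no schema classes or special parsing layers are needed before the data can be filtered or counted, which is one plausible reading of why the language is recommended for first projects.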