Building a Sentiment Analytics Solution Powered by Machine Learning: Impetus White Paper

For the Impetus white paper archive, visit http://www.impetus.com/whitepaper

This white paper focuses on why Sentiment Analysis is vital in today’s world, the existing solutions landscape, and why Machine Learning is recommended for building such a solution and gathering better business insights.

Published in: Technology, Education

  1. Building a Sentiment Analytics Solution Powered by Machine Learning

     Abstract

     This white paper discusses the need for a Sentiment Analytics solution based on Machine Learning. It focuses on why Sentiment Analysis is vital in today’s world, the existing solutions landscape, and why Machine Learning is recommended for building such a solution and gathering better business insights. In this white paper you will also learn how to build a Machine Learning solution and the benefits you can accrue from it.

     Impetus Technologies, Inc.
     www.impetus.com
     WHITE PAPER
  2. Table of Contents

     Introduction
     The four stages of Sentiment Analytics
     Existing landscape of Sentiment Analytics solutions
     Challenges facing Sentiment Analytics
       Demystifying accuracy
       Isolating content types
       Sentiment override
     Machine Learning to the rescue
       Sentiment versus subjectivity
       Analyzing polarity reactions
       Understanding sentiment intensity
     Building Sentiment Analytics Solution powered by Machine Learning
       How Machine Learning works
       Building a Sentiment Analytics tool
       Creating a knowledge base
       Leveraging Machine Learning for analyzing sentiments
       The Impetus solution
       The benefits offered by the Impetus solution
     Conclusion
  3. Introduction

     Sentiment Analytics, or opinion mining, refers to a broad area of Natural Language Processing (NLP), computational linguistics, and text mining. Generally speaking, Sentiment Analytics determines the attitude of a speaker or writer with respect to a particular subject. The attitude may be based on the person’s judgment or evaluation, or on the affective state, that is, the emotional state of the author while writing. It can also be the intended emotional communication, that is, the emotional impact the author wants to have on the reader.

     Sentiment Analytics can therefore be described as a discipline that helps organizations measure, evaluate, and explain the performance of their social media initiatives, based on the opinions or sentiments people express on social portals such as Twitter and Facebook.

     The Four Stages of Sentiment Analytics

     1. The first event, which acts as the trigger, is the launch of a campaign on various communication channels.
     2. In the second stage, the target audience reacts online. Its reactions provide a collection of opinions and sentiments about the campaign and the product.
     3. The next step is analyzing the sentiments. The findings are presented to top management, strategists, and other stakeholders, such as the sales and marketing team. This visual and interactive information facilitates better insights. A Sentiment Analytics solution is used here to gauge the performance of a campaign, product, or brand and to check how well it is received in the marketplace.
     4. Based on these insights and analysis comes the innovation stage, where businesses respond to what their target audience is saying about their offerings and manage their reputation accordingly. It helps them identify what their existing customers are talking about, and how, and design an appropriate Customer Relationship Management system. It also enables them to track changes in perception over time.

     Sentiment Analytics is the key to quantifying these reactions and deriving actionable insights.
  4. Our experience has been that the cost of building reliable storage using commodity hardware may be around USD 1 per gigabyte. This is only for storage and does not include the cost associated with managing, monitoring, and hosting the Big Data.

     Existing Landscape of Sentiment Analytics Solutions

     Sentiment Analytics is conventionally handled by two popular techniques: Natural Language Processing and Artificial Intelligence.

     Natural Language Processing

     In the first option, the algorithm is trained to understand natural language and draw inferences from it. This option is usually not very successful, as it is hampered by factors such as internationalization and the informal language used in tweets and Facebook status updates, which differs considerably from standard natural language.

     Artificial Intelligence

     The second option uses information produced by NLP, together with mathematical calculations, to determine negative, positive, or neutral sentiments. The Machine Learning solution is a part of Artificial Intelligence.

     At Impetus, we believe that the biggest advantage of commodity hardware is that you can build it yourself, and that there are many avenues open for innovation. Commodity hardware is readily available and people have easy access to it. Therefore, while using commodity hardware, you have the option of customizing and optimizing it, over and beyond the existing offering.
  5. Sentiment Analytics opens up a host of new opportunities and perspectives. However, before leveraging them, organizations need to overcome certain challenges.

     Let us now check out the pluses and minuses of using Open Source and cloud computing. Obviously, using free and Open Source software to store, manage, and analyze Big Data is a good idea. We all know that Hadoop can be leveraged to tackle large volumes of data, while saving significantly on costs.

     Challenges Facing Sentiment Analytics

     Demystifying accuracy

     The first challenge is the inability of machines to gauge and measure sentiments accurately. An efficient algorithm can give a lower False Positive rate. This means that a tweet that should be classified as positive is in fact classified as positive, and not as neutral or negative.

     In a Machine Learning-based system, accuracy depends on the False Positive rate, which should ideally be lower than 4-5 percent, depending on the training data set and the information that is given to the algorithm to learn from.

     Isolating content types

     The next problem area is inconsistency across social networks and the neutral nature of social media. Take the instance of tweets. Tweets are usually 60 percent neutral: most do not carry opinions or any explicit sentiments.

     It is also difficult to determine the Target Overlook Actual Verbatim, which is caused by people re-tweeting or updating the same status again and again, thereby diluting the sentiment’s overall strength.

     Sentiment override

     Another problem is the inability to let clients apply contextual knowledge in order to assign accurate sentiment scores. Most current systems, for example, cannot detect sarcasm.

     Machine Learning to the Rescue

     The challenges discussed can be addressed effectively by Machine Learning. It is a fact that computers need intelligence to cater to the needs of people.
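The False Positive rate mentioned under "Demystifying accuracy" can be made concrete with a small calculation. The sketch below assumes the standard definition (the share of items that are not truly in a class but were predicted as that class); the paper does not spell out its own formula, and the function name and sample labels are purely illustrative.

```python
def false_positive_rate(actual, predicted, target="positive"):
    """Share of non-target items that were wrongly predicted as `target`.

    Assumption: the standard definition FP / (FP + TN) for one class.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    # Items that are not truly `target` but were labeled `target`.
    false_pos = sum(
        1 for a, p in zip(actual, predicted) if a != target and p == target
    )
    # All items that are not truly `target`.
    negatives = sum(1 for a in actual if a != target)
    return false_pos / negatives if negatives else 0.0


# Hypothetical hand-labeled tweets versus classifier output.
actual = ["positive", "neutral", "negative", "neutral", "positive", "negative"]
predicted = ["positive", "positive", "negative", "neutral", "neutral", "negative"]

rate = false_positive_rate(actual, predicted)
print(f"False Positive rate for 'positive': {rate:.0%}")
```

Comparing this figure against the 4-5 percent target is one way to decide whether the training data needs another round of correction.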
  6. Learning and knowledge are central to intelligence, and Machine Learning takes care of this requirement.

     Machine Learning is a structure capable of acquiring and integrating knowledge automatically. The ability of machines to learn from experience, training, analytical observation, and other means creates an intelligent system that can continuously self-improve and thereby exhibit efficiency and effectiveness.

     Having discussed the challenges associated with analyzing sentiments and established Machine Learning as the favored approach, it is also important to discuss some of the critical questions that an ideal Sentiment Analytics tool needs to address. These questions deal with subjectivity and sentiment, the polarity of reactions, and sentiment intensity.

     Sentiment versus subjectivity

     As mentioned earlier, tweets can be classified as positive, negative, or neutral. The question is, how does a machine know about subjectivity and sentiment? Subjectivity is the linguistic expression of opinions, sentiments, emotions, evaluations, beliefs, and speculations.

     The basic components of Sentiment Analytics are the sentiment holder, who is the person or organization that holds a specific opinion on a particular focus area; the object, on which a sentiment is expressed; and the sentiment itself, which is a view, attitude, or appraisal of an object from a sentiment holder.

     A statistical and probabilistic approach can be used to define what is subjective and how to use it in a Sentiment Analytics solution.

     Analyzing polarity reactions

     The second big question is, how does a machine analyze the polarity of reactions in terms of negative and positive?

     Machine Learning plays a big role in deciding the polarity. Whether a sentence is positive, negative, or neutral is judged by an algorithm, which refers to its initial knowledge bank to decide the polarity of a new sentence or word.

     A knowledge bank is a database of pre-trained words, with information on the patterns of words and sentences defined as negative, neutral, and positive.
  7. Understanding sentiment intensity

     The third critical question relates to how a machine knows about sentiment intensity. Whether a word or a sentence is strongly positive or strongly negative is decided by a couple of metrics.

     A technique of benchmarking neutral is used where, say, 40-50 percent of positive is neutral, any intensity below that is negative, and any intensity above it is positive. Also, by referring to the knowledge bank, which is continuously trained by a Machine Learning algorithm, the prediction of intensity becomes more accurate. The occurrence of a word or a sequence of words in a particular polarity decides the intensity of the overall sentiment.

     The accuracy of the system depends on the range and the complexity of the language. Therefore, the wider the net is thrown, and the more difficult the language gets, the less accurate the system is likely to be.

     Furthermore, it is easier to classify sentiments as simply positive or negative than to distribute them further on the basis of exactness: excellent, incredible, good, and so on. Enhanced granularity requires enhanced accuracy, and this in turn demands a deeper understanding of human language.

     Building Sentiment Analytics Solution Powered by Machine Learning

     How Machine Learning works

     Having talked about how Machine Learning can help address the key problem areas, it is now important to focus on how the system actually works.
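The neutral-band benchmarking described under "Understanding sentiment intensity" can be sketched as a simple threshold function. This is a minimal illustration, assuming the intensity is expressed as a positive share between 0 and 1 and that both band edges count as neutral; the function name and the default bounds of 0.40 and 0.50 are taken from the "40-50 percent" example rather than from any published implementation.

```python
def label_intensity(positive_share, neutral_low=0.40, neutral_high=0.50):
    """Map a positive share (0.0 to 1.0) to a polarity label.

    Assumption: scores inside [neutral_low, neutral_high] are neutral,
    scores below the band are negative, scores above it are positive.
    """
    if not 0.0 <= positive_share <= 1.0:
        raise ValueError("positive_share must be between 0.0 and 1.0")
    if positive_share < neutral_low:
        return "negative"
    if positive_share > neutral_high:
        return "positive"
    return "neutral"


for share in (0.15, 0.45, 0.80):
    print(share, label_intensity(share))
```

Keeping the band edges as parameters makes it straightforward to re-benchmark "neutral" for a different data source.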
  8. To understand this at a high level, one can pull in the input text in the form of Twitter tweets using Search APIs. Alternatively, it is possible to use Facebook graph search results or any other REST API or XML output.

     To process this information, it is feasible to use a Machine Learning algorithm based on Bayes probability, together with text classification and pattern generation that uses the n-gram technique.

     The processed information generates results based on the initial knowledge bank, which is a trained set of n-gram pattern texts classified as positive, negative, and neutral. The new information is processed and labeled as positive, negative, or neutral.

     To catch False Positives, there is a manual dashboard for reviewing and correcting the polarity. Once a False Positive is found and corrected, the Machine Learning classifier re-runs, learns from the new training data set, and corrects its calculations for intensity and polarity prediction.

     Building a Sentiment Analytics tool

     Building a Sentiment Analytics tool that leverages Machine Learning facilitates low time complexity without sacrificing performance. The first step in building this tool is to come up with simple and effective methods.
  9. You can employ the n-gram approach for text classification, which is used frequently to model phenomena in natural languages. It is possible to develop two simple variations of this approach, which yield high performance ratios for filtering sentiments.

     The second step is to exploit the way humans perceive sentiment. Whenever a new text is received, the system reads the initial part of the text and then decides whether the incoming text is heading towards negative or positive.

     If the sentiment is negative, it is not actually necessary to read the entire text to conclude that it is negative, as a quick glance can suffice. This human behavior is simulated by means of a heuristic, referred to as the first ‘n-words’ heuristic. Based on this, considering the first ‘n’ words of an incoming text and discarding the rest can yield the correct class.

     The process of tokenizing text into words is different for every language, and there are numerous forms and tenses used in any language. For example: walk, walking, walked, caminar, caminó, and caminando.

     All these words represent the same meaning, but since they have different written forms, they differ in terms of polar intensity. It is possible to address this issue by implementing an n-gram generated pattern, which is a sub-sequence of ‘n’ items from a given sequence, instead of words. The n-gram approach primarily considers the sequence of letters or characters that make up a word, rather than the language they belong to. Therefore, it is likely to succeed where the NLP-based solution fails.

  10. Creating a knowledge base

     A knowledge base includes a database table holding n-gram patterns, identified as positive, negative, or neutral, and generated from a training dataset.

     The n-gram model can be used to store more context with a well-understood space–time tradeoff, enabling small experiments to scale up very efficiently.

     The training data in a big data stack is therefore feedback-loop enabled, which means that newly classified n-gram patterns are labeled as positive, negative, or neutral. This data starts influencing the polarity intensity prediction model for the existing patterns, making the knowledge bank more accurate.

     For instance, a pattern generated for the word ‘rubbish’ will be treated as negative, and the algorithm will identify words like rubbished and rubbishing as negative as well.

     Leveraging Machine Learning for analyzing sentiments

     A tweet or a Facebook status text is first processed by an n-gram filter for pattern generation. If the pattern already exists, the pattern’s polarity intensity is increased or decreased based on the text classification. Otherwise, a new pattern is generated and labeled with its polarity, based on the existing training data.

     The two core advantages of the n-gram model are its relative simplicity and the ability to scale up by simply increasing the ‘n.’
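The first ‘n-words’ heuristic and the character n-gram knowledge base described above can be sketched as follows. This is a minimal in-memory illustration, not the Impetus implementation: the names `first_n_words`, `char_ngrams`, and `KnowledgeBase`, the choice of n = 4, and the use of a dictionary in place of a database table are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def first_n_words(text, n_words=10):
    """First 'n-words' heuristic: keep only the opening words of a text."""
    return " ".join(text.split()[:n_words])


def char_ngrams(text, n=4):
    """Split each word into overlapping character n-grams."""
    cleaned = "".join(ch.lower() for ch in text if ch.isalnum() or ch.isspace())
    grams = []
    for word in cleaned.split():
        if len(word) < n:
            grams.append(word)  # short words are kept whole
        else:
            grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


class KnowledgeBase:
    """Hypothetical stand-in for the table of labeled n-gram patterns."""

    def __init__(self, n=4):
        self.n = n
        self.counts = defaultdict(Counter)  # pattern -> polarity -> count

    def train(self, text, polarity):
        # Seeing a pattern again raises its count for that polarity,
        # which plays the role of the pattern's polarity intensity.
        for gram in char_ngrams(text, self.n):
            self.counts[gram][polarity] += 1

    def polarity_of(self, pattern):
        seen = self.counts.get(pattern)
        return seen.most_common(1)[0][0] if seen else None


kb = KnowledgeBase()
kb.train("rubbish", "negative")

# Patterns of an unseen inflection that the knowledge base already knows.
shared = [g for g in char_ngrams("rubbishing") if kb.polarity_of(g) == "negative"]
print(shared)
```

Because ‘rubbishing’ shares the patterns generated from ‘rubbish’, it picks up negative evidence without any language-specific stemming, which is the behavior the ‘rubbish’ example describes.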
  11. This pattern is processed through a Bayesian filter, which is based on the principle that most events are dependent and that the probability of an event occurring in the future can be inferred from the previous occurrences of that event.

     A probability value is then assigned to each word or token. This value is based on calculations that take into account how often that word occurs in one category or another. The most common application of the filter is to identify words that appear in the negative sentiment category versus the positive sentiment category.

     This solution breaks down the content to improve the filter by supplementing it not only with a database of words to categorize, but also with sets of n-grams derived from the text. The algorithm additionally helps with the extraction and offers a few more layers of depth for Bayesian filtering.

     Based on the complete combination of the pattern, the n-gram text, and the Bayes filter classification, the tweet is labeled as positive or negative. As this information is again treated as a new training data set, the algorithm re-uses it to make the knowledge bank more intelligent.
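The combination of n-gram patterns and a Bayesian filter can be illustrated with a small naive Bayes classifier over character n-grams. This is a sketch under stated assumptions, not the solution's actual code: the class name `NgramBayes`, add-one smoothing, n = 4, and the four training sentences are all illustrative choices.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=4):
    """Overlapping character n-grams per word; short words are kept whole."""
    grams = []
    for raw in text.lower().split():
        word = "".join(ch for ch in raw if ch.isalnum())
        if not word:
            continue
        if len(word) <= n:
            grams.append(word)
        else:
            grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


class NgramBayes:
    """Naive Bayes over character n-grams with add-one smoothing."""

    def __init__(self, n=4):
        self.n = n
        self.gram_counts = defaultdict(Counter)  # label -> n-gram -> count
        self.doc_counts = Counter()              # label -> number of texts
        self.vocab = set()

    def train(self, text, label):
        self.doc_counts[label] += 1
        for gram in char_ngrams(text, self.n):
            self.gram_counts[label][gram] += 1
            self.vocab.add(gram)

    def classify(self, text):
        if not self.doc_counts:
            raise ValueError("train the classifier before classifying")
        grams = char_ngrams(text, self.n)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, docs in self.doc_counts.items():
            total = sum(self.gram_counts[label].values())
            denom = total + len(self.vocab) + 1  # +1 bucket for unseen n-grams
            score = math.log(docs / total_docs)  # log prior
            for gram in grams:
                score += math.log((self.gram_counts[label][gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


nb = NgramBayes()
nb.train("I love this phone, it is excellent", "positive")
nb.train("great service and lovely staff", "positive")
nb.train("this is rubbish, I hate it", "negative")
nb.train("terrible battery, awful and rubbish", "negative")

print(nb.classify("they rubbished the new update"))
print(nb.classify("what a lovely phone"))
```

Feeding a manually corrected label back in is just another call to `train`, which mirrors the feedback loop in which newly labeled text becomes training data.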
  12. If the feedback loop of the training data sets is cut and the knowledge bank is kept intact for a specific use case, it is possible to run the text through the n-gram filter and the Bayes classifier concurrently to get the results.

     Also, if a False Positive appears during new pattern generation, it can be manually corrected in the FP-Dashboard.

     The Impetus solution

     Impetus has successfully adopted this approach and built a Sentiment Analytics solution that is powered by Machine Learning and leverages Big Data technologies.

     Our solution intuitively retrieves the input text for analysis, using artificial intuition, which is a sophisticated function of an artifice that can interpret data in depth and locate hidden factors.

     The solution is smart enough to change its source of information, switching between Twitter, Facebook, an XML file, a text file, and so on, and filtering out the noise from the content that holds the sentiment.

     It also offers an option to use custom REST APIs, so that cross-functional teams can build on top of the solution. It is an intuitive solution, capable of processing near real-time data using the Big Data stack. The solution works on LAMP, HBase, Hadoop, and PHP Thrift and has been tested for different scenarios.

     Our primary purpose in using Big Data is analytics, and we perform concurrent processing to enable fast results with higher accuracy. Here, latency is more important than batching.

     Therefore, we recommend a combination of HBase and Hadoop along with an In-Memory architecture, to leverage the huge volume of unstructured data and provide near real-time insights. While developing this solution, we balanced incrementing counters in real time with MapReduce jobs over the same data set to ensure data accuracy.

     The Impetus solution enables customers to handpick statistics on demand to gain market insights and react quickly to trends. This is made possible by processing the HBase data on HDFS with Hadoop, to move it from batch to near real-time. This approach brings it 80 percent closer to near real-time analytics. The remaining 20 percent will take extra effort in the form of In-Memory solutions like MemBase, GigaSpaces, or Memcached.

     The benefits offered by the Impetus solution

     Here are some of the main advantages that our solution offers vis-à-vis its contemporaries.
  13. Apart from a higher degree of accuracy, our solution also helps identify Influencers.

     Say that, after a campaign or a product launch, an enterprise wants to know who is talking about its offerings, where, and how. If the sentiment is largely negative, the enterprise needs to neutralize the mentions that may hurt its brand the most.

     With our solution, the company can drill down into negative mentions, identify the content coming from the most influential people in the industry, and understand how far each tweet traveled and how many people were impacted by this content.

     Including Influencer Analytics alongside sentiment measurement is becoming a standard of the social media monitoring industry. It also enables reputation management, thereby taking the influencer concept a step further.

     As far as sentiment algorithms are concerned, part of a successful prioritization process is to identify the intensity of each mention. “I really hate product X and will never buy it” is quite different from “Product X is running a little slow today.”

     This ability to cross-reference the intensity, influence trajectory, velocity, and sentiment of each social media mention drives us towards a reliable priority system.

     Conclusion

     In conclusion, a Sentiment Analytics solution is used to gauge the performance of a campaign, product, or brand and to check how well it has been received in the marketplace. Sentiment Analytics is conventionally handled by a couple of popular techniques, including NLP and Artificial Intelligence.

     The neutral nature of social media mentions, Target Overlook Actual Verbatim, and Sentiment Override challenges can be handled by building a Machine Learning model based on n-grams and Bayes filter classification.

     A Machine Learning system has a certain level of knowledge and is associated with a corresponding knowledge-management organization, which enables it to interpret, analyze, and test the knowledge acquired.

     Apart from a higher degree of accuracy, it also helps identify influencers.
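The idea of cross-referencing intensity, influence, velocity, and sentiment to prioritize mentions can be sketched as a simple scoring function. Everything in this example is hypothetical: the paper does not give a formula, so the `Mention` fields, the 0 to 1 scales, the equal weighting of influence and velocity, and the reduced weight for non-negative mentions are assumptions chosen only to show the shape of such a priority system.

```python
from dataclasses import dataclass


@dataclass
class Mention:
    text: str
    polarity: str     # "positive", "negative", or "neutral"
    intensity: float  # 0.0 (mild) to 1.0 (strong)
    influence: float  # 0.0 to 1.0, e.g. normalized author reach (assumed)
    velocity: float   # 0.0 to 1.0, e.g. normalized reshares per hour (assumed)


def priority(mention):
    """Hypothetical score: strong, far-reaching negative mentions rank first."""
    polarity_weight = 1.0 if mention.polarity == "negative" else 0.25
    reach = 0.5 * mention.influence + 0.5 * mention.velocity
    return polarity_weight * mention.intensity * reach


mentions = [
    Mention("Product X is running a little slow today", "negative", 0.3, 0.8, 0.6),
    Mention("I really hate product X and will never buy it", "negative", 0.9, 0.8, 0.6),
    Mention("Product X just launched", "neutral", 0.1, 0.9, 0.9),
]

ranked = sorted(mentions, key=priority, reverse=True)
for m in ranked:
    print(f"{priority(m):.2f}  {m.text}")
```

With identical reach, the strongly worded mention outranks the mild complaint, which is the distinction the two "product X" quotes are meant to illustrate.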
  14. Conducting Influencer Analytics alongside sentiment measurement has become the norm for the social media monitoring industry. A solution that can cross-reference the intensity, influence trajectory, velocity, and sentiment of each social media mention can help in the creation of a reliable priority system.

     About Impetus

     Impetus Technologies is a leading provider of Big Data solutions for the Fortune 500®. We help customers effectively manage the “3-Vs” of Big Data and create new business insights across their enterprises.

     Website: www.bigdata.impetus.com | Email: bigdata@impetus.com

     © 2013 Impetus Technologies, Inc. All rights reserved. Product and company names mentioned herein may be trademarks of their respective companies.

     May 2013
