This project presents a general framework for sentiment analysis of Twitter data that analyzes the typical public reaction to health and well-being topics on Twitter. The framework is developed in Python and is based on part-of-speech (POS) tagged bigrams. Tweets mentioning common health issues are collected using NodeXL, a free and open-source network analysis tool. The extracted unstructured Twitter data is preprocessed, and a representative feature vector is generated for each tweet. A probabilistic classifier such as Naïve Bayes is trained to determine the polarity and polarity score of each tweet.
The system produces three major outputs: automatic classification of a given tweet, an analysis of the general public attitude, and the top stories from a given set of tweets. It also contains a module that tracks the most popular words or phrases in the feed related to a specific topic.
General Framework for Sentiment Analysis of Twitter Data, with Special Attention towards Improving Health Awareness - Final Year Research Project
1. General Framework for
Sentiment Analysis of Twitter Data
with Special Attention Towards
Improving Health Awareness
B. J. Gunasekara
Supervisor - Dr R. D. Nawarathna
2. Introduction
Social networking
encourages users to
express their ideas &
views on
their day-to-day
lifestyle
Department of Stat. & Comp. Sc., Faculty of
Science, University of Peradeniya
3. Social Media Analytics
• The practice of gathering data from web
resources like blogs and social media and
analyzing that data
• Applications
Big Data Analysis
Survey & Marketing
Decision Making
4. Twitter
“To give everyone the
power to create and
share ideas and
information instantly,
without barriers”
5. 288 Million Monthly Active Users
500 Million Tweets Sent Per Day
152,000+ Tweets by Healthcare
professionals per Day
6. Content of a tweet
Tell your story with 140 characters
Textual content
User mentions
Hashtags
URLs
Location
7. Most tweets carry little informational
value on their own,
but a collection of tweets can
provide valuable insight into a
population
8. One voice can make a difference…
But a million can change the world!
#LetDoctorsBeDoctors #ChildhoodCancer
#BreastCancer
#digitalhealth
#ObesityCareWeek
#Parkinsons #Lymphoma
#Migraine
9. Importance of Improving Health
Literacy
• Maintain personal health & wellbeing
• Save on your medical costs
• Avoid Misinterpretations
chemo isn't so nice. Bad dreams
I really am surprised at how bad the side-effects are from
#chemo this time. It's taken me by surprise a bit. Not good.
hospitals are the worst!! hate the medicine like
smell lingering in the air why did my life become
so bad hate #chemo ahhh
Don't let chemotherapy take away your 'you‘ !!!
find your fab again with @Baldlybeautiful
My dads experimental chemo has officially stopped
his tumors from growing for an entire year now
10. Natural Language Processing
• NLP is the field that studies linguistic
interaction between humans and computers.
• Main Tasks –
Information Extraction
Semantic Parsing
Text To 3D Scene Generation
Sentiment And Social Meaning
Machine Translation
Dialog And Speech Processing
Automatic Summarization
Text Segmentation
11. Sentiment Analysis (Opinion Mining)
• Sentiment analysis is the extraction of
subjective information from a document using
NLP, text analysis and computational linguistics.
• Basic Tasks
Polarity classification
Subjectivity/objectivity identification
Feature/aspect-based sentiment analysis
12. Related Work
• Language feature analysis
• Special frameworks
Autoregressive Moving Average (ARMA)
Latent Dirichlet Allocation (LDA)
Ailment Topic Aspect Model (ATAM)
• Derivations from existing models
BioCaster Ontology,
an extant knowledge model of laymen’s terms
13. Problem Statement
• Perform a sentiment analysis aimed at
improving health awareness,
by analyzing the typical public reaction to
common illnesses and treatments in the Twitter
community.
14. Methodology
• The proposed method is based on POS-tagged
bigrams with a Naïve Bayes classifier
16. Feature Extraction
• Tweet: “200 lives were lost, coz of this massive
dengue outbreak”
• Unigrams: ['lives', 'lost', 'coz', 'massive', 'dengue',
'outbreak']
• Bigrams: ['lives_lost', 'lost_coz', 'coz_massive',
'massive_dengue', 'dengue_outbreak']
• POS tagging: [('lives', 'NNS'), ('were', 'VBD'), ('lost',
'VBN'), ('coz', 'NN'), ('of', 'IN'), ('this', 'DT'),
('massive', 'JJ'), ('dengue', 'NN'),
('outbreak', 'NN')]
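The unigram and bigram lists on this slide can be reproduced with a few lines of plain Python. This is only a sketch: the slides do not say which tokenizer or stopword list was used, so the regular expression and the tiny `STOPWORDS` set below are assumptions chosen to match the example tweet.

```python
import re

# Hypothetical, deliberately tiny stopword list (an assumption, not the author's list).
STOPWORDS = {"were", "of", "this", "the", "a", "an", "is", "are", "was"}

def unigrams(tweet):
    """Return lowercase alphabetic tokens with stopwords removed."""
    tokens = re.findall(r"[a-z]+", tweet.lower())
    return [t for t in tokens if t not in STOPWORDS]

def bigrams(tokens):
    """Join each pair of neighbouring tokens with an underscore."""
    return ["_".join(pair) for pair in zip(tokens, tokens[1:])]

tweet = "200 lives were lost, coz of this massive dengue outbreak"
uni = unigrams(tweet)
bi = bigrams(uni)
print(uni)
print(bi)
```

Because the pattern keeps only alphabetic runs, the number "200" is dropped, which matches the unigram list shown on the slide.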
17. Bigram vs. Unigram
• The frequency distribution of bigrams in a
string is used for simple statistical analysis of
text.
• Unlike a unigram, each bigram also carries a
neighboring word (increased long-tail specificity).
• The classifier has more context for predicting the
label than when relying on single words.
18. POS Tagging
• The process of labeling each word with its part of
speech, based on both its definition
and its context.
• Mainly nouns & adjectives were considered.
• Adjectives modify nouns, adding value and
sharpening the sense.
Penn Treebank
Brown Corpus
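Restricting the features to nouns and adjectives could look like the sketch below. It reuses the tagged tokens from the Feature Extraction slide and assumes Penn Treebank tag prefixes (`NN` for nouns, `JJ` for adjectives). The slides do not state whether filtering happens before or after pairing; this sketch filters first, and the function name is hypothetical.

```python
# POS-tagged tokens copied from the Feature Extraction slide.
tagged = [("lives", "NNS"), ("were", "VBD"), ("lost", "VBN"), ("coz", "NN"),
          ("of", "IN"), ("this", "DT"), ("massive", "JJ"), ("dengue", "NN"),
          ("outbreak", "NN")]

def noun_adjective_bigrams(tagged_tokens):
    """Keep noun (NN*) and adjective (JJ*) tokens, then pair neighbours."""
    kept = [word for word, tag in tagged_tokens if tag.startswith(("NN", "JJ"))]
    return ["_".join(pair) for pair in zip(kept, kept[1:])]

pos_bigrams = noun_adjective_bigrams(tagged)
print(pos_bigrams)
```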
19. Naïve Bayes classifier
• Based on Bayes' Theorem
• It assumes that, given the class, each attribute
is independent of all other attributes.
• Ideal for categorical data – easy to calculate
using ratios.
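To make the independence assumption concrete, here is a minimal, dependency-free Naïve Bayes sketch over bag-of-feature tweets. It is not the project's implementation (the slides list NLTK's classify module for that); the class name, the add-one smoothing and the toy training tweets are all assumptions for illustration.

```python
import math
from collections import Counter, defaultdict

class TinyNaiveBayes:
    """Multinomial Naive Bayes with add-alpha smoothing (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()

    def train(self, examples):
        for features, label in examples:
            self.class_counts[label] += 1
            for f in features:
                self.feature_counts[label][f] += 1
                self.vocab.add(f)

    def log_scores(self, features):
        total = sum(self.class_counts.values())
        scores = {}
        for label, n in self.class_counts.items():
            denom = sum(self.feature_counts[label].values()) + self.alpha * len(self.vocab)
            score = math.log(n / total)  # log prior
            for f in features:
                if f in self.vocab:  # unseen features are ignored
                    # Independence assumption: per-feature log likelihoods simply add up.
                    score += math.log((self.feature_counts[label][f] + self.alpha) / denom)
            scores[label] = score
        return scores

    def classify(self, features):
        scores = self.log_scores(features)
        return max(scores, key=scores.get)

# Made-up toy training data, not taken from the project's datasets.
clf = TinyNaiveBayes()
clf.train([
    (["dengue_outbreak", "lives_lost"], "negative"),
    (["massive_dengue", "dengue_outbreak"], "negative"),
    (["dengue_vaccine", "good_news"], "positive"),
    (["clean_bill", "good_news"], "positive"),
])
print(clf.classify(["dengue_outbreak"]))
print(clf.classify(["good_news"]))
```

The log scores can also serve as a rough polarity score; normalizing them into probabilities is left out to keep the sketch short.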
20. System Implementation
• Python 3.4
operator - Functional interface to built-in operators
itertools - Functions creating iterators for efficient looping
re - Regular expressions: searching within and changing text
using formal patterns
• NLTK
probability - Classes for representing and processing
probabilistic information
classify - Classifiers
metrics - Testing & validation
• Matplotlib & Pylab
• Tkinter
21. Experimental Setup
• Specific health topics, illnesses and treatments
were selected using WebMD and Mayo Clinic.
• Tweets related to those issues were collected
using the NodeXL tool.
• Data was collected over a period of time to
ensure that it does not contain unusual
outliers.
• Training sets
– the datasets were distributed among groups of 10
people each, and each tweet was labeled
with the tag chosen by
the majority.
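The majority-vote labeling described above can be sketched as follows. The slides do not say how ties among the 10 annotators were handled, so returning `None` for a tie is an assumption, as is the function name.

```python
from collections import Counter

def majority_label(votes):
    """Return the most frequent tag, or None when the top two tags tie."""
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # assumed tie policy: leave the tweet unlabeled
    return ranked[0][0]

clear_vote = majority_label(["negative"] * 6 + ["positive"] * 4)
tied_vote = majority_label(["negative"] * 5 + ["positive"] * 5)
print(clear_vote, tied_vote)
```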
22. • Both Naïve Bayes and Maximum Entropy
classifiers were used.
• Experiments were carried out for
different combinations of unigrams and bigrams,
with and without part-of-speech (POS) tagging.
• The performance was evaluated with
cross-validation.
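Cross-validation splits the labeled tweets into folds and tests on each fold in turn. The slides do not give the number of folds, so `k = 10` below is an assumption; 472 is the size of the Dengue dataset, and the helper name is hypothetical.

```python
def k_fold_indices(n, k):
    """Yield (train_indices, test_indices) for k contiguous folds over range(n)."""
    # Spread the remainder over the first n % k folds so sizes differ by at most one.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size

folds = list(k_fold_indices(472, 10))
print([len(test) for _, test in folds])
```

In practice the tweets would be shuffled before splitting so that each fold has a similar class mix.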
23. Datasets
Name      Content (keywords)  No. of tweets  From              To              Classified  Polarity Ratio (Negative:Positive)
Dengue    Dengue              472            27/04/2015 20:29  1/7/2015 15:14  Yes         323:149
H1N1      H1N1, Influenza     548            24/06/15 1:45     30/06/15 14:57  Yes         314:234
Chemo-I   Chemotherapy        170            12/10/15 7:12     22/10/15 14:37  Yes         72:98
Chemo-II  Chemotherapy        734            12/10/2015 12:04  22/10/15 14:37  No          -
24. Experiment 1: Dengue Dataset
Dengue, Dengue Vaccine
                    Naïve Bayes                             MaxEnt
                    Unigrams  Bigrams  POS-Tagged Bigrams   Unigrams  Bigrams  POS-Tagged Bigrams
Accuracy            72.52     75.50    81.82                68.68     70.32    76.06
Weighted Precision  74.26     74.40    81.69                72.42     65.91    61.28
Weighted Recall     70.70     73.77    82.26                67.30     70.82    57.70
Weighted F-measure  70.90     71.00    79.84                67.55     60.57    58.72
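The slides do not define how the "weighted" precision, recall and F-measure were computed. One common definition averages the per-class scores weighted by class support, sketched below with made-up labels. Under this definition weighted recall always equals accuracy, which is not the case in the reported tables, so the original evaluation presumably used a different weighting.

```python
from collections import Counter

def weighted_scores(gold, predicted):
    """Support-weighted precision, recall and F1 (one possible definition)."""
    support = Counter(gold)
    n = len(gold)
    wp = wr = wf = 0.0
    for label in sorted(support):
        tp = sum(1 for g, p in zip(gold, predicted) if g == label and p == label)
        predicted_as_label = sum(1 for p in predicted if p == label)
        precision = tp / predicted_as_label if predicted_as_label else 0.0
        recall = tp / support[label]
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        weight = support[label] / n  # class share in the gold labels
        wp += weight * precision
        wr += weight * recall
        wf += weight * f1
    return wp, wr, wf

# Toy labels for illustration only.
gold = ["neg", "neg", "neg", "pos"]
predicted = ["neg", "neg", "pos", "pos"]
wp, wr, wf = weighted_scores(gold, predicted)
print(round(wp, 4), round(wr, 4), round(wf, 4))
```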
25. Accuracy
[Bar chart: accuracy (%) of Naïve Bayes and Maximum Entropy for unigrams, bigrams and POS-tagged bigrams]
26. Weighted F-measure
[Bar chart: weighted F-measure (%) of Naïve Bayes and Maximum Entropy for unigrams, bigrams and POS-tagged bigrams]
27. Experiment 2: H1N1 Dataset
H1N1, Influenza
Naïve Bayes
                    Unigrams  Bigrams  POS-Tagged Bigrams
Accuracy            67.43     70.59    76.04
Weighted Precision  67.52     70.62    76.09
Weighted Recall     67.95     70.44    76.05
Weighted F-measure  65.69     70.08    75.78
28. Experiment 3: Chemo-I Dataset
Chemotherapy
Naïve Bayes
                    Unigrams  Bigrams  POS-Tagged Bigrams
Accuracy            75.88     76.47    78.24
Weighted Precision  78.23     78.66    79.96
Weighted Recall     75.10     75.60    77.16
Weighted F-measure  75.69     76.25    77.93
29. Polarity Checker : Dataset Analysis
30. Polarity Checker : Top Stories
1
Positive: #Dengue News: Scientists identify the skin immune cells targeted by the dengue virus
Negative: United Nations News Centre - At least 3,000 suspected Dengue fever cases reported in Yemen – UN health agency:
2
Positive: Co-ordination meet of BBMP Health and edu. dept. regarding control and prevention of Dengue and Chikungunya fever spread by Mosquito bite. (1/5)
Negative: #MyiTimes Country faces largest dengue epidemic ever - KUALA LUMPUR: The country is probably facing the largest dengue problem
3
Positive: Well that's a 1st! Malaysia Dept of Health officials doing house to house calls looking for dengue hot spots!! Clean bill of health here!
Negative: #Dengue News: Country faces largest dengue epidemic ever - Free Malaysia Today
4
Positive: @PascalBarollier Fantastic! Thanks for helping our tribe put a face to dengue global leaders won't forget.
Negative: Country faces largest dengue epidemic ever: The number of deaths has doubled this year compared to the same period…
5
Positive: @DengueInfo Thank you for helping us get the word out on Dengue Tribe! To help put a face to dengue, join here
Negative: #Yemen Yemen: At least 3,000 suspected Dengue fever cases reported in Yemen – UN health agency says
31. Polarity Checker : Text Analysis
33. Buzzmeter : Unigram vs. Bigram
34. Buzzmeter : Unigram vs. Bigram
• Chemo radiation
• Breast cancer
• Last chemo
• Cancer awareness
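A phrase tracker of this kind can be approximated by counting bigrams across tweets and reporting the most frequent ones. The sketch below is an assumption about how such a Buzzmeter might work, not the project's code; the function name and the tokenized toy tweets are made up.

```python
from collections import Counter

def top_phrases(token_lists, n=3):
    """Count neighbouring word pairs across all tweets and return the n most common."""
    counts = Counter()
    for tokens in token_lists:
        counts.update(" ".join(pair) for pair in zip(tokens, tokens[1:]))
    return counts.most_common(n)

# Toy, already tokenized tweets for illustration only.
tweets = [
    ["last", "chemo", "today"],
    ["breast", "cancer", "awareness"],
    ["last", "chemo", "done"],
    ["breast", "cancer", "walk"],
    ["last", "chemo"],
]
buzz = top_phrases(tweets, n=2)
print(buzz)
```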
35. Conclusion
• This research presents a sentiment analysis framework with
special attention towards improving health
awareness. It can:
automatically classify a given tweet
derive the general attitude from a given set of
tweets, with top stories
track the most commonly used words/phrases in
health-related tweets
• POS-tagged bigrams using nouns + adjectives
with the Naïve Bayes method produced the
best overall performance.
36. Future Recommendations
• Real-time Twitter data analysis
• Web plug-ins
• Mobile apps
• Identifying disease spread patterns,
threatened areas & age groups
• Health alert/warning system