Weekly Meeting 4.pdf

Weekly Meeting - XN 1
Informatics Students
Lingyu Hu, Zhaode Ouyang, Ziyan Yan, Zixun Zhou
Feb 16, 2022

Twitter Transparency Project datasets
● Look for similarities across the datasets
● What language is the content in?
● Tried to translate content into English (failed)
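
One rough way to approach the language question is to tally which writing script dominates each post. The sketch below is only an illustration using the Python standard library; the function name `dominant_script` is hypothetical, and a script is not a language (Latin script covers English, Spanish, and many others), so this would only be a first pass before a real language detector.

```python
import unicodedata
from collections import Counter


def dominant_script(text: str) -> str:
    """Return the most frequent script prefix (e.g. 'LATIN', 'CJK') among letters."""
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, punctuation, whitespace, emoji
        # The first word of a Unicode character name is used as a script label,
        # e.g. 'CJK UNIFIED IDEOGRAPH-4E2D' -> 'CJK'.
        name = unicodedata.name(ch, "")
        if name:
            counts[name.split()[0]] += 1
    if not counts:
        return "UNKNOWN"
    return counts.most_common(1)[0][0]
```

Grouping posts by this label gives a quick picture of which scripts appear in a dataset before any translation is attempted.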

Twitter Transparency Project datasets
● Content is posted in different languages
● Hashtags are used to spread content

Filter by country within COVID-19 topics

Interesting Finding
Most of the fake news about COVID-19 appeared in
the first quarter of 2020. Since then, there have
been very few false news stories about local
outbreaks on Chinese social media.

Data Cleaning - SBS Ready Format
Working:
- Change separator from ‘,’ to ‘|’
- Remove ‘|’ in text column
- Remove line breaks
- 50 GB+ of data in Python
- .csv, .json, .zst
- UTF-8 ready
Not working:
- Output exactly 30 MB per file
- Separate files per language
- 1,048,576+ rows in Excel
- 10 GB+ of data in Excel
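
The cleaning steps above can be sketched roughly as follows. This is a minimal illustration with the Python standard library, not the team's actual script: the function names (`clean_field`, `convert_to_pipe`, `chunk_lines`) are hypothetical, and it assumes the input is a comma-separated file that can be streamed row by row. Because rows are kept whole, output files can be capped at a size limit but will not hit an exact size.

```python
import csv


def clean_field(value: str) -> str:
    """Remove pipe characters and turn line breaks into single spaces."""
    value = value.replace("|", "")
    # split() with no arguments splits on any whitespace run, including
    # \r and \n, so this also collapses repeated spaces.
    return " ".join(value.split())


def convert_to_pipe(src, dst) -> None:
    """Stream rows from a comma-separated source to a pipe-separated target."""
    reader = csv.reader(src)
    writer = csv.writer(dst, delimiter="|", lineterminator="\n")
    for row in reader:
        writer.writerow([clean_field(cell) for cell in row])


def convert_file(src_path: str, dst_path: str) -> None:
    """File-based wrapper; reads and writes UTF-8 without loading the whole file."""
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        convert_to_pipe(src, dst)


def chunk_lines(lines, max_bytes: int):
    """Group lines into lists whose UTF-8 size stays at or under max_bytes.

    A single line larger than max_bytes still gets a chunk of its own.
    """
    chunk, size = [], 0
    for line in lines:
        n = len(line.encode("utf-8"))
        if chunk and size + n > max_bytes:
            yield chunk
            chunk, size = [], 0
        chunk.append(line)
        size += n
    if chunk:
        yield chunk
```

Each list yielded by `chunk_lines` could be written to its own numbered file, with `max_bytes` set to roughly 30 MB (for example `30 * 1024 * 1024`).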

SBS - Analysis Runs, But No Results

SBS - Working
WordCloud
Keyword Extractor

Sentiment Analysis - Preliminary Exploration
Tool: Azure Machine Learning
● One English dataset
● Two Chinese datasets: abnormal results

A dataset of real news about COVID-19 (Chinese)

A dataset of fake news about COVID-19 (Chinese)

A dataset of disinformation about COVID-19 (English)

Questions
1. What is the relationship between sentiment analysis of the data and
our project?
2. With the results of the sentiment analysis, what do we do next?

Future Planning
1. Use existing data to train and build judgment models.
