Presentation1.pdf

- very brief introduction to your project. (Note, this is to help your classmates refresh their memory about your project, which should be very
short, one or two sentences highlight) Jerry
Slide 2 -- all research methods you have used to complete this project(for each method, use justone sentence to justify why it's necessary to adopt this
method)
Slide 3 -- all data you havecollected (a list of types of the data, including number of the email correspondence, number of the interviews, pages of
documents you reviewed, etc.) {平台，数据仓库（github、kaggle）和canvasoet里提供的数据（covid，ukrane，伊朗），调查平台数据是否可
用}（Ziyan），爬数据方法（jerry）
清理数据 (Jerry)，数据可视化分析（SBS(lingyu)，power bi（））
Slide 4 ~ x -- final productdesign process (This should be the focus, tell us how your interaction with the sponsors, users, etc. informed your design
thinking, and how you came up with the design ideas) 每周任务，不过也可以放我们的dashboard设计，如何沟通 (lingyu)
Slide x+1 ~ y -- how your final productlooks like? (Note, a list of screen shots would be helpful, or a live demo but make sure your designed websiteworks
properly. You only have about 20 minutes in total, so don't wastetime on searching, finding, or fixing websitepages) 介绍wireframe（Jerry）
Slide y+1 -- Take away points (Whatyou havelearned from doing this project, shareyour valuable experiences)（ziyan）
Final slide -- if you are given an opportunity to re-do the project, whatmay you change??? (Jerry）

ITC 6040 Capstone
Final Report
Strategies for Identifying
Mis-/Disinformation
Team 2：
Ouyang Zhaode, Lingyu Hu, Ziyan Yan, Zixun Zhou

Mikhail Oet, PhD
Professor in Commerce and Economic
Development (CED) program
Northeastern University
Our Sponsors:
Mission:
To get the rightinformation to the
right people at the right time
Research
• PlatformResearch
• Data Repositories Research
• Data Scraping Methods Research
• U.S., China, Russian Research
DataVisualization
• Dashboard Design
DataAnalysis
• Data Cleaning
• Sentiment Analysis
• Word Semantic Analysis
What Are We Doing? Help Identifying Fake News
How We Identify?

Research Method---
Qualitative Research
1. Identify Research Questions：
• How to collect data?
• How to use a data repository?
• How to analyze a dataset?
2. Case Study
3. Research Report

Research Method---
Quantitative Research
Specific method: Data Analysis
Analyze objective data---Statistical Data
• Sentiment score
• Information release time
• Amount of information
• Location

Data Source (1)
Gi t Hub
❑ Provider of hosting program and it could offer the research results of fake news
❑ The results were not used, but we use the dataset
❑ Use the keywords
❑ Datasetsourced from Weibo about the false information of COVID-19
K a ggl e
❖ Datasetwebsite owned by Google
❖ Offer scientific topics
❖ Provides data on the issue of fake news about COVID-19

Data Source (2)
01
COVID-19
Source: Weibo, Twitter
Topic: Misinformation of COVID-19
02 Ukraine
Event Registry Ukraine-English Dataset
Event Registry Ukraine-Russian Dataset
From Twitter
03 Iran
The theme of Iran will be from Tweet by inputting keywords

Feasibility of Data
01 02
03 04
Highly Feasible Diverse
Visualize Reliable

Data We Collected
Qualitative
Quantitative
Primary
Secondary
Information gather from the guest speakers &
stakeholders(professors, sponsors, and other teams)
Articles and reports we read
Data we scrap on the social media and news websites
Data our sponsor provided&
data repositories we found
Primary
Secondary
COUNT
3 Guest Sections
10+Zoom Recording
15+ Meetings
30+ Emails
30+ Articles,
Reports & Videos
3 Experiment
Web Scrapings
5+ Data
Repositories We
Found

Data Scraping Methods Research
RSS feed to CSV (Online Converting Tools) DataCollectors (Octopus, BrightData)
Web Scraping (Python)
Methods Use Cases Difficulty
RSS feed to CSV Websites Providing RSS Feeds Low
Data Collectors PopularSocial Medias Medium
Web Scraping Static Websites High

Part 4
Data Cleaning & Analysis

Data Cleaning
MS Excel Power Query
• For CSV format
• Easy-to-use
• For ad-hocanalysis (One-time use)
* Limitations:
• Data should less than 1 million row
• Data should less than 1GB
Python
• For JSON or other data format
• Can cut a large data into many smaller files
• Cleaning as scale
• For data pipeline use (continuously data streaming)
* No limitation, but take more time and more effort
We can use
Google Sheet
to do batch
translation

Power BI
1. Data visualization
2. Data query
3. Data Modeling
4. Key data analysis

SBS
•Data visualization tool
•Words data analysis

Ukraine – Russian
SBS Analysis
• All words content are Russian
• All records are news
• “Russian” and “Ukraine” appeared mostin
the dataset
• Specific words do not appear too much

Ukraine – Russian SBS Analysis
• Ukraine, Russian, and Putin care Topic 1 most
• NATO, Russian, and USA care Topic 2 most
• Xi and Biden care about Topic 5 most

Ukraine – Russian SBS Analysis
• T6 has a strong relationship with T2
• T5 has the second strong relationship
with T2
Conclusions:
• Ukraine, Russian and Putin care Winter Olympicsmost
• Russian, NATO and USA care potentialmilitaryactivities
• Xi and Biden care relationshipwith other countries
• Covid has stronger relationshipwith potentialmilitary activities
• Relationship incountries could influence the potential militaryactivities

Twitter Transparency
Project
Power BI Analysis
• [Hanya Kamu]:Only you
• Most hashtags are meaningless
• Tweet numbers in 2012
• Most tweets appeared in June and August
• Trend is unstable

User Input
External Data - Revenpack
News Articles
Dashboard
Fake Score
Sentiment
polyfact
propublica
Local Check
Based on
Historical Data
External Data - GDI
Contribution by Country

Dashboard - Data Prerequisite
News Article Dataset
External Data

Take Away (1)
Data visualization

Take Away (2)
01 02 03
Communication Diversity Identifying Theme

Data Collection
Plan
1. Develop a Data Collection Plan
2. Us
1. What are we going to solve?
e.g., A list of issues
2. What consider success?
e.g., A Service Level Agreements (SLA)
3. What dataavailable?
4. What form does that data come in?
5. Where the datawill be collectedfrom?
6. Whether to measure a sample or the whole population?
7. What format the datawill be displayed?
We Did
We Missed

Project
Management
Techniques
Tools:
Methods:
• Waterfall
• Agile
• Scrum
…

Week 1- 2 : Platform Exploration
Exploring platforms where the data can be crawled.
Platforms in Russian, English and Chinese.
If possible, crawl the data by learning new tools.

Week 3 – 4: Learning New Tools
Clean and filter the data.
1
Learn new tools: SBS (and its format),
Power BI, sentiment analysis
2
Provide some basic findings.
3

Week 5 – 6: Data Visualization
• Use data visualization tools to analyze
data
• Determine the final tools: SBS, Power
BI.
• Providing some findings.

Week 7 – 8: Dashboard learn and design
• Keep using SBS to analyze the data
• Design dashboard by learning from
Ravenpack
• Combine and provide sample
designed dashboard
Sample designed dashboard

Process to Final Product
1. Ask Questions
2. Create
3. Feedback
4. Research
5. Revise
6. Feedback
7. Continue Revise

TwitterTransparencyDatasets
WebScrapping
Data Sources
HistoricalData
WeiboDatasets
EventRegistry Datasets
Live Data
TwitterAPI
Process
Storage Consume
Data Analysis & Query
Machine Learning
Cloud Architecture

Data Sources
WebScrapping
HistoricalData
WeiboDatasets
Live Data
TwitterAPI
Storage

Process
Storage
Data Analysis & Query
Machine Learning
A Database
of Metadata

Machine Learning
• Existing fake news prediction model
• Can train non-relational data (video, image, audio)
• Low-Code

Cleaning
Machine
LearningFake
Score
Sentiment
Data Visualization
Clean
& Transform
WebScrapping
Data Sources
HistoricalData
WeiboDatasets
Live Data
Process Consume
Transfer
TwitterAPI

Presentation1.pdf

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to Presentation1.pdf

Similar to Presentation1.pdf (20)

More from ZixunZhou

More from ZixunZhou (7)

Recently uploaded

Recently uploaded (20)

Presentation1.pdf