Slide 1 -- very brief introduction to your project. (Note: this is to help your classmates refresh their memory about your project, so it should be very short, a one- or two-sentence highlight.) (Jerry)
Slide 2 -- all research methods you have used to complete this project (for each method, use just one sentence to justify why it was necessary to adopt it)
Slide 3 -- all data you have collected (a list of the types of data, including the number of email correspondences, number of interviews, pages of documents you reviewed, etc.) {Platforms, data repositories (GitHub, Kaggle), and the data provided via Canvas/Oet (COVID, Ukraine, Iran); investigate whether the platform data is usable} (Ziyan); data scraping methods (Jerry); data cleaning (Jerry); data visualization analysis (SBS (Lingyu), Power BI ())
Slide 4 ~ x -- final product design process (This should be the focus: tell us how your interaction with the sponsors, users, etc. informed your design thinking, and how you came up with the design ideas.) Weekly tasks, though we could also include our dashboard design; how we communicated (Lingyu)
Slide x+1 ~ y -- what does your final product look like? (Note: a list of screenshots would be helpful, or a live demo, but make sure your designed website works properly. You only have about 20 minutes in total, so don't waste time on searching, finding, or fixing website pages.) Introduce the wireframe (Jerry)
Slide y+1 -- Takeaway points (what you have learned from doing this project; share your valuable experiences) (Ziyan)
Final slide -- if you were given an opportunity to redo the project, what might you change? (Jerry)
ITC 6040 Capstone
Final Report
Strategies for Identifying
Mis-/Disinformation
Team 2:
Ouyang Zhaode, Lingyu Hu, Ziyan Yan, Zixun Zhou
PART 1
PROJECT INTRODUCTION
Mikhail Oet, PhD
Professor in Commerce and Economic
Development (CED) program
Northeastern University
Our Sponsors:
Mission:
To get the right information to the
right people at the right time
Research
• Platform Research
• Data Repositories Research
• Data Scraping Methods Research
• U.S., China, and Russia Research
Data Visualization
• Dashboard Design
Data Analysis
• Data Cleaning
• Sentiment Analysis
• Word Semantic Analysis
What Are We Doing? Helping Identify Fake News
How We Identify?
Part 2
Research Method
Research Method---
Qualitative Research
1. Identify Research Questions:
• How to collect data?
• How to use a data repository?
• How to analyze a dataset?
2. Case Study
3. Research Report
Research Method---
Quantitative Research
Specific method: Data Analysis
Analyze objective data---Statistical Data
• Sentiment score
• Information release time
• Amount of information
• Location
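The four statistical measures above can be sketched in a few lines of Python. This is only an illustration under assumptions: the records and the field names (sentiment, published, location) are hypothetical and do not come from the project's datasets, and the sentiment scores are taken as already computed.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical records; field names and values are assumptions for illustration.
records = [
    {"sentiment": -0.6, "published": "2022-02-24T08:00:00", "location": "Ukraine"},
    {"sentiment": 0.2, "published": "2022-02-24T15:30:00", "location": "USA"},
    {"sentiment": -0.4, "published": "2022-02-25T09:10:00", "location": "Ukraine"},
]


def summarize(rows):
    """Return mean sentiment per location and the number of items per release date."""
    by_location = defaultdict(list)
    per_day = defaultdict(int)
    for row in rows:
        by_location[row["location"]].append(row["sentiment"])
        # Information release time: keep only the calendar date.
        day = datetime.fromisoformat(row["published"]).date().isoformat()
        # Amount of information: count items released on that date.
        per_day[day] += 1
    mean_sentiment = {loc: round(mean(vals), 3) for loc, vals in by_location.items()}
    return mean_sentiment, dict(per_day)


mean_sentiment, per_day = summarize(records)
print(mean_sentiment)
print(per_day)
```

Grouping by location and by date keeps each measure separate, so each result could feed its own chart in a visualization tool.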
Part 3
Data Collection
Data Source (1)
GitHub
❑ A code-hosting provider that also hosts published research results on fake news
❑ We did not use the research results, but we used the dataset
❑ Searched by keywords
❑ Dataset sourced from Weibo about false information on COVID-19
Kaggle
❖ Dataset website owned by Google
❖ Offers scientific topics
❖ Provides data on fake news about COVID-19
Data Source (2)
01
COVID-19
Source: Weibo, Twitter
Topic: Misinformation of COVID-19
02 Ukraine
Event Registry Ukraine-English Dataset
Event Registry Ukraine-Russian Dataset
From Twitter
03 Iran
Iran-themed data will be collected from Twitter by keyword search
Feasibility of Data
01 Highly Feasible
02 Diverse
03 Visualize
04 Reliable
Data We Collected
Type | Qualitative | Quantitative
Primary | Information gathered from the guest speakers & stakeholders (professors, sponsors, and other teams) | Data we scraped from social media and news websites
Secondary | Articles and reports we read | Data our sponsor provided & data repositories we found
COUNT
Primary | 3 Guest Sessions, 10+ Zoom Recordings, 15+ Meetings, 30+ Emails | 3 Experimental Web Scrapings
Secondary | 30+ Articles, Reports & Videos | 5+ Data Repositories We Found
Data Scraping Methods Research
RSS feed to CSV (Online Converting Tools)
Data Collectors (Octopus, BrightData)
Web Scraping (Python)
Method | Use Cases | Difficulty
RSS feed to CSV | Websites providing RSS feeds | Low
Data Collectors | Popular social media | Medium
Web Scraping | Static websites | High
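As a rough illustration of the lowest-difficulty method, an RSS feed can also be converted to CSV with Python's standard library instead of an online tool. The feed below is a made-up inline sample so the sketch runs offline; a real feed would first be downloaded, and its fields may differ from the three assumed here (title, link, pubDate).

```python
import csv
import io
import xml.etree.ElementTree as ET

# Made-up RSS 2.0 sample (an assumption for illustration, not project data).
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <title>Example feed</title>
  <item><title>First story</title><link>https://example.com/1</link><pubDate>Mon, 07 Mar 2022 10:00:00 GMT</pubDate></item>
  <item><title>Second story</title><link>https://example.com/2</link><pubDate>Tue, 08 Mar 2022 11:30:00 GMT</pubDate></item>
</channel></rss>"""

FIELDS = ("title", "link", "pubDate")


def rss_to_csv(rss_text):
    """Convert every <item> of an RSS document into one CSV row."""
    root = ET.fromstring(rss_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(FIELDS)
    for item in root.iter("item"):
        # Missing child elements become empty cells instead of raising.
        writer.writerow([item.findtext(name, default="") for name in FIELDS])
    return out.getvalue()


csv_text = rss_to_csv(SAMPLE_RSS)
print(csv_text)
```

The csv module quotes the publication date automatically because it contains a comma, which is one reason to prefer it over joining strings by hand.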
Part 4
Data Cleaning & Analysis
Data Cleaning
MS Excel Power Query
• For CSV format
• Easy-to-use
• For ad-hoc analysis (one-time use)
* Limitations:
• Data should be fewer than 1 million rows
• Data should be smaller than 1 GB
Python
• For JSON or other data formats
• Can cut a large dataset into many smaller files
• Cleaning at scale
• For data pipeline use (continuous data streaming)
* No size limitation, but it takes more time and effort
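Cutting a large dataset into many smaller files, as mentioned in the Python list above, might look like the following minimal sketch. The function name split_records, the JSON Lines output format, and the chunk size are assumptions chosen for illustration, not the team's actual script.

```python
import json
import tempfile
from itertools import islice
from pathlib import Path


def split_records(records, out_dir, chunk_size):
    """Write records into numbered JSON Lines files of at most chunk_size rows each."""
    out_dir = Path(out_dir)
    iterator = iter(records)
    paths = []
    while True:
        # Pull the next chunk lazily so the full dataset never sits in memory.
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            break
        path = out_dir / f"part_{len(paths):04d}.jsonl"
        with path.open("w", encoding="utf-8") as fh:
            for rec in chunk:
                fh.write(json.dumps(rec, ensure_ascii=False) + "\n")
        paths.append(path)
    return paths


# Hypothetical data: 2,500 small records split into files of 1,000 rows.
demo_dir = tempfile.mkdtemp()
demo_paths = split_records(({"id": i} for i in range(2500)), demo_dir, 1000)
print([p.name for p in demo_paths])
```

Choosing a chunk size below the spreadsheet row limit noted above would keep each output file small enough to open in Excel for a spot check.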
We can use Google Sheets for batch translation
Power BI
1. Data visualization
2. Data query
3. Data Modeling
4. Key data analysis
SBS
• Data visualization tool
• Word data analysis
Part 5
Analysis and findings
Ukraine – Russian
SBS Analysis
• All text content is in Russian
• All records are news articles
• "Russian" and "Ukraine" appear most often in the dataset
• Specific words do not appear very often
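A word-frequency finding like the one above can be reproduced in outline with a plain counter. This is a stand-in sketch in Python, not the SBS tool itself, and the three Russian headlines are invented for illustration.

```python
import re
from collections import Counter

# Invented Russian-language headlines standing in for the real dataset.
headlines = [
    "Россия и Украина продолжают переговоры",
    "Украина просит помощи у НАТО",
    "Россия отвечает на заявления НАТО и Украина ждет",
]


def top_words(texts, n=3, min_len=3):
    """Count lowercase word tokens, skipping very short words such as particles."""
    tokens = []
    for text in texts:
        # \w matches Cyrillic letters too, because Python 3 patterns are Unicode-aware.
        tokens.extend(w for w in re.findall(r"\w+", text.lower()) if len(w) >= min_len)
    return Counter(tokens).most_common(n)


top = top_words(headlines)
print(top)
```

A real analysis would likely also need lemmatization, since Russian words change form by case, but that step is left out of this sketch.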
Ukraine – Russian SBS Analysis
• Ukraine, Russian, and Putin care most about Topic 1
• NATO, Russian, and USA care most about Topic 2
• Xi and Biden care about Topic 5 most
Ukraine – Russian SBS Analysis
• T6 has a strong relationship with T2
• T5 has the second-strongest relationship with T2
Conclusions:
• Ukraine, Russian and Putin care most about the Winter Olympics
• Russian, NATO and USA care about potential military activities
• Xi and Biden care about relationships with other countries
• Covid has a stronger relationship with potential military activities
• Relationships between countries could influence potential military activities
Twitter Transparency
Project
Power BI Analysis
• [Hanya Kamu]: "Only you"
• Most hashtags are meaningless
• Tweet numbers in 2012
• Most tweets appeared in June and August
• The trend is unstable
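The monthly tweet counts behind a chart like this can be sketched outside Power BI as well. The Python example below uses invented timestamps and an assumed "YYYY-MM-DD HH:MM" format; the real dataset's columns and date format may differ.

```python
from collections import Counter
from datetime import datetime

# Invented tweet timestamps (assumed format), not taken from the real dataset.
tweet_times = [
    "2012-06-03 10:15",
    "2012-06-21 18:40",
    "2012-08-02 09:00",
    "2012-08-15 12:30",
    "2012-08-30 23:59",
    "2012-11-11 07:45",
]


def tweets_per_month(timestamps, year):
    """Count tweets per calendar month for one year, sorted by month."""
    counts = Counter()
    for ts in timestamps:
        dt = datetime.strptime(ts, "%Y-%m-%d %H:%M")
        if dt.year == year:
            counts[dt.strftime("%Y-%m")] += 1
    return dict(sorted(counts.items()))


monthly = tweets_per_month(tweet_times, 2012)
print(monthly)
```

Months with no tweets are simply absent from the result, so a plotting step would need to fill them with zeros to show gaps in the trend.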
Part 6
Final Product
User Input
External Data - RavenPack
News Articles
Dashboard
Fake Score
Sentiment
PolitiFact
ProPublica
Local Check
Based on
Historical Data
External Data - GDI
Contribution by Country
Dashboard - Data Prerequisite
News Article Dataset
External Data
Part 7
Takeaway Point
Take Away (1)
Data visualization
Take Away (2)
01 Communication
02 Diversity
03 Identifying Theme
Part 8
Re-design Project
Data Collection
Plan
1. Develop a Data Collection Plan
2. Us
1. What are we going to solve?
e.g., a list of issues
2. What counts as success?
e.g., a Service Level Agreement (SLA)
3. What data is available?
4. What form does that data come in?
5. Where will the data be collected from?
6. Should we measure a sample or the whole population?
7. In what format will the data be displayed?
We Did
We Missed
Project
Management
Techniques
Tools:
Methods:
• Waterfall
• Agile
• Scrum
…
The End
Appendix
Week 1- 2 : Platform Exploration
Exploring platforms where the data can be crawled.
Platforms in Russian, English and Chinese.
If possible, crawl the data by learning new tools.
Week 3 – 4: Learning New Tools
1. Clean and filter the data.
2. Learn new tools: SBS (and its format), Power BI, sentiment analysis
3. Provide some basic findings.
Week 5 – 6: Data Visualization
• Use data visualization tools to analyze data
• Determine the final tools: SBS, Power BI.
• Provide some findings.
Week 7 – 8: Dashboard Learning and Design
• Keep using SBS to analyze the data
• Design the dashboard by learning from RavenPack
• Combine and provide a sample designed dashboard
Sample designed dashboard
Process to Final Product
1. Ask Questions
2. Create
3. Feedback
4. Research
5. Revise
6. Feedback
7. Continue Revising
Cloud Architecture
Data Sources (Historical Data and Live Data): Twitter Transparency Datasets, Weibo Datasets, Event Registry Datasets, Twitter API, Web Scraping
Components: Process, Storage, Consume, Data Analysis & Query, Machine Learning
Process
Storage
Data Analysis & Query
Machine Learning
A Database
of Metadata
Machine Learning
• Existing fake news prediction model
• Can train on non-relational data (video, image, audio)
• Low-Code
Data Sources (Historical Data and Live Data): Twitter Transparency Datasets, Weibo Datasets, Event Registry Datasets, Twitter API, Web Scraping
Stages: Process, Consume
Steps: Transfer, Amazon S3, Cleaning, Clean & Transform, Machine Learning, Fake Score, Sentiment, Data Visualization

Presentation1.pdf

  • 1. Slide 1 -- very brief introduction to your project. (Note, this is to help your classmates refresh their memory about your project, which should be very short, one or two sentences highlight) Jerry Slide 2 -- all research methods you have used to complete this project(for each method, use justone sentence to justify why it's necessary to adopt this method) Slide 3 -- all data you havecollected (a list of types of the data, including number of the email correspondence, number of the interviews, pages of documents you reviewed, etc.) {平台,数据仓库(github、kaggle)和canvasoet里提供的数据 (covid,ukrane,伊朗),调查平台数据是否可 用}(Ziyan),爬数据方法(jerry) 清理数据 (Jerry),数据可视化分析(SBS(lingyu),power bi()) Slide 4 ~ x -- final productdesign process (This should be the focus, tell us how your interaction with the sponsors, users, etc. informed your design thinking, and how you came up with the design ideas) 每周任务,不过也可以放我们的dashboard设计,如何沟通 (lingyu) Slide x+1 ~ y -- how your final productlooks like? (Note, a list of screen shots would be helpful, or a live demo but make sure your designed websiteworks properly. You only have about 20 minutes in total, so don't wastetime on searching, finding, or fixing websitepages) 介绍wireframe(Jerry) Slide y+1 -- Take away points (Whatyou havelearned from doing this project, shareyour valuable experiences)(ziyan) Final slide -- if you are given an opportunity to re-do the project, whatmay you change??? (Jerry)
  • 2. ITC 6040 Capstone Final Report Strategies for Identifying Mis-/Disinformation Team 2: Ouyang Zhaode, Lingyu Hu, Ziyan Yan, Zixun Zhou
  • 4. Our Sponsors: Mikhail Oet, PhD, Professor in the Commerce and Economic Development (CED) program, Northeastern University. Mission: To get the right information to the right people at the right time. Research: • Platform Research • Data Repositories Research • Data Scraping Methods Research • U.S., China, and Russia Research. Data Visualization: • Dashboard Design. Data Analysis: • Data Cleaning • Sentiment Analysis • Word Semantic Analysis. What Are We Doing? Helping to identify fake news. How Do We Identify It?
  • 6. Research Method: Qualitative Research 1. Identify research questions: • How to collect data? • How to use a data repository? • How to analyze a dataset? 2. Case study 3. Research report
  • 7. Research Method: Quantitative Research. Specific method: data analysis. Analyze objective (statistical) data: • Sentiment score • Information release time • Amount of information • Location
  • 9. Data Source (1) GitHub: ❑ A code-hosting provider that also offers research results on fake news ❑ We did not use those results, but we did use the datasets ❑ Searched by keywords ❑ Dataset sourced from Weibo on false information about COVID-19. Kaggle: ❖ A dataset website owned by Google ❖ Offers datasets on scientific topics ❖ Provides data on fake news about COVID-19
  • 10. Data Source (2) 01 COVID-19: Source: Weibo, Twitter. Topic: Misinformation about COVID-19. 02 Ukraine: Event Registry Ukraine-English Dataset; Event Registry Ukraine-Russian Dataset; from Twitter. 03 Iran: Data on the Iran theme will come from tweets retrieved by keyword search
  • 11.
  • 12. Feasibility of Data: 01 Highly Feasible; 02 Diverse; 03 Visualize; 04 Reliable
  • 13. Data We Collected. Qualitative, primary: information gathered from the guest speakers and stakeholders (professors, sponsors, and other teams). Qualitative, secondary: articles and reports we read. Quantitative, primary: data we scraped from social media and news websites. Quantitative, secondary: data our sponsor provided and data repositories we found. Count: 3 guest sessions; 10+ Zoom recordings; 15+ meetings; 30+ emails; 30+ articles, reports, and videos; 3 experimental web scrapings; 5+ data repositories we found
  • 14. Data Scraping Methods Research. Methods: RSS feed to CSV (online converting tools); Data Collectors (Octopus, BrightData); Web Scraping (Python). Method / Use Cases / Difficulty: RSS feed to CSV / websites providing RSS feeds / Low; Data Collectors / popular social media / Medium; Web Scraping / static websites / High
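
The "RSS feed to CSV" route in the slide above can also be sketched in Python instead of an online converter. This is a minimal illustration only: the sample feed, the chosen columns, and the helper names `rss_to_rows` and `rows_to_csv` are assumptions for the example, not the team's actual tooling.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample feed; a real feed would be downloaded from a news site.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <item>
      <title>First headline</title>
      <link>https://example.com/a</link>
      <pubDate>Mon, 07 Mar 2022 10:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Second headline</title>
      <link>https://example.com/b</link>
      <pubDate>Tue, 08 Mar 2022 12:30:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""

# Columns kept in the CSV (an assumption; RSS items can carry more fields).
FIELDS = ["title", "link", "pubDate"]


def rss_to_rows(rss_text):
    """Parse RSS 2.0 text and return one dict per <item>."""
    root = ET.fromstring(rss_text)
    rows = []
    for item in root.iter("item"):
        # Missing child elements become empty strings.
        rows.append({f: (item.findtext(f) or "").strip() for f in FIELDS})
    return rows


def rows_to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = rss_to_rows(SAMPLE_RSS)
csv_text = rows_to_csv(rows)
print(csv_text)
```

The resulting CSV text can be saved to a file and opened in Excel Power Query or Power BI.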
  • 15. Part 4 Data Cleaning & Analysis
  • 16. Data Cleaning. MS Excel Power Query: • For CSV format • Easy to use • For ad hoc analysis (one-time use) * Limitations: • Data should be fewer than 1 million rows • Data should be smaller than 1 GB. Python: • For JSON or other data formats • Can cut a large dataset into many smaller files • Cleaning at scale • For data pipeline use (continuous data streaming) * No limitations, but takes more time and effort. We can use Google Sheets to do batch translation
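
The point that Python "can cut a large dataset into many smaller files" can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `split_csv`, the sample data, and the tiny `max_rows` value are hypothetical, and the example works on in-memory text for brevity. A real multi-gigabyte file would be read and written as streams rather than held in memory.

```python
import csv
import io


def split_csv(text, max_rows):
    """Split CSV text into a list of CSV texts.

    Each part repeats the header and holds at most max_rows data rows,
    so every part stays under a spreadsheet tool's row limit.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)

    # Group the data rows into chunks of at most max_rows.
    chunks, current = [], []
    for row in reader:
        current.append(row)
        if len(current) == max_rows:
            chunks.append(current)
            current = []
    if current:
        chunks.append(current)

    # Serialize each chunk back to CSV text with the header on top.
    parts = []
    for chunk in chunks:
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(chunk)
        parts.append(buf.getvalue())
    return parts


# Hypothetical five-row dataset, split into parts of two rows each.
sample = "id,text\n" + "".join(f"{i},post {i}\n" for i in range(5))
parts = split_csv(sample, max_rows=2)
for number, part in enumerate(parts, start=1):
    print(f"--- part {number} ---")
    print(part)
```

In practice `max_rows` would be set just below the limit of the target tool, for example a little under one million rows for the Excel limit mentioned in the slide.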
  • 17. Power BI 1. Data visualization 2. Data query 3. Data Modeling 4. Key data analysis
  • 20. Ukraine – Russian SBS Analysis • All text content is in Russian • All records are news • “Russian” and “Ukraine” appeared most often in the dataset • Specific words do not appear often
  • 21. Ukraine – Russian SBS Analysis • Ukraine, Russian, and Putin care most about Topic 1 • NATO, Russian, and USA care most about Topic 2 • Xi and Biden care most about Topic 5
  • 22. Ukraine – Russian SBS Analysis • T6 has a strong relationship with T2 • T5 has the second-strongest relationship with T2. Conclusions: • Ukraine, Russian, and Putin care most about the Winter Olympics • Russian, NATO, and USA care about potential military activities • Xi and Biden care about relationships with other countries • COVID has a stronger relationship with potential military activities • Relationships between countries could influence potential military activities
  • 23. Twitter Transparency Project Power BI Analysis • [Hanya Kamu]: “Only you” • Most hashtags are meaningless • Tweet counts in 2012 • Most tweets appeared in June and August • The trend is unstable
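
A monthly tweet count like the one charted in Power BI above can be cross-checked with a few lines of Python. The timestamps below are made up for illustration and are not taken from the Twitter Transparency dataset; the timestamp format and the function name `tweets_per_month` are likewise assumptions.

```python
from collections import Counter
from datetime import datetime

# Hypothetical timestamps for illustration only.
tweet_times = [
    "2012-06-03 14:20",
    "2012-06-17 09:05",
    "2012-08-01 22:40",
    "2012-08-09 08:15",
    "2012-08-30 19:00",
    "2012-11-12 11:30",
    "2013-01-05 10:00",
]


def tweets_per_month(timestamps, year):
    """Count tweets per calendar month (1-12) within the given year."""
    counts = Counter()
    for ts in timestamps:
        dt = datetime.strptime(ts, "%Y-%m-%d %H:%M")
        if dt.year == year:
            counts[dt.month] += 1
    return counts


monthly = tweets_per_month(tweet_times, 2012)
for month in sorted(monthly):
    print(month, monthly[month])
```

Sorting the counts by month gives the same series a line chart would plot, which makes peaks such as June and August easy to confirm.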
  • 25. User Input; External Data - Ravenpack; News Articles; Dashboard; Fake Score; Sentiment; polyfact; ProPublica; Local Check Based on Historical Data; External Data - GDI; Contribution by Country
  • 26. Dashboard - Data Prerequisites: News Article Dataset; External Data
  • 28. Take Away (1) Data visualization
  • 29. Take Away (2): 01 Communication; 02 Diversity; 03 Identifying Theme
  • 31. Data Collection Plan 1. Develop a Data Collection Plan 2. Us 1. What are we going to solve? e.g., a list of issues 2. What counts as success? e.g., a Service Level Agreement (SLA) 3. What data is available? 4. What form does that data come in? 5. Where will the data be collected from? 6. Should we measure a sample or the whole population? 7. In what format will the data be displayed? We Did / We Missed
  • 35. Week 1 – 2: Platform Exploration. Explore platforms where the data can be crawled: platforms in Russian, English, and Chinese. If possible, crawl the data by learning new tools.
  • 36. Week 3 – 4: Learning New Tools 1. Clean and filter the data. 2. Learn new tools: SBS (and its format), Power BI, sentiment analysis. 3. Provide some basic findings.
  • 37. Week 5 – 6: Data Visualization • Use data visualization tools to analyze data • Determine the final tools: SBS, Power BI • Provide some findings
  • 38. Week 7 – 8: Dashboard Learning and Design • Keep using SBS to analyze the data • Design the dashboard by learning from Ravenpack • Combine and provide a sample designed dashboard. Sample designed dashboard
  • 39. Process to Final Product 1. Ask Questions 2. Create 3. Feedback 4. Research 5. Revise 6. Feedback 7. Continue Revise
  • 40. Cloud Architecture: Twitter Transparency Datasets; Web Scraping; Data Sources; Historical Data; Weibo Datasets; Event Registry Datasets; Live Data; Twitter API; Process; Storage; Consume; Data Analysis & Query; Machine Learning
  • 43. Process; Storage; Data Analysis & Query; Machine Learning; A Database of Metadata
  • 44. Machine Learning • Existing fake news prediction model • Can train on non-relational data (video, image, audio) • Low-code
  • 45. Twitter Transparency Datasets; Cleaning; Machine Learning; Fake Score; Sentiment; Data Visualization; Clean & Transform; Web Scraping; Data Sources; Historical Data; Weibo Datasets; Event Registry Datasets; Live Data; Process; Consume; Transfer; Twitter API