1. Slide 1 -- very brief introduction to your project. (Note: this is to help your classmates refresh their memory about your project, so it should be very short, a one- or two-sentence highlight.) Jerry
Slide 2 -- all research methods you have used to complete this project (for each method, use just one sentence to justify why it was necessary to adopt it)
Slide 3 -- all data you have collected (a list of data types, including the number of email correspondences, the number of interviews, pages of documents you reviewed, etc.) {Platforms, data repositories (GitHub, Kaggle), and the data Oet provided on Canvas (COVID, Ukraine, Iran); check whether the platform data is usable} (Ziyan), data scraping methods (Jerry)
Data cleaning (Jerry), data visualization analysis (SBS (Lingyu), Power BI ())
Slide 4 ~ x -- final product design process (This should be the focus: tell us how your interactions with the sponsors, users, etc. informed your design thinking, and how you came up with the design ideas.) Weekly tasks, but we can also include our dashboard design and how we communicated (Lingyu)
Slide x+1 ~ y -- what does your final product look like? (Note: a list of screenshots would be helpful, or a live demo, but make sure your designed website works properly. You only have about 20 minutes in total, so don't waste time searching for, finding, or fixing web pages.) Introduce the wireframe (Jerry)
Slide y+1 -- Takeaway points (what you have learned from doing this project; share your valuable experiences) (Ziyan)
Final slide -- if you were given an opportunity to redo the project, what would you change? (Jerry)
2. ITC 6040 Capstone
Final Report
Strategies for Identifying Mis-/Disinformation
Team 2:
Ouyang Zhaode, Lingyu Hu, Ziyan Yan, Zixun Zhou
4. Our Sponsors:
Mikhail Oet, PhD
Professor in the Commerce and Economic Development (CED) program
Northeastern University
Mission:
To get the right information to the right people at the right time
Research
• Platform Research
• Data Repositories Research
• Data Scraping Methods Research
• U.S., China, and Russia Research
Data Visualization
• Dashboard Design
Data Analysis
• Data Cleaning
• Sentiment Analysis
• Word Semantic Analysis
What Are We Doing? Helping Identify Fake News
How Do We Identify It?
6. Research Method: Qualitative Research
1. Identify Research Questions:
• How to collect data?
• How to use a data repository?
• How to analyze a dataset?
2. Case Study
3. Research Report
9. Data Source (1)
GitHub
❑ Code-hosting platform; it can offer published research results on fake news
❑ We did not use those results, but we used the dataset
❑ Searched with keywords
❑ Dataset sourced from Weibo about COVID-19 misinformation
Kaggle
❖ Dataset website owned by Google
❖ Offers datasets on scientific topics
❖ Provides data on COVID-19 fake news
10. Data Source (2)
01 COVID-19
Source: Weibo, Twitter
Topic: COVID-19 misinformation
02 Ukraine
Event Registry Ukraine-English Dataset
Event Registry Ukraine-Russian Dataset
From Twitter
03 Iran
Iran data will come from tweets collected by keyword search
13. Data We Collected
Qualitative, Primary: Information gathered from guest speakers and stakeholders (professors, sponsors, and other teams)
• Count: 3 guest sessions, 10+ Zoom recordings, 15+ meetings, 30+ emails
Qualitative, Secondary: Articles and reports we read
• Count: 30+ articles, reports, and videos
Quantitative, Primary: Data we scraped from social media and news websites
• Count: 3 experimental web scrapes
Quantitative, Secondary: Data our sponsor provided and data repositories we found
• Count: 5+ data repositories found
14. Data Scraping Methods Research
RSS feed to CSV (online converting tools)
Data collectors (Octoparse, Bright Data)
Web scraping (Python)

Method            Use Case                         Difficulty
RSS feed to CSV   Websites providing RSS feeds     Low
Data collectors   Popular social media platforms   Medium
Web scraping      Static websites                  High
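For the "Web scraping (Python)" row, a minimal sketch using only the standard library might look like the following. It assumes the target page is static (the headlines are already in the HTML the server returns, with no JavaScript rendering) and that headlines sit in `<h2>` tags; the tag choice and the helper names `HeadlineParser`, `extract_headlines`, and `scrape` are hypothetical and would need adjusting for each site.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class HeadlineParser(HTMLParser):
    """Collect the text inside target tags (hypothetical default: <h2>)."""

    def __init__(self, tags=("h2",)):
        super().__init__()
        self.tags = set(tags)  # HTMLParser reports tag names in lower case
        self._depth = 0        # > 0 while inside a target tag
        self._buffer = []
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        if tag in self.tags:
            self._depth += 1

    def handle_endtag(self, tag):
        if tag in self.tags and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                # Collapse runs of whitespace left over from the page's markup.
                text = " ".join("".join(self._buffer).split())
                if text:
                    self.headlines.append(text)
                self._buffer = []

    def handle_data(self, data):
        if self._depth > 0:
            self._buffer.append(data)


def extract_headlines(html, tags=("h2",)):
    """Parse an HTML string and return the text of every target tag, in order."""
    parser = HeadlineParser(tags)
    parser.feed(html)
    parser.close()
    return parser.headlines


def scrape(url, tags=("h2",)):
    """Fetch a static page and extract headlines (no JavaScript is executed)."""
    with urlopen(url, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset, errors="replace")
    return extract_headlines(html, tags)
```

For example, `extract_headlines("<h2>Fact check</h2>")` returns `["Fact check"]`. Pages that build their content with JavaScript would not work with this approach, which is one reason the data collectors in the table above are used for social media.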
16. Data Cleaning
MS Excel Power Query
• For CSV format
• Easy to use
• For ad-hoc analysis (one-time use)
* Limitations:
• Data should be fewer than 1 million rows
• Data should be smaller than 1 GB
Python
• For JSON and other data formats
• Can split a large dataset into many smaller files
• Cleaning at scale
• For data pipeline use (continuous data streaming)
* No hard limits, but takes more time and effort
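The "split a large dataset into many smaller files" step can be sketched in Python with the standard `csv` module. This is an illustration, not the team's actual script: `split_csv` and its 500,000-row default are assumptions, chosen so each part stays well under Excel's 1,048,576-row worksheet limit and can still be opened in Excel afterwards.

```python
import csv
from pathlib import Path


def split_csv(src, out_dir, rows_per_file=500_000):
    """Split a large CSV into numbered part files, repeating the header row in each.

    Rows are streamed one at a time, so memory use stays flat however big `src` is.
    Returns the paths of the part files that were written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with open(src, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if header is None:
            return written  # empty input: nothing to split
        out, writer, count = None, None, 0
        try:
            for row in reader:
                # Start a new part on the first data row and whenever the current one is full.
                if writer is None or count == rows_per_file:
                    if out is not None:
                        out.close()
                    path = out_dir / f"{Path(src).stem}_part{len(written) + 1:03d}.csv"
                    out = open(path, "w", newline="", encoding="utf-8")
                    writer = csv.writer(out)
                    writer.writerow(header)
                    written.append(path)
                    count = 0
                writer.writerow(row)
                count += 1
        finally:
            if out is not None:
                out.close()
    return written
```

For a header-only or empty file it writes nothing and returns an empty list; otherwise each part is a valid CSV on its own, so the parts can be cleaned independently or fed into a pipeline one at a time.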
We can use Google Sheets to do batch translation
17. Power BI
1. Data Visualization
2. Data Query
3. Data Modeling
4. Key Data Analysis
20. Ukraine – Russian SBS Analysis
• All word content is in Russian
• All records are news articles
• “Russian” and “Ukraine” appear most often in the dataset
• Specific words do not appear very often
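The frequency observations above can be approximated with a plain term count. This is a simple stand-in, not the SBS tool's own method; the tiny `STOPWORDS` set and the `top_terms` helper are hypothetical, and a real analysis of Russian text would also need a full stopword list and lemmatization so that inflected forms (e.g. "Украина" and "Украины") are counted together.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (a few Russian and English function words).
STOPWORDS = {"и", "в", "на", "не", "что", "the", "and", "of", "in", "to", "a"}

# For str patterns in Python 3, \w matches Cyrillic as well as Latin letters.
TOKEN = re.compile(r"\w+")


def top_terms(texts, n=10, stopwords=STOPWORDS):
    """Return the n most frequent lower-cased tokens across a collection of texts."""
    counts = Counter()
    for text in texts:
        for token in TOKEN.findall(text.lower()):
            if token not in stopwords and not token.isdigit():
                counts[token] += 1
    return counts.most_common(n)
```

Running `top_terms` over the article texts gives a quick check of which words dominate before loading the data into a dedicated tool.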
21. Ukraine – Russian SBS Analysis
• Ukraine, Russian, and Putin are most associated with Topic 1
• NATO, Russian, and USA are most associated with Topic 2
• Xi and Biden are most associated with Topic 5
22. Ukraine – Russian SBS Analysis
• T6 has a strong relationship with T2
• T5 has the second-strongest relationship with T2
Conclusions:
• Ukraine, Russian, and Putin are most associated with the Winter Olympics
• Russian, NATO, and USA are most associated with potential military activities
• Xi and Biden are most associated with relations with other countries
• COVID has a stronger relationship with potential military activities
• Relations between countries could influence potential military activities
23. Twitter Transparency Project: Power BI Analysis
• [Hanya Kamu]: “Only you”
• Most hashtags are meaningless
• Tweet counts in 2012
• Most tweets appeared in June and August
• The trend is unstable
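The monthly pattern and hashtag observations can also be checked outside Power BI with a short Python sketch. It assumes each tweet provides an ISO-8601 timestamp string and a text field; the helper names `monthly_counts` and `top_hashtags` are hypothetical, and the actual column names in the Twitter Transparency files are not assumed here.

```python
import re
from collections import Counter
from datetime import datetime

# A hashtag is '#' followed by word characters (Latin or non-Latin).
HASHTAG = re.compile(r"#(\w+)")


def monthly_counts(timestamps, year=None):
    """Count tweets per (year, month); timestamps are assumed to be ISO-8601 strings."""
    counts = Counter()
    for ts in timestamps:
        dt = datetime.fromisoformat(ts)
        if year is None or dt.year == year:
            counts[(dt.year, dt.month)] += 1
    return counts


def top_hashtags(texts, n=10):
    """Most common hashtags across tweet texts, compared case-insensitively."""
    tags = (tag.lower() for text in texts for tag in HASHTAG.findall(text))
    return Counter(tags).most_common(n)
```

Calling `monthly_counts(timestamps, year=2012).most_common()` lists the busiest months first, which is one way to confirm peaks such as June and August.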
25. User Input
External Data - RavenPack
News Articles
Dashboard
Fake Score
Sentiment
PolitiFact
ProPublica
Local Check Based on Historical Data
External Data - GDI
Contribution by Country
26. Dashboard - Data Prerequisite
News Article Dataset
External Data
31. Data Collection Plan
1. Develop a Data Collection Plan
2. Us
1. What are we going to solve?
e.g., a list of issues
2. What counts as success?
e.g., a Service Level Agreement (SLA)
3. What data is available?
4. What form does that data come in?
5. Where will the data be collected from?
6. Should we measure a sample or the whole population?
7. In what format will the data be displayed?
We Did
We Missed
35. Week 1 – 2: Platform Exploration
Explore platforms from which the data can be crawled.
Platforms in Russian, English, and Chinese.
If possible, crawl the data by learning new tools.
36. Week 3 – 4: Learning New Tools
1. Clean and filter the data.
2. Learn new tools: SBS (and its format), Power BI, sentiment analysis.
3. Provide some basic findings.
37. Week 5 – 6: Data Visualization
• Use data visualization tools to analyze data
• Determine the final tools: SBS, Power BI
• Provide some findings
38. Week 7 – 8: Dashboard Learning and Design
• Keep using SBS to analyze the data
• Design the dashboard by learning from RavenPack
• Combine and provide a sample dashboard design
Sample dashboard design
39. Process to Final Product
1. Ask Questions
2. Create
3. Feedback
4. Research
5. Revise
6. Feedback
7. Continue Revising