This workshop was presented in Riyadh, Saudi Arabia, on 21-22 January 2019, in collaboration with the Riyadh Data Geeks group.
To learn more about the workshop, please see this website:
http://bit.ly/2Ucjmm5
Open Domain Question Answering System - Research Project in NLP - GVS Chaitanya
Using a computer to answer questions has been a human dream since the beginning of the digital era. A first step towards this ambitious goal is handling natural language, so that the computer can understand what its user asks. Computational linguistics is the discipline that studies the connection between natural language and the representation of its meaning via computational models. Within that discipline, question answering can be defined as the task that, given a question formulated in natural language, aims at finding one or more concise answers. Improvements in technology and the explosive demand for better information access have reignited interest in question answering systems. The wealth of information on the web makes it an attractive resource for seeking quick answers to factual questions such as "Who was the first American in space?" or "What is the second tallest mountain in the world?". Yet today's most advanced web search engines (Bing, Google, Yahoo) make it surprisingly tedious to locate those answers. Question answering systems aim to develop techniques that go beyond retrieval of relevant documents in order to return exact answers to natural-language factoid questions.
Unstructured data processing webinar 06272016 - George Roth
This document provides an overview of how to prepare unstructured data for business intelligence and data analytics. It discusses structured, semi-structured, and unstructured data types. It then introduces Recognos' platform called ETI, which uses human-assisted machine learning to extract and integrate data from unstructured documents. ETI can extract data from documents that contain classifiable content through predefined field definitions and templates. It also discusses the challenges of extracting tables and derived fields that require semantic analysis. The document concludes with examples of using extracted data for compliance applications and creating data teams to manage the extraction process over time.
This document provides an introduction to data science, including:
- Why data science has gained popularity due to advances in AI research and commoditized hardware.
- Examples of where data science is applied, such as e-commerce, healthcare, and marketing.
- Definitions of data science, data scientists, and their roles.
- Overviews of machine learning techniques like supervised learning, unsupervised learning, deep learning and examples of their applications.
- How data science can be used by businesses to understand customers, create personalized experiences, and optimize processes.
ODSC East 2017: Data Science Models For Good - Karry Lu
Abstract: The rise of data science has been largely fueled by the promise of changing the business landscape: enhancing competitive advantage, increasing business optimization and efficiency, and ultimately delivering a better bottom line. This promise reaches across sectors as machine learning methods improve, data access continues to grow, and computation power becomes easily accessible. However, because the practice of doing data science can be expensive, there is a danger that this promise may only be available to the most well-resourced organizations with sophisticated data capabilities and staff. For the past five years, DataKind has been working to ensure that social change organizations also have access to data science, teaming them up with data scientists to build machine learning and artificial intelligence solutions that aim to reduce human suffering. In doing so, DataKind has learned what it takes to apply data science in the social sector and the many applications it has for creating positive change in the world. This session presents DataKind projects showcasing the wide range of applications of ML/AI for social good: using satellite imagery and remote-sensing techniques to detect wheat farm boundaries and protect livelihoods in Ethiopia; leveraging NLP to automate the time-consuming process of synthesizing findings from academic studies to inform conservation efforts; classifying text records to better understand human rights conditions across the world; and using machine learning to reduce traffic fatalities in U.S. cities. Learn about some of the latest breakthroughs and findings in the data science for social good space, and how you can get involved.
Setting Up a Qualitative or Mixed Methods Research Project in NVivo 10 to Cod... - Shalin Hai-Jew
This document summarizes a presentation on using NVivo 10 software to code and analyze qualitative and mixed methods research data. It introduces NVivo 10 as a data management and analysis tool, demonstrates how to import and code data from various sources, and shows how to visualize and analyze coded data through matrices, models, and queries. The goals are to introduce NVivo 10's capabilities and to demonstrate the process of setting up a project for qualitative or mixed methods research.
This document outlines the course structure and content for a Data Science course. The 5 modules cover: 1) introductions to data science concepts and statistical inference using R; 2) exploratory data analysis and machine learning algorithms; 3) feature generation/selection and additional machine learning algorithms; 4) recommendation systems and dimensionality reduction; 5) mining social network graphs and data visualization. The course aims to teach students to define data science fundamentals, demonstrate the data science process, explain necessary machine learning algorithms, illustrate data analysis techniques, and follow ethics in data visualization.
1) The document introduces data science and its core disciplines, including statistics, machine learning, predictive modeling, and database management.
2) It explains that data science uses scientific methods and algorithms to extract knowledge and insights from both structured and unstructured data.
3) The roles of data scientists are discussed, noting that they have skills in programming, statistics, analytics, business analysis, and machine learning.
Data analytics beyond data processing and how it affects Industry 4.0 - Mathieu d'Aquin
The document discusses how data analytics is moving beyond just data processing to affect Industry 4.0. It summarizes the research areas and industry partnerships of the Insight Centre for Data Analytics in NUI Galway, including linked data, machine learning, and media analytics. Key applications discussed are monitoring energy consumption using stream processing and event detection, predicting future behavior through machine learning, and detecting and classifying anomalies to inform predictive maintenance decisions.
The document discusses practical computing issues that arise when working with large datasets. It begins by noting that many statistical analyses can be done on a single laptop. It then discusses storing very large datasets, which may require terabytes of storage. The document outlines some basic computing concepts for working with big data, including software engineering practices, databases, and distributed computing.
Data Science Tutorial | Introduction To Data Science | Data Science Training ... - Edureka!
This Edureka Data Science tutorial will help you understand the ins and outs of data science, with examples. The tutorial is ideal both for beginners and for professionals who want to learn or brush up on their data science concepts. Below are the topics covered:
1. Why Data Science?
2. What is Data Science?
3. Who is a Data Scientist?
4. How a Problem is Solved in Data Science?
5. Data Science Components
Clare Corthell: Learning Data Science Online - sfdatascience
Clare Corthell, Data Scientist and Designer at Mattermark, and author of the Open Source Data Science Masters, shares her experience teaching herself data science with online resources. http://datasciencemasters.org/
This video gives beginners an introduction to data science.
It also explains the data science process, data science job roles, and the stages of a data science project.
From Knowledge Bases to Knowledge Infrastructures for Intelligent Systems - Mathieu d'Aquin
1) The document discusses how knowledge representation and ontologies have evolved from closed knowledge bases for specific domains to open knowledge infrastructures that can handle large amounts of diverse data and information at scale.
2) It provides examples of how ontologies and semantic technologies are being used to build intelligent systems that can search, integrate, and automatically process and analyze large datasets.
3) Going forward, ontologies will play an important role in populating knowledge from data and dialog, enabling the automatic exploitation of data by autonomous agents, and enhancing data analytics and mining through semantic representation of datasets, tools, and policies.
This document provides an introduction to text analytics using IBM SPSS Modeler. It defines key terms related to text analytics and outlines the main steps in the text analytics process: extraction, categorization, and visualization. It then provides a tutorial on using IBM SPSS Modeler to perform text analytics, including sourcing text, extracting concepts and relationships, categorizing records, and visualizing results. Templates and resources are described that can be used to start an interactive workbench session in Modeler for exploring text analytics.
IRJET-Classifying Mined Online Discussion Data for Reflective Thinking based ... - IRJET Journal
This document presents a methodology for classifying mined online discussion data to identify reflective thinking based on ontology. It involves the following steps:
1. Collecting online discussion data and preprocessing it by removing stop words and punctuation.
2. Implementing inductive content analysis to categorize the data into six types of reflective thinking.
3. Training a Naive Bayes classifier on the categorized data to classify new data.
4. Applying the trained model to large scale unlabeled online discussion data.
5. Using ontology to provide a deeper classification of topics in the data beyond the six reflective thinking categories. This allows extraction of additional knowledge from the classified text data.
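Steps 1-4 above (preprocessing, hand-categorized training data, Naive Bayes classification, then applying the model to unlabeled posts) can be sketched with scikit-learn. The posts and category labels below are illustrative placeholders, not the paper's data or its six categories:

```python
# Sketch of steps 1-4 of the pipeline using scikit-learn.
# The posts and the two labels are invented for illustration only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Steps 1-2: discussion posts, hand-labelled into reflective-thinking
# categories (placeholders for the paper's six categories).
posts = [
    "i think my first attempt failed because i skipped the reading",
    "the deadline for the assignment is friday",
    "next time i will test my code earlier",
    "does anyone have the lecture slides",
]
labels = ["reflection", "non-reflective", "reflection", "non-reflective"]

# Step 3: train a Naive Bayes classifier on bag-of-words counts.
# CountVectorizer lowercases and drops punctuation, and
# stop_words="english" removes stop words, covering the preprocessing.
model = make_pipeline(
    CountVectorizer(stop_words="english"),
    MultinomialNB(),
)
model.fit(posts, labels)

# Step 4: apply the trained model to new, unlabelled discussion data.
print(model.predict(["next time i should test my work earlier"]))
```

With realistic data, the ontology step (5) would then refine the topics within each predicted category.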
Introduction to Data Science - Week 4 - Tools and Technologies in Data Science - Ferdin Joe John Joseph PhD
This document discusses tools and technologies used in data science. It covers popular programming languages like Python, R, Java and C++. It also discusses databases, data analytics tools, APIs, servers, and frameworks. Specific tools mentioned include Hadoop, Spark, Tableau, IBM SPSS, SAS, and Excel. The document provides brief descriptions and examples of how these various tools are used in data science.
The document discusses question answering over knowledge graphs. It introduces question answering and describes how knowledge graphs can be used to answer natural language questions. It summarizes three proposed papers on learning knowledge graphs for question answering through dialogs, automated template generation for question answering over knowledge graphs, and generating knowledge questions from knowledge graphs. The document also covers motivation for question answering, defining characteristics, different methods like template-based and dialog-based systems, evaluating knowledge quality, and examples of question answering systems.
Stakeholder-centred Identification of Data Quality Issues: Knowledge that Can... - Anastasija Nikiforova
This presentation is supplementary material for the research paper "Stakeholder-centred Identification of Data Quality Issues: Knowledge that Can Save Your Business" (authored by Anastasija Nikiforova and Natalija Kozmina), presented at The International Conference on Intelligent Data Science Technologies and Applications (IDSTA2021), November 15-16, 2021, Tartu, Estonia (web-based).
Read the paper: Nikiforova, A., & Kozmina, N. (2021, November). Stakeholder-centred Identification of Data Quality Issues: Knowledge that Can Save Your Business. In 2021 Second International Conference on Intelligent Data Science Technologies and Applications (IDSTA) (pp. 66-73). IEEE -> https://ieeexplore.ieee.org/abstract/document/9660802?casa_token=LFJa20LrXAwAAAAA:wVwhTcCPWqxdloAvDQ3-l98KkkLx70xzG3zNvIIkJbC6wvJ4VxwX_VGc3mmW_7c1T-QJlOtTiao
An introduction to data science: from the very beginning of the idea, through the latest designs, changing trends, and enabling technologies, to the applications already in real-world use today.
The document discusses requirements for National Science Foundation (NSF) Data Management Plans (DMPs). Starting in 2011, DMPs describing how research data will be organized, preserved, and shared are required as part of NSF grant proposals. DMPs must address data standards, access and sharing policies, and long-term preservation and access. Resources for writing DMPs are provided, including tools, best practices examples, and experts available for consultation.
Text analytics is used to extract structured data from unstructured text sources like social media posts, reviews, emails and call center notes. It involves acquiring and preparing text data, processing and analyzing it using algorithms like decision trees, naive bayes, support vector machines and k-nearest neighbors to extract terms, entities, concepts and sentiment. The results are then visualized to support data-driven decision making for applications like measuring customer opinions and providing search capabilities. Popular tools for text analytics include RapidMiner, KNIME, SPSS and R.
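The flow described above (prepare text, vectorize it, train one of the named algorithms to extract sentiment, then score new records) can be sketched with scikit-learn, here using a support vector machine. The reviews and labels are invented for illustration:

```python
# Minimal sketch of the text-analytics flow: prepared text -> features ->
# classifier -> sentiment for new records. Data is illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Acquired and prepared text: short customer comments with sentiment labels.
reviews = [
    "great product, fast delivery",
    "terrible support, very slow response",
    "love the new interface",
    "broken on arrival, very disappointed",
]
sentiment = ["positive", "negative", "positive", "negative"]

# TF-IDF turns each comment into term weights; a linear SVM (one of the
# algorithms named above) learns to separate positive from negative.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(reviews, sentiment)

# Score an unseen call-center note; results like these would then be
# aggregated and visualized to measure customer opinion.
print(model.predict(["delivery was fast and the product is great"]))
```

Decision trees, Naive Bayes, or k-nearest neighbors could be swapped in for `LinearSVC` in the same pipeline.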
The document discusses two NSF-funded research projects on intelligence and security informatics:
1. A project to filter and monitor message streams to detect "new events" and changes in topics or activity levels. It describes the technical challenges and components of automatic message processing.
2. A project called HITIQA to develop high-quality interactive question answering. It describes the team members and key research issues like question semantics, human-computer dialogue, and information quality metrics.
This document summarizes Mathieu d'Aquin's career path and research interests. It notes that he has worked at LORIA in Nancy, France from 2002-2006, at the Knowledge Media Institute at the Open University in Milton Keynes, UK from 2006-2017, and at the Data Science Institute at NUI Galway in Ireland from 2017-2021. His research has focused on using knowledge-driven and hybrid data-driven/knowledge-driven approaches to understand data provenance, content, and results from data analysis in order to achieve intelligent data understanding.
A (vintage) presentation about a database system for the study of gene expression data, including distributed metadata annotation and some interactive analytics. Some of the ideas are still relevant today.
Measuring Relevance in the Negative Space - Trey Grainger
The document discusses using negative space, or hidden or missing data, to improve machine learning and algorithmic systems by connecting related concepts that may not be explicitly linked. It provides examples of how analyzing relationships between terms in a semantic knowledge graph can lead to more diverse and less biased recommendations and search results. The talk argues that simulating hypothetical user interactions could help identify potential issues with algorithm changes before exposing real users.
Session 01: designing and scoping a data science project - bodaceacat
This document provides an overview of the first session in a data science training series. It discusses designing and scoping a data science project. Key points include: defining data science and the data science process; describing the roles of problem owners and competitors; reviewing examples of data science competitions from Kaggle, DrivenData, and DataKind; and providing guidance on writing an effective problem statement by specifying the context, needs, vision, and intended outcomes of a project. The document also briefly covers data science ethics considerations like ensuring privacy and minimizing risks. Exercises are included to help participants practice asking interesting questions, identifying relevant data sources, and designing communications for target audiences.
This document provides an introduction to data science, including what it is, how the field has emerged due to big data, and the roles and skills of data scientists. It discusses how data scientists at LinkedIn used data analysis to improve their product's user connections feature. The data science process of framing questions, collecting and processing data, exploring for patterns, and communicating results is also outlined. Finally, the document discusses tools used in data science like SQL, data visualization software, and machine learning algorithms.
This document provides an introduction to data science. It discusses how the field has emerged due to big data and the shortage of people with deep analytical skills. It describes the roles and skills of data scientists, including collecting, processing, exploring, and communicating data. The document uses LinkedIn as a case study example to illustrate the data science process. It also outlines some common tools used in data science, such as SQL, data visualization software, and machine learning algorithms. Finally, it discusses learning data science through bootcamp programs and mentorship.
This document summarizes an introductory presentation on data science. It introduces the presenter and their background in data and analytics. The goals of the presentation are to define what a data scientist is, how the field has emerged, and how to become one. It discusses the growing demand and salaries for data scientists. Examples are given of how data science has been applied at companies like LinkedIn and Netflix. The presentation covers big data, Hadoop, data processing techniques, machine learning algorithms, and tools used in data science. Finally, attendees are encouraged to consider Thinkful's data science bootcamp program.
This document provides an introduction to data science, including what it is, why the field has emerged, and the roles and skills of data scientists. It discusses how data science has helped companies like LinkedIn and Uber solve business problems by analyzing large datasets. It outlines the data science process, from framing questions to collecting and cleaning data to exploring patterns and communicating findings. Finally, it discusses tools used in data science like SQL, data visualization software, and machine learning algorithms and how bootcamps can help people transition into data science careers.
Tips and Tricks to be an Effective Data ScientistLisa Cohen
Data Science is an evolving field, that requires a diverse skill set. From Analytical Techniques to Career Advice, this talk is full of practical tips that you can apply immediately to your job.
This document outlines a course on knowledge acquisition in decision making, including the course objectives of introducing data mining techniques and enhancing skills in applying tools like SAS Enterprise Miner and WEKA to solve problems. The course content is described, covering topics like the knowledge discovery process, predictive and descriptive modeling, and a project presentation. Evaluation includes assignments, case studies, and a final exam.
Slides for a talk given at "The Conference Formerly Known as Conversion Hotel" in November 2019. Covers what data science is, what data scientists do, and how you can start learning data science skills.
This document provides an overview of data science as a career field. It discusses how data science emerged to address the rise of "big data" and the shortage of people with analytical skills. It uses LinkedIn as a case study to outline the data science process of framing questions, collecting and processing data, exploring for patterns, and communicating results. Finally, it discusses the tools used in data science like SQL, data visualization software, and machine learning algorithms. It promotes Thinkful's data science bootcamp program for transitioning into a data science career.
This document provides an overview of getting started with data science using Python. It discusses what data science is, why it is in high demand, and the typical skills and backgrounds of data scientists. It then covers popular Python libraries for data science like NumPy, Pandas, Scikit-Learn, TensorFlow, and Keras. Common data science steps are outlined including data gathering, preparation, exploration, model building, validation, and deployment. Example applications and case studies are discussed along with resources for learning including podcasts, websites, communities, books, and TV shows.
This document outlines the objectives, content, evaluation, and prerequisites for a course on Knowledge Acquisition in Decision Making, which introduces students to data mining techniques and how to apply them to solve business problems using SAS Enterprise Miner and WEKA. The course covers topics such as data preprocessing, predictive modeling with decision trees and neural networks, descriptive modeling with clustering and association rules, and a project presentation. Students will be evaluated based on assignments, case studies, a project, quizzes, class participation, and a final exam.
You've heard the news, Data Science is the cool new career opportunity sweeping the world. Come learn from Thinkful Mentors all about this new and exciting industry.
This document provides an introduction to data science, including what it is, how the field has emerged due to big data, and the roles and skills of data scientists. It discusses how data scientists at LinkedIn used data analysis to improve user connections and engagement. The data science process is outlined as framing questions, collecting and processing raw data, exploring patterns in the data, and communicating results. Examples of tools used by data scientists like SQL, data visualization software, and machine learning algorithms are also provided. Finally, next steps and opportunities to continue learning data science are discussed.
The profile of the management (data) scientist: Potential scenarios and skill...Juan Mateos-Garcia
Big and Social Media data opens up new scenarios and opportunities for management research (such as using internal communication data to map knowledge networks inside firms, or using web data to study firm capabilities and strategies). This presentation, given at the British Academy of Management 2014 conference proposes a typology of such scenarios, describes the skills required to exploit them, and considers implications for the education and training of management researchers.
This document outlines the typical steps for conducting a data science project:
1. Identify the problem and research questions
2. Collect and store relevant data
3. Annotate and extract features from the data
4. Clean the data by preprocessing and handling missing values
5. Explore and analyze the data using descriptive statistics and plots
6. Train machine learning classification models on the data
7. Assess the models, communicate the results through visualization, applications, and reports
Thinkful - Intro to Data Science - Washington DCTJ Stalcup
This document discusses an introductory session on data science. It begins with introductions and an outline of the session's goals, which are to define what a data scientist is, how the field has emerged, and how to become one. It then discusses the growing demand and high salaries for data scientists. Examples are given of how data science has been applied at companies like LinkedIn, Netflix, and for fighting Ebola. Key aspects of data science like big data, Hadoop, MapReduce, and machine learning algorithms are explained. The document concludes by discussing the data science process and tools used, and encourages the audience that it is possible for them to become data scientists with the right knowledge, skills, and learning approach.
Claudia Gold: Learning Data Science Onlinesfdatascience
Claudia Gold, author of the Data Analysis Learning path on SlideRule, talks about why she wrote it and how to approach learning data science on your own. https://www.mysliderule.com/learning-paths/data-analysis/
This document provides an overview and introduction to the field of data science. It discusses the growth of data science due to increases in data and computing power. It also notes that data science jobs are projected to grow 35% in the US with a median salary of $103,000. The course introduces concepts relevant to data science including big data, artificial intelligence, and how data science can uncover stories in data. It is designed for beginners without prerequisites and serves as an introduction to the skills and topics in data science.
This document provides an overview of a presentation on advanced analytics, big data, and being a data scientist. The presentation agenda includes an introduction to data science, why the presenter became a data scientist, definitions of data science, data science skillsets, the data science process for one-off projects versus production pipelines, various data science tools, and a question and answer section. The document outlines each section in detail with examples.
Linguistic Cues to Deception: Identifying Political Trolls on Social MediaAseel Addawood
The document summarizes research on identifying political trolls on social media through their linguistic cues of deception. Key findings include:
- Russian trolls active in the 2016 US election aimed to sow discord by discussing divisive topics like Trump, police, and race on Twitter.
- Analyses found trolls used more deceptive language through increased uncertainty, less self-reference, and shorter, less complex tweets than non-trolls.
- Machine learning classifiers could predict trolls with over 80% accuracy based on their deceptive linguistic patterns, showing the potential to automatically detect political trolls online.
The Emergence Of Social Bots In Social Media- WiDS TalkAseel Addawood
Social bots are algorithms that automatically produce content and interact with humans on social media to emulate and potentially manipulate human behavior. Around 9-15% of active Twitter accounts may be bots. Bots are used for marketing, entertainment, gaining followers, spamming, influencing opinions, and limiting free speech. About 30% of users can be deceived by bots. Identifying bots can be difficult as sophisticated bots mimic human behavior, but graph-based and feature-based methods analyze social networks and individual account features to detect bots with machine learning classifiers. The arms race between bot detection and deception will continue as long as deception remains effective.
The document discusses natural language processing (NLP) and some of its applications. NLP is a field of computer science concerned with interactions between computers and human languages. Some key applications discussed include question answering, information extraction from texts and emails, machine translation, text summarization, sentiment analysis, and context analysis. Examples of each application are provided. The document also provides information about setting up an environment for exploring an Arabic sentiment analysis task using NLP.
Data visualization is the visual representation of data to help understand patterns and insights more easily. Some common types of data visualizations include charts, graphs, maps and infographics. Effective data visualization helps tell stories with data in a way that is easy to understand and share with others.
Data storytelling is a method of delivering messages derived from complex data analysis in a way that allows audiences to quickly understand its meaning and draw conclusions. It uses key ingredients like narrative, visuals, and data, with data being the foundation. Narratives present connected events in sequence, while visual communication shows viewers relationships they cannot see in data alone, allowing exploration and understanding.
This document provides an overview of machine learning. It discusses how machine learning finds patterns in data and is used to predict the future based on past data. It also contrasts machine learning with traditional programming by noting that machine learning uses example data to generate an output rather than being explicitly programmed. The document lists reasons for using machine learning like no human experts being available or rapidly changing phenomena. It provides resources for machine learning including technical papers, journals, conferences and datasets. It concludes by listing several references on machine learning.
Data science can be learned without a PhD by developing a T-shaped skill set of both broad interdisciplinary knowledge and expertise in one or more areas. The document recommends gaining hands-on experience with data science projects as the best way to learn, rather than just theoretical study. Developing a T-shaped skill set through practical experience with data science projects allows one to learn data science effectively.
The document discusses data science and what can be done with data. It notes that data comes from many sources and is everywhere. Some potential uses of data include recommender systems, image recognition, digital advertisements, speech recognition, gaming, price comparison websites, airline route planning, fraud and risk detection, and delivery logistics. The document also references two URLs about what data is and the data science process.
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...Social Samosa
The Modern Marketing Reckoner (MMR) is a comprehensive resource packed with POVs from 60+ industry leaders on how AI is transforming the 4 key pillars of marketing – product, place, price and promotions.
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataKiwi Creative
Harness the power of AI-backed reports, benchmarking and data analysis to predict trends and detect anomalies in your marketing efforts.
Peter Caputa, CEO at Databox, reveals how you can discover the strategies and tools to increase your growth rate (and margins!).
From metrics to track to data habits to pick up, enhance your reporting for powerful insights to improve your B2B tech company's marketing.
- - -
This is the webinar recording from the June 2024 HubSpot User Group (HUG) for B2B Technology USA.
Watch the video recording at https://youtu.be/5vjwGfPN9lw
Sign up for future HUG events at https://events.hubspot.com/b2b-technology-usa/
End-to-end pipeline agility - Berlin Buzzwords 2024Lars Albertsson
We describe how we achieve high change agility in data engineering by eliminating the fear of breaking downstream data pipelines through end-to-end pipeline testing, and by using schema metaprogramming to safely eliminate boilerplate involved in changes that affect whole pipelines.
A quick poll on agility in changing pipelines from end to end indicated a huge span in capabilities. For the question "How long time does it take for all downstream pipelines to be adapted to an upstream change," the median response was 6 months, but some respondents could do it in less than a day. When quantitative data engineering differences between the best and worst are measured, the span is often 100x-1000x, sometimes even more.
A long time ago, we suffered at Spotify from fear of changing pipelines due to not knowing what the impact might be downstream. We made plans for a technical solution to test pipelines end-to-end to mitigate that fear, but the effort failed for cultural reasons. We eventually solved this challenge, but in a different context. In this presentation we will describe how we test full pipelines effectively by manipulating workflow orchestration, which enables us to make changes in pipelines without fear of breaking downstream.
Making schema changes that affect many jobs also involves a lot of toil and boilerplate. Using schema-on-read mitigates some of it, but has drawbacks since it makes it more difficult to detect errors early. We will describe how we have rejected this tradeoff by applying schema metaprogramming, eliminating boilerplate but keeping the protection of static typing, thereby further improving agility to quickly modify data pipelines without fear.
Analysis insight about a Flyball dog competition team's performanceroli9797
Insight of my analysis about a Flyball dog competition team's last year performance. Find more: https://github.com/rolandnagy-ds/flyball_race_analysis/tree/main
Open Source Contributions to Postgres: The Basics POSETTE 2024ElizabethGarrettChri
Postgres is the most advanced open-source database in the world and it's supported by a community, not a single company. So how does this work? How does code actually get into Postgres? I recently had a patch submitted and committed and I want to share what I learned in that process. I’ll give you an overview of Postgres versions and how the underlying project codebase functions. I’ll also show you the process for submitting a patch and getting that tested and committed.
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Aggregage
This webinar will explore cutting-edge, less familiar but powerful experimentation methodologies which address well-known limitations of standard A/B Testing. Designed for data and product leaders, this session aims to inspire the embrace of innovative approaches and provide insights into the frontiers of experimentation!
The Ipsos - AI - Monitor 2024 Report.pdfSocial Samosa
According to Ipsos AI Monitor's 2024 report, 65% Indians said that products and services using AI have profoundly changed their daily life in the past 3-5 years.
Learn SQL from basic queries to Advance queriesmanishkhaire30
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
2. About Me
• Fifth-year PhD candidate in Informatics at the University of Illinois at Urbana-Champaign.
• Master of Science in Information Management, University of Illinois at Urbana-Champaign.
• Master of Engineering in Computer Science, Cornell University.
• Bachelor of Science in Information Technology, King Saud University.
@aseel_addawood https://sites.google.com/view/aseeladdawood/
4. • Field of study?
• Have you done DS before?
• Programming experience? Which language?
5. 1. Brief intro to data science
2. Skills needed to become a data scientist
3. Environment setup
4. 10 min break
5. Data science cycle:
   a. Data collection
   b. 10 min break
   c. Data annotation
6. 30 min for questions
8. Why? Three reasons…
• The value of data does not come from its volume; it comes from the connections and insights you can generate from it.
• Data cannot be depleted; in fact, the amount of data seems to be exploding.
• Data is infinitely durable and usable.
https://cdn-images-1.medium.com/max/1200/1*KFHLIacf2U44bDcQGbMaBw.jpeg
28. The data science cycle:
• Data Collection: identify problem → query data source → store the data
• Data Annotation: identify classes → annotate → feature extraction
• Data Cleaning: preprocess → handle missing data
• Data Exploration: descriptive statistics → plotting → word analysis
• ML Classification Models: model training → classification models → accuracy assessment
• Result Communication: visualization → application/product → report findings
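The cycle above can be sketched end to end as a toy pipeline. This is a minimal, hypothetical sketch: the function names and the rule-based "annotation" are my own placeholders, not from the workshop materials.

```python
# A minimal, hypothetical sketch of the data science cycle as composable steps.

def collect(keywords):
    # Data Collection: in practice this would query the Twitter API;
    # here we return a tiny hard-coded sample.
    return [{"text": "I support the decision"}, {"text": "I am against it"}]

def annotate(tweets):
    # Data Annotation: a toy rule standing in for human labeling.
    for t in tweets:
        t["label"] = "for" if "support" in t["text"] else "against"
    return tweets

def explore(tweets):
    # Data Exploration: descriptive statistics (label counts).
    counts = {}
    for t in tweets:
        counts[t["label"]] = counts.get(t["label"], 0) + 1
    return counts

tweets = annotate(collect(["women driving"]))
print(explore(tweets))  # {'for': 1, 'against': 1}
```

In a real project each step would grow into its own module (and the hand-written rule would be replaced by human annotation and a trained classifier), but the shape of the pipeline stays the same.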
29. (The data science cycle diagram from the previous slide, repeated.)
30. Step 1: Identify the problem / research questions
What are you interested in understanding that can help with expanding the knowledge?
What previous work has been done that you can build on?
Two ways:
1. Start with a question (e.g., unemployment, البطالة)
2. Start with the data (e.g., the Saher traffic system, ساهر)
31. To make this more realistic, let's take an example…
33. (The data science cycle diagram from slide 28, repeated.)
34. Step 2: Data Collection
Build the query
• For Twitter, you need to identify the keywords and the time range.
• The choice of keywords matters: bootstrapping, etc.
Sources of Twitter data
• Paid firehose access: Crimson Hexagon
• Free access: Twitter API
Storing the data
• Excel files (as CSV)
• JSON files
• Databases (SQL)
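The storage options above can be sketched with the standard library alone. A minimal example, assuming a hypothetical list of collected tweets (the field names are illustrative, not the Twitter API's actual schema):

```python
import csv
import json

# Hypothetical sample of collected tweets (field names are assumptions).
tweets = [
    {"id": "1", "created_at": "2017-09-26", "text": "First tweet"},
    {"id": "2", "created_at": "2017-09-27", "text": "Second tweet"},
]

# Store as CSV (opens directly in Excel).
with open("tweets.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["id", "created_at", "text"])
    writer.writeheader()
    writer.writerows(tweets)

# Store as JSON (preserves nesting and Unicode, e.g. Arabic text).
with open("tweets.json", "w", encoding="utf-8") as f:
    json.dump(tweets, f, ensure_ascii=False, indent=2)
```

For larger collections, a SQL database (e.g., SQLite via Python's built-in sqlite3 module) scales better than flat files.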
35. Why Twitter?
• Vast amount of data with easy access.
• Saudi Arabia is among the countries with the highest number of Twitter users among its online population.
• Saudi Arabia produces 40% of all tweets in the Arab world.
1. Countries with most Twitter users 2018 | Statistic. Retrieved from: https://www.statista.com/statistics/242606/number-of-active-twitter-users-in-selectedcountries.
2. Saudi Arabia: number of internet users 2022 | Statistic. Retrieved from: https://www.statista.com/statistics/462959/internet-users-saudi-arabia
3. Salem, F., Mourtada, R.: Citizen engagement and public services in the Arab world: The potential of social media. The Governance and Innovation Program at the Mohammed Bin Rashid School of Government, Dubai (2014).
38. Time Frame
1st - 30th September 2017: the month during which the decree permitting women to drive was announced.
Total number of tweets collected: 10,247
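Restricting a collection to a time frame like this comes down to a simple date filter. A sketch, using made-up timestamps for illustration:

```python
from datetime import date

def in_time_frame(created_at, start, end):
    # created_at is an ISO date string, e.g. "2017-09-26".
    d = date.fromisoformat(created_at)
    return start <= d <= end

# The September 2017 window from the slide.
start, end = date(2017, 9, 1), date(2017, 9, 30)

timestamps = ["2017-08-31", "2017-09-26", "2017-10-01"]
kept = [t for t in timestamps if in_time_frame(t, start, end)]
print(kept)  # ['2017-09-26']
```

With the Twitter API, the same window would normally be passed as query parameters so filtering happens server-side, but a local check like this is useful for validating stored data.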
40. (The data science cycle diagram from slide 28, repeated.)
41. Step 3: Data Annotation
Identify classes (this corresponds to your research question)
• Binary (positive/negative | for/against | gender, etc.)
• Multi-class (types of evidence, users, etc.)
Annotate
• Human (build the codebook, train annotators, measure inter-annotator agreement: Cohen's kappa, etc.)
• Automatic
Feature extraction
• Linguistic (LIWC, MPQA)
• Syntactic (POS tags)
• Twitter-related (# followers, # retweets)
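Inter-annotator agreement with Cohen's kappa can be computed from scratch with the standard library. A sketch with two hypothetical annotators labeling six tweets:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    # Cohen's kappa for two annotators over the same items.
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators labeled the same.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement, from each annotator's label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

a = ["for", "for", "against", "neutral", "for", "against"]
b = ["for", "against", "against", "neutral", "for", "against"]
print(round(cohens_kappa(a, b), 3))  # 0.739
```

Values above roughly 0.6 are often read as substantial agreement; a low kappa usually means the codebook needs refining and the annotators need retraining.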
43. Annotation Instructions
1. Open the Google Sheet.
2. Read the tweet.
3. Based on the table, label the tweet as for, against, or neutral.
4. Add your name to each tweet you label.
5. Add notes if needed.
6. If you do not know how to label a tweet, skip it and move to the next one.
Each person should annotate 10 tweets.
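When several people label the same tweets, the overlapping labels can be merged by majority vote. A minimal sketch (the tie-handling policy of falling back to "skip" is my own assumption, not from the workshop):

```python
from collections import Counter

def majority_label(labels):
    # Merge labels from several annotators into one label per tweet.
    if not labels:
        return "skip"
    counts = Counter(labels).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return "skip"  # no clear majority: flag for adjudication
    return counts[0][0]

print(majority_label(["for", "for", "neutral"]))  # for
print(majority_label(["for", "against"]))         # skip
```

Tweets that come back as "skip" are then discussed by the annotators (or a third adjudicator) rather than silently dropped.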
46. Social Media Data Challenges
• Online users' expressions are written informally, so they may include sarcasm, spelling mistakes, unconventional grammar, and slang words and expressions.
• Differences in opinion between the annotators.
• You need annotators from the same culture.
• The data might not be representative of the whole population, though it can serve as a representative sample of online users.
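Some of the informal-text noise above (URLs, mentions, hashtags, elongated words) can be reduced with a simple normalization pass before annotation and feature extraction. A sketch for English tweets; Arabic text would additionally need steps such as alef and tatweel normalization:

```python
import re

def clean_tweet(text):
    # Normalize informal tweet text before annotation / feature extraction.
    text = re.sub(r"http\S+", "", text)          # drop URLs
    text = re.sub(r"[@#]\w+", "", text)          # drop mentions and hashtags
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)   # squeeze elongations: sooo -> soo
    return re.sub(r"\s+", " ", text).strip().lower()

print(clean_tweet("Sooo happy!! @user check https://t.co/x #news"))
# soo happy!! check
```

Note that heavier normalization can destroy signal: elongations and punctuation repetition are themselves sentiment cues, so what you strip should depend on the features you plan to extract.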
Who among you has heard this sentence before?
Raise your hands.
It should be all of you, because it has been repeated so often :)
But this sentence is completely wrong and has been blown out of proportion; I suspect its purpose was more marketing than an accurate description of what data really is.
The value of data does not come from its volume. What does "big data" even mean? Is 1 GB big? What about 1.1 GB? Data gets its value not from its size but from our ability to extract the patterns hidden in it.
True, with oil, the more you have, the more money you make. Data, however, grows in value through what you can extract from it and how you connect it to the wider context around it. For example, you may have a huge amount of Twitter data, but what use is it unless we connect it to solving a problem or understanding something specific? What matters is not collecting a lot of data, but collecting the right data.
Oil is a natural resource that can be depleted and is hard to obtain, while data cannot be depleted; in fact, the amount of data seems to be exploding.
Oil is a finite resource and is not reusable, whereas data is extremely durable and can be reused.
Data is like puzzle pieces: when you look at them, you do not understand them, and you do not know what might emerge if you arranged and made sense of them.
We can arrange this data in several ways so that it turns into understandable information we can use to solve, or to understand, the problem in front of us.
For example, imagine the data the Danube company could collect: literally enormous, but, like this picture, on its own it is jumbled and meaningless, and its quantity alone, without careful examination, is of no use.
This process of turning data into information I can benefit from is data science.
What could these puzzle pieces be?
So what is data science made of? What are the most important parts that add up to the right formula for data science to solve any problem?
7:30
8
Any data science project goes through these steps in general: first, you need to identify the problem.
Download the file from Excel.
Convert the file to CSV.
Upload the file to your folder.