AIDR Tutorial (Artificial Intelligence for Disaster Response)
Qatar Computing Research Institute, HBKU
• Data collection in AIDR
• Data classification in AIDR
• Data view/download in AIDR
Data Collection in AIDR
• Twitter data collection strategies that AIDR supports
– By keywords
– By geographical regions
• Strict: coordinates strictly inside geo boundaries
• Approximate: tweets from a place that overlaps with the geo
– By following Twitter users
– By keywords + regions
• Tweets that match any of the keywords and are within the geo boundaries
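The difference between the strict and approximate geo modes can be sketched as follows. This is a minimal illustration, not AIDR's actual code: the `Box` type, the function names, the inclusive boundary test, and the sample San Francisco coordinates are all assumptions, and boxes that cross the antimeridian are ignored.

```python
from typing import NamedTuple, Optional, Tuple


class Box(NamedTuple):
    """Axis-aligned bounding box in degrees (hypothetical helper type)."""
    west: float
    south: float
    east: float
    north: float


def contains(box: Box, lon: float, lat: float) -> bool:
    """True when the point lies inside the box (boundaries included)."""
    return box.west <= lon <= box.east and box.south <= lat <= box.north


def overlaps(a: Box, b: Box) -> bool:
    """True when the two boxes share at least one point."""
    return (a.west <= b.east and b.west <= a.east
            and a.south <= b.north and b.south <= a.north)


def geo_match(region: Box,
              coords: Optional[Tuple[float, float]],
              place: Optional[Box],
              strict: bool) -> bool:
    """Decide whether a tweet falls within the collection region.

    coords is the tweet's exact (lon, lat), if any; place is the bounding
    box of the place attached to the tweet, if any.
    """
    if coords is not None and contains(region, *coords):
        return True           # exact coordinates inside: matches in both modes
    if strict:
        return False          # strict mode accepts exact coordinates only
    # Approximate mode: accept a place whose box overlaps the region.
    return place is not None and overlaps(region, place)


# Illustrative region roughly covering the San Francisco area.
san_francisco = Box(-122.75, 36.8, -121.75, 37.8)
nearby_place = Box(-123.0, 37.5, -122.6, 38.0)
print(geo_match(san_francisco, None, nearby_place, strict=True))
print(geo_match(san_francisco, None, nearby_place, strict=False))
```

A tweet tagged only with an overlapping place is rejected in strict mode and accepted in approximate mode, which is why approximate collections tend to be larger.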
Data Collection Using Keywords
• Keywords limit = 400
• One keyword can be a single word like
“Suffolk” or a phrase like “Suffolk accident”
• A keyword/phrase cannot be more than 60
bytes (1 char = 1 byte)
• Generic keywords collect irrelevant tweets
• Specific keywords are more likely to collect relevant tweets
• Bounding boxes do not act as filters for other filter
parameters. For example, a filter that combines the keyword “Twitter” with a
bounding box around San Francisco would match any tweets containing the term
Twitter (even non-geo tweets) OR coming from the San Francisco area.
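The keyword limits and the OR behaviour described above can be checked before a collection is started. The sketch below is an illustration under assumptions: `validate_keywords` and `stream_matches` are hypothetical names, keyword matching is simplified to a case-insensitive substring test, and the byte length is measured in UTF-8 (the "1 char = 1 byte" rule holds for ASCII text; non-ASCII characters take more bytes).

```python
from typing import List

MAX_KEYWORDS = 400        # keyword limit stated above
MAX_KEYWORD_BYTES = 60    # per keyword or phrase, as stated above


def validate_keywords(keywords: List[str]) -> List[str]:
    """Strip whitespace, drop empty entries, and enforce both limits."""
    cleaned = [kw.strip() for kw in keywords if kw.strip()]
    if len(cleaned) > MAX_KEYWORDS:
        raise ValueError(
            f"{len(cleaned)} keywords given; the limit is {MAX_KEYWORDS}")
    for kw in cleaned:
        size = len(kw.encode("utf-8"))
        if size > MAX_KEYWORD_BYTES:
            raise ValueError(
                f"{kw!r} is {size} bytes; the limit is {MAX_KEYWORD_BYTES}")
    return cleaned


def stream_matches(text: str, in_box: bool, keywords: List[str]) -> bool:
    """Keywords and bounding box are combined with OR, not AND.

    Simplification (assumption): a keyword matches when it occurs as a
    case-insensitive substring of the tweet text.
    """
    lowered = text.lower()
    keyword_hit = any(kw.lower() in lowered for kw in keywords)
    return keyword_hit or in_box


keywords = validate_keywords(["Suffolk", "Suffolk accident"])
print(stream_matches("Road closed after Suffolk accident", False, keywords))
print(stream_matches("Nice weather today", True, keywords))
```

Both calls return a match: the first through a keyword alone, the second through location alone, which is the behaviour the bounding-box note warns about.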
Following Twitter Users
For each user specified, the tool will collect:
• Tweets created by the user.
• Tweets which are retweeted by the user.
• Replies to any Tweet created by the user.
• Retweets of any Tweet created by the user.
• Manual replies, created without pressing a reply button (e.g.
“@twitterapi I agree”).
The tool will not collect:
• Tweets mentioning the user (e.g. “Hello @twitterapi!”).
• Manual Retweets created without pressing a Retweet button (e.g.
“RT @twitterapi The API is great”).
• Tweets by protected users.
Use a comma-separated list of Twitter user IDs (http://gettwitterid.com/)
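The follow rules above can be sketched in code. This is a hedged illustration only: `parse_user_ids`, `is_collected`, and the simplified `Tweet` record are hypothetical and do not reflect Twitter's payload schema or AIDR's implementation. The record assumes that a button retweet or a reply carries the ID of the original author, while a plain mention or a manual "RT @user" carries neither.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


def parse_user_ids(raw: str) -> List[str]:
    """Parse a comma-separated list of numeric user IDs, dropping duplicates."""
    ids: List[str] = []
    for token in raw.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate trailing commas and doubled separators
        if not (token.isascii() and token.isdigit()):
            raise ValueError(f"not a numeric user ID: {token!r}")
        if token not in ids:
            ids.append(token)
    return ids


@dataclass
class Tweet:
    """Hypothetical, simplified tweet record used only for this sketch."""
    author_id: str
    author_protected: bool = False
    retweeted_author_id: Optional[str] = None  # set for button retweets only
    reply_to_user_id: Optional[str] = None     # set for replies, incl. manual


def is_collected(tweet: Tweet, followed: Set[str]) -> bool:
    """Apply the follow rules listed above to one tweet."""
    if tweet.author_protected:
        return False  # tweets by protected users are never collected
    return (tweet.author_id in followed                # created or retweeted by the user
            or tweet.retweeted_author_id in followed   # retweets of the user's tweets
            or tweet.reply_to_user_id in followed)     # replies to the user


followed = set(parse_user_ids("12, 345,"))
print(is_collected(Tweet(author_id="99", reply_to_user_id="12"), followed))
print(is_collected(Tweet(author_id="99"), followed))
```

The second call models a mention or a manual retweet by an unfollowed author: with no reply or retweet link to a followed user, the tweet is not collected.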
Data Classification in AIDR
• Define classifiers (name, description)
– Define labels (name, description)
– Having a “miscellaneous” category will be helpful
• Wait around 15-20 minutes (for fast
collections) or 30-40 minutes (for slow collections)
• Start tagging
• Check the classifier status (UI)
– First classifier/model will be up after 50 labeled
tweets, ideally equally distributed among labels
– If no model appears after 50 tags, keep tagging
• Human-tagged items (the more the better)
• 40 more needed to re-train (next classifier target)
• Machine-tagged items (keep an eye on
• Quality (ideally AUC above 90 but not equal to 100)
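For readers unfamiliar with the AUC quality figure, the sketch below shows one standard way to compute it and to apply the range check given above. It is an illustration, not AIDR's evaluation code: the function names are hypothetical, the AUC is expressed as a percentage, and a single label is treated as a binary case (1 = tweet has the label, 0 = it does not).

```python
from typing import Sequence


def auc_percent(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as a percentage, using the pairwise (Mann-Whitney) formulation.

    The AUC equals the fraction of (positive, negative) pairs in which the
    positive item receives the higher score; ties count as half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative item")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return 100.0 * wins / (len(pos) * len(neg))


def quality_ok(auc: float) -> bool:
    """Range check from the slide: above 90 but not equal to 100."""
    return 90.0 < auc < 100.0


# Made-up classifier confidences and human tags, for illustration only.
scores = [0.9, 0.8, 0.7, 0.3, 0.2]
labels = [1, 1, 0, 1, 0]
auc = auc_percent(scores, labels)
print(round(auc, 1), quality_ok(auc))
```

In this toy data five of the six positive/negative pairs are ranked correctly, so the AUC falls below the target range and more tagging would be advisable.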