AIDR Tutorial (Artificial Intelligence for Disaster Response)

This is a short tutorial of AIDR.

Published in: Technology

  1. AIDR Tutorial. Muhammad Imran, Research Scientist, Qatar Computing Research Institute, HBKU, Doha, Qatar. http://aidr.qcri.org/
  2. Outline
     • Data collection in AIDR
     • Data classification in AIDR
     • Data view/download in AIDR
  3. Data Collection in AIDR
     • Twitter data collection strategies that AIDR supports
       – By keywords
       – By geographical regions
         • Strict: coordinates strictly inside the geo boundaries.
         • Approximate: tweets from a place that overlaps with the geo boundaries.
       – By following Twitter users
       – By keywords + regions
         • Tweets that match any of the keywords and fall within the geo boundaries.
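
The difference between the strict and approximate region modes can be sketched as two small geometric tests. This is an illustration only, not AIDR's code: the `Box` type, the function names, and the example place box are assumptions, and the sketch ignores boxes that cross the antimeridian.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Hypothetical bounding box in degrees, longitude first."""
    west: float
    south: float
    east: float
    north: float


def strict_match(lon: float, lat: float, region: Box) -> bool:
    """Strict mode: the tweet's exact coordinates lie inside the region."""
    return (region.west <= lon <= region.east
            and region.south <= lat <= region.north)


def approximate_match(place: Box, region: Box) -> bool:
    """Approximate mode: the tweet's place box overlaps the region at all."""
    return (place.west <= region.east and place.east >= region.west
            and place.south <= region.north and place.north >= region.south)


# Region box taken from the San Francisco example later in this tutorial.
san_francisco = Box(-122.75, 36.8, -121.75, 37.8)

# A geotagged tweet inside the box passes the strict test.
print(strict_match(-122.42, 37.77, san_francisco))

# A place box that only partly overlaps the region passes the approximate test.
print(approximate_match(Box(-123.0, 37.5, -122.5, 38.5), san_francisco))
```

Approximate mode accepts more tweets because many tweets carry only a place (a city or neighbourhood box) rather than exact coordinates.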
  4. Data Collection Using Keywords
     • Keywords limit = 400
     • One keyword can be a single word like “Suffolk” or a phrase like “Suffolk accident”
     • One keyword/phrase cannot be longer than 60 bytes (1 char = 1 byte)
     • Generic keywords collect irrelevant tweets
     • Specific keywords are more likely to collect relevant tweets
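
The two limits above can be checked before a collection is started. The following is a minimal sketch, assuming keywords are entered as a comma-separated string; `parse_keywords` and the constant names are hypothetical and not part of AIDR. Length is measured in UTF-8 bytes, so non-ASCII characters count for more than one byte each.

```python
from typing import List

MAX_KEYWORDS = 400       # keyword limit stated on the slide
MAX_KEYWORD_BYTES = 60   # size limit per keyword or phrase


def parse_keywords(raw: str) -> List[str]:
    """Split a comma-separated keyword string and enforce both limits."""
    keywords = [part.strip() for part in raw.split(",") if part.strip()]
    if len(keywords) > MAX_KEYWORDS:
        raise ValueError(f"{len(keywords)} keywords given, limit is {MAX_KEYWORDS}")
    for keyword in keywords:
        size = len(keyword.encode("utf-8"))
        if size > MAX_KEYWORD_BYTES:
            raise ValueError(f"{keyword!r} is {size} bytes, limit is {MAX_KEYWORD_BYTES}")
    return keywords


print(parse_keywords("Suffolk, Suffolk accident"))
```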
  5. Keywords Examples
  6. Location-based Collection
     • Bounding boxes do not act as filters for other filter parameters. For example: keyword=twitter&locations=-122.75,36.8,-121.75,37.8 would match any tweets containing the term Twitter (even non-geo tweets) OR coming from the San Francisco area.
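
The OR behaviour described on this slide can be written as a small predicate. This is a sketch under simplifying assumptions: `stream_matches` is a hypothetical name, keyword matching is reduced to a case-insensitive substring test, and a tweet's location is a single optional (longitude, latitude) pair.

```python
from typing import Optional, Sequence, Tuple


def stream_matches(text: str,
                   coords: Optional[Tuple[float, float]],
                   keywords: Sequence[str],
                   box: Tuple[float, float, float, float]) -> bool:
    """Return True if the tweet matches a keyword OR lies inside the box.

    box is (west, south, east, north); coords is (lon, lat) or None.
    """
    lowered = text.lower()
    keyword_hit = any(keyword.lower() in lowered for keyword in keywords)
    geo_hit = (coords is not None
               and box[0] <= coords[0] <= box[2]
               and box[1] <= coords[1] <= box[3])
    return keyword_hit or geo_hit


sf_box = (-122.75, 36.8, -121.75, 37.8)

# A non-geo tweet that mentions the keyword still matches.
print(stream_matches("I love Twitter", None, ["twitter"], sf_box))

# A tweet from inside the box matches even without the keyword.
print(stream_matches("Nice weather", (-122.42, 37.77), ["twitter"], sf_box))
```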
  7. Following Twitter Users
     For each user specified, the tool will collect:
     • Tweets created by the user.
     • Tweets which are retweeted by the user.
     • Replies to any Tweet created by the user.
     • Retweets of any Tweet created by the user.
     • Manual replies, created without pressing a reply button (e.g. “@twitterapi I agree”).
     The tool will not collect:
     • Tweets mentioning the user (e.g. “Hello @twitterapi!”).
     • Manual Retweets created without pressing a Retweet button (e.g. “RT @twitterapi The API is great”).
     • Tweets by protected users.
     Use a comma-separated list of Twitter user IDs (http://gettwitterid.com/).
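
The inclusion and exclusion rules above can be summarised in one predicate. This is a sketch only: the `Tweet` record and its field names are hypothetical simplifications, not the Twitter API's actual schema, and a manual reply is approximated as a tweet whose first word is the followed user's @name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tweet:
    """Hypothetical, simplified tweet record used only for this sketch."""
    author_id: int
    text: str
    author_protected: bool = False
    in_reply_to_user_id: Optional[int] = None  # set when the reply button was used
    retweeted_author_id: Optional[int] = None  # set when the Retweet button was used


def is_collected(tweet: Tweet, user_id: int, screen_name: str) -> bool:
    """Apply the follow rules from the slide for one followed user."""
    if tweet.author_protected:
        return False                      # tweets by protected users are excluded
    if tweet.author_id == user_id:
        return True                       # tweets and retweets made by the user
    if tweet.in_reply_to_user_id == user_id:
        return True                       # replies to the user's tweets
    if tweet.retweeted_author_id == user_id:
        return True                       # retweets of the user's tweets
    words = tweet.text.split()
    first_word = words[0].lower() if words else ""
    # Manual reply: the text starts with @name; a mention elsewhere does not count.
    return first_word == "@" + screen_name.lower()


print(is_collected(Tweet(2, "@twitterapi I agree"), 1, "twitterapi"))
print(is_collected(Tweet(2, "Hello @twitterapi!"), 1, "twitterapi"))
print(is_collected(Tweet(2, "RT @twitterapi The API is great"), 1, "twitterapi"))
```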
  8. Classifier UI
  9. Detailed Information of Classifiers
  10. Data Classification in AIDR
      • Define classifiers (name, description)
        – Define labels (name, description)
        – Having a “miscellaneous” category will be helpful
      • Wait around 15-20 minutes (for fast collections) or 30-40 minutes (for slow collections)
      • Start tagging
  11. Classifier Generation
      • Check the classifier status (UI)
        – The first classifier/model will be up after 50 labeled tweets, ideally equally distributed among labels
        – If no model appears after 50 tags, keep tagging
      • Human-tagged items (the more the better)
      • 40 more needed to re-train (next classifier target)
      • Machine-tagged items (keep an eye on misclassifications)
      • Quality (ideally AUC should be above 90 but not exactly 100)
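
The status checks on this slide can be expressed as a few helper functions. This is a sketch, not AIDR's logic: the function names are hypothetical, AUC is assumed to be reported as a percentage, and label balance is measured by a simple min/max ratio chosen here for illustration.

```python
from typing import Dict

FIRST_MODEL_LABELS = 50  # labeled tweets needed before the first model, per the slide


def model_expected(labeled_count: int) -> bool:
    """True once enough tweets are labeled for a first model to appear."""
    return labeled_count >= FIRST_MODEL_LABELS


def quality_ok(auc_percent: float) -> bool:
    """True if AUC is above 90 but not a perfect 100, as the slide recommends."""
    return 90.0 < auc_percent < 100.0


def label_balance(counts: Dict[str, int]) -> float:
    """Ratio of the rarest to the most common label; 1.0 means evenly distributed."""
    if not counts or max(counts.values()) == 0:
        return 0.0
    return min(counts.values()) / max(counts.values())


# Hypothetical label counts for one classifier.
counts = {"damage": 20, "donations": 18, "miscellaneous": 12}
print(model_expected(sum(counts.values())))
print(quality_ok(93.5))
print(label_balance(counts))
```

A low balance value suggests tagging more examples of the rarest label before relying on the model.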
