2. The Chicago Transit Authority (CTA) operates the ‘L’ (elevated)
• Overwhelming amount of data exists for describing the system
• CTA Twitter account is still operated by a person in a control room
• Could we do better?
3. A Twitter bot sends out timely information
• Pulls data from sources
• Analyzes data, finds what’s important
• Creates sentence and posts it to Twitter
11. Wealth of structured data exists for the ‘L’
amongst other things in Chicago
Okay! Okay! Most of this is irrelevant! How do I
quickly find out what actually matters?
12. What kinds of events impact train travel and are
worth mentioning? Chicago Cubs’ games?
Daily ridership for Addison Stop
(Red), right where the Chicago
Cubs play
13. Random forest modeling ridership
showed baseball mattered, bot tweets it
Trained on 2011-2013, tested on 2014-2015
Features used here: day of the week and day of the year
14. Random forest modeling ridership
showed baseball mattered, bot tweets it
Trained on 2011-2013, tested on 2014-2015
Features used here: day of the week, day of the year, and whether there was a Cubs game that day
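The three features named on this slide can be built with the standard library alone. The sketch below is illustrative, not the presenter's code: the function names and the sample game date are hypothetical, and the slides do not say which library fit the random forest (rows like these could be passed to, for example, scikit-learn's RandomForestRegressor).

```python
from datetime import date


def ridership_features(day, game_days):
    """Return [day_of_week, day_of_year, is_game_day] for one calendar day."""
    return [
        day.weekday(),                 # 0 = Monday ... 6 = Sunday
        day.timetuple().tm_yday,       # 1 ... 366
        1 if day in game_days else 0,  # hypothetical Cubs home-game flag
    ]


def split_by_year(days):
    """Mirror the split on the slide: train on 2011-2013, test on 2014-2015."""
    train = [d for d in days if 2011 <= d.year <= 2013]
    test = [d for d in days if 2014 <= d.year <= 2015]
    return train, test


# Hypothetical game calendar, for illustration only.
game_days = {date(2014, 4, 4)}
row = ridership_features(date(2014, 4, 4), game_days)
```

Dropping the third element of each row gives the two-feature model from the previous slide, so the effect of the game-day flag can be compared directly.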
15. ‘L’ Tron works 24/7 on an EC2 instance
Thread 1: Every 5 minutes
• Query CTA server for data via API
• Compare data to timetable and look for delays
• Search for other events (e.g. baseball), compare to ridership model
Thread 2: Someone talks to ‘L’ Tron
• Find what the person wants
• Look for data from Thread 1 to respond with
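One way to arrange the two threads is a polling loop that writes to a shared, lock-protected cache and a responder that only reads from it. This is a minimal sketch under assumptions: StatusCache, poll_forever, respond, and the snapshot format (a dict of route name to delay in minutes) are all hypothetical, and the real CTA API call is replaced by a caller-supplied fetch function.

```python
import threading


class StatusCache:
    """Latest poll result, shared between the poller and the responder."""

    def __init__(self):
        self._lock = threading.Lock()
        self._latest = None

    def update(self, snapshot):
        with self._lock:
            self._latest = snapshot

    def latest(self):
        with self._lock:
            return self._latest


def poll_forever(fetch, cache, stop, interval_seconds=300):
    """Thread 1: fetch, store, then wait for the interval or a stop request."""
    while not stop.is_set():
        cache.update(fetch())
        # Event.wait returns early if stop is set, so shutdown is prompt.
        stop.wait(interval_seconds)


def respond(cache, route_name):
    """Thread 2: answer a question from whatever Thread 1 stored last."""
    snapshot = cache.latest()
    if snapshot is None or route_name not in snapshot:
        return "No data for the " + route_name + " line yet."
    return route_name + " line delay: " + str(snapshot[route_name]) + " min"
```

Keeping the responder read-only means a question never triggers an extra API request; it is answered from data that is at most one polling interval old.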
16. What should I tweet to my audience?
Following from Thread 1:
• Find a line delay > 5 minutes? Yes: Tweet it out! No: next question
• Is there a baseball game? Yes: Tweet it out! No: next question
• Does the system look okay? Yes: Tweet it out! No: Tweet it out!
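The decision flow on this slide can be read as a chain of checks, with delays taking priority over baseball and baseball over a general status report. The sketch below assumes that ordering, which is inferred from the flattened flowchart; the function name and the category labels it returns are hypothetical.

```python
def choose_tweet_category(max_delay_minutes, game_today, system_ok):
    """Pick which kind of tweet to send, checking the most urgent case first."""
    if max_delay_minutes > 5:
        return "delay"          # a line is more than 5 minutes behind
    if game_today:
        return "baseball"       # no delay, but a game will affect ridership
    if system_ok:
        return "all_clear"      # nothing notable, report normal service
    return "general_alert"      # something looks off without a specific delay
```

Every path ends in a tweet; the checks only decide which category of template is used.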
17. Language generation starts with a large, human-written corpus
"[route_name] line trains on their way toward [destination] are running roughly [delay_minutes] [minute_s] late."
"[route_name] line trains on their way toward [destination] have fallen roughly [delay_minutes] [minute_s] behind schedule."
"[route_name] line trains on their way to [destination] are running roughly [delay_minutes] [minute_s] late."
"[destination] headed [route_name] line trains have fallen roughly [delay_minutes] [minute_s] behind schedule."
"[destination] bound [route_name] line trains are running roughly [delay_minutes] [minute_s] behind schedule."
"[destination] bound [route_name] line trains have fallen roughly [delay_minutes] [minute_s] behind schedule."
Each tweet template is categorized for a particular use case
18. A template is chosen at random and filled in as needed
"[destination] bound [route_name] line trains are running about [delay_minutes] [minute_s] behind schedule."
"O’Hare bound Blue line trains are running about 12 minutes behind schedule."
If a delay of 12 minutes is found on the O’Hare bound Blue
line, those details are inserted into the template
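Template selection and filling can be sketched with plain string replacement. The code below is an illustration, not the bot's actual implementation: TEMPLATES holds only two of the templates shown on the slides, the function names are hypothetical, and treating [minute_s] as a singular/plural slot is an assumption based on its name.

```python
import random

# Two templates taken from the slides; the real corpus is larger.
TEMPLATES = [
    "[destination] bound [route_name] line trains are running about "
    "[delay_minutes] [minute_s] behind schedule.",
    "[route_name] line trains on their way toward [destination] are running "
    "roughly [delay_minutes] [minute_s] late.",
]


def fill_template(template, route_name, destination, delay_minutes):
    """Replace each bracketed slot with the matching value."""
    values = {
        "[route_name]": route_name,
        "[destination]": destination,
        "[delay_minutes]": str(delay_minutes),
        # Assumed meaning of [minute_s]: "minute" for 1, "minutes" otherwise.
        "[minute_s]": "minute" if delay_minutes == 1 else "minutes",
    }
    for slot, value in values.items():
        template = template.replace(slot, value)
    return template


def make_tweet(route_name, destination, delay_minutes, rng=random):
    """Choose a template at random, then fill it in."""
    return fill_template(rng.choice(TEMPLATES), route_name, destination, delay_minutes)
```

Passing a seeded random.Random instance as rng makes the choice reproducible, which is convenient when checking that every template in a category fills correctly.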