2. Motivation
• Twitter carries a rich flow of information
• There is no effective way to query Twitter
• It is hard to monitor topics of interest in real time
3. Search Tweets Like a Professional
A real-time Twitter search engine that lets you search based on:
• Keywords
◦ Country
◦ Language
◦ Negative words
Demo (http://searchyourtweet.info:5000/input)
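The search form above maps naturally onto an ElasticSearch bool query. The sketch below shows one way to build it. Keywords go in `must`, country and language become `filter` clauses, and negative words go in `must_not`. The slides do not give the index mapping. The field names `text`, `lang` and `place.country_code` are assumptions that mirror Twitter's tweet JSON. The sketch also assumes the last two fields are mapped as exact-value (keyword) fields.

```python
import json


def build_search_query(keywords, country=None, language=None, negative_words=()):
    """Build an ElasticSearch bool query body for the tweet search form.

    Field names ("text", "lang", "place.country_code") are assumed, not
    taken from the original project.
    """
    query = {
        "bool": {
            # Every keyword must appear in the tweet text.
            "must": [{"match": {"text": kw}} for kw in keywords],
            # Exact-value filters: they narrow results without affecting score.
            "filter": [],
            # Tweets containing any negative word are excluded.
            "must_not": [{"match": {"text": w}} for w in negative_words],
        }
    }
    if country:
        query["bool"]["filter"].append({"term": {"place.country_code": country}})
    if language:
        query["bool"]["filter"].append({"term": {"lang": language}})
    return {"query": query}


body = build_search_query(["election"], country="CA", language="en",
                          negative_words=["spam"])
print(json.dumps(body, indent=2))
```

Putting country and language in `filter` rather than `must` keeps them out of relevance scoring, which suits yes/no constraints like these.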
4. Keep an Eye on Your Topics of Interest
• Tell us what interests you, and we will keep you updated on the newest events
• Video (https://youtu.be/GdRmXNfukos)
6. Real-Time Monitor on Twitter
◦ Implemented using the ElasticSearch Percolator
◦ Think of it as "search in reverse":
   ◦ Users register queries in the percolator
   ◦ The percolator matches incoming documents against the registered queries
◦ Challenges:
   ◦ How to design the percolator data pipeline?
   ◦ How to decouple the backend database from the frontend server?
   ◦ Use the publish/subscribe design pattern
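To make "search in reverse" concrete, here is a toy, in-memory percolator. Normal search stores documents and runs one query against them. A percolator instead stores queries and runs each incoming document against all of them. This is only an illustration of the idea, not the ElasticSearch API. In ElasticSearch 5.x and later, registered queries live in a field of type `percolator` and a document is matched with a `percolate` query. Older 2.x releases used a `.percolator` type and a `_percolate` endpoint. The class names, query ids and tokenization here are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class StoredQuery:
    """A registered interest: every keyword required, no negative word allowed."""
    keywords: List[str]
    negative_words: List[str] = field(default_factory=list)
    language: Optional[str] = None  # e.g. "en"; None means any language

    def matches(self, tweet: dict) -> bool:
        # Naive tokenization: lowercase runs of word characters.
        words = set(re.findall(r"\w+", tweet.get("text", "").lower()))
        if self.language is not None and tweet.get("lang") != self.language:
            return False
        if any(w.lower() in words for w in self.negative_words):
            return False
        return all(k.lower() in words for k in self.keywords)


class ToyPercolator:
    """In-memory stand-in for a percolator: store queries, match documents against them."""

    def __init__(self) -> None:
        self._queries: Dict[str, StoredQuery] = {}

    def register(self, query_id: str, query: StoredQuery) -> None:
        self._queries[query_id] = query

    def percolate(self, tweet: dict) -> List[str]:
        # Search in reverse: one document in, ids of every matching stored query out.
        return [qid for qid, q in self._queries.items() if q.matches(tweet)]


perc = ToyPercolator()
perc.register("alice-nba", StoredQuery(keywords=["raptors"], negative_words=["ticket"]))
perc.register("bob-election", StoredQuery(keywords=["election"], language="en"))
print(perc.percolate({"text": "Raptors win again", "lang": "en"}))  # ['alice-nba']
```

The linear scan over stored queries is fine for a sketch. ElasticSearch's percolator indexes the queries themselves so that it can skip queries that cannot possibly match a given document.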
7. Real-Time Monitor Data Flow
[Diagram: Percolator query database, Twitter database, Controller, Pub/Sub (subscribe), open channel]
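The diagram places a Pub/Sub layer between the controller and the open client channels. The sketch below shows how that decouples the two sides. The controller publishes a matched tweet to a per-query channel without knowing who is listening. Each frontend connection subscribes only to the channels it cares about. The slides do not name the broker. The in-process `Broker`, the `controller_step` function and the `query:<id>` channel naming are all hypothetical. A deployed system would more likely use an external message broker such as Redis Pub/Sub.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Iterable, List

Message = dict
Handler = Callable[[Message], None]


class Broker:
    """Minimal in-process publish/subscribe broker keyed by channel name."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, channel: str, handler: Handler) -> None:
        self._subscribers[channel].append(handler)

    def publish(self, channel: str, message: Message) -> int:
        # The publisher only names a channel; it never sees the subscribers.
        handlers = self._subscribers.get(channel, [])
        for handler in handlers:
            handler(message)
        return len(handlers)  # number of subscribers that received the message


def controller_step(broker: Broker, tweet: Message, matched_query_ids: Iterable[str]) -> None:
    """Hypothetical controller: forward a tweet to the channel of every query it matched."""
    for qid in matched_query_ids:
        broker.publish(f"query:{qid}", tweet)


broker = Broker()
received = []  # stands in for a frontend client's open channel
broker.subscribe("query:alice-nba", received.append)
controller_step(broker, {"text": "Raptors win again"}, ["alice-nba", "bob-election"])
print(received)  # [{'text': 'Raptors win again'}]
```

Because "query:bob-election" has no subscribers, publishing to it is a no-op. The backend keeps working whether or not any frontend is connected, which is the decoupling the slide asks for.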
8. Challenge
How do we build a high-throughput, real-time backend data pipeline?
• Use Logstash!
◦ Highly scalable
◦ Compatible with different sources and destinations
[Diagrams: "A scalable high throughput pipeline"; "Current backend pipeline"]
9. Challenge
• Real-time updates on the frontend client:
   • Instead of polling with the JavaScript "setInterval()" function, I use "socketIO" to keep a socket open between the front-end client and the Flask server
• Construct the ElasticSearch query
• Use the Python requests library to query ElasticSearch
• Fine-tune ElasticSearch
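The slide says ElasticSearch is queried over HTTP from Python. Below is a sketch of that request. It deliberately swaps in the standard-library `urllib.request` so it runs without extra packages. With the requests library the call would be roughly `requests.post(url, json=body)`. The node address `http://localhost:9200` and the index name `tweets` are assumptions, not values from the project.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumed local ElasticSearch node
INDEX = "tweets"                   # hypothetical index name


def make_search_request(body: dict, size: int = 10) -> urllib.request.Request:
    """Build (but do not send) a POST to the index's _search endpoint."""
    url = f"{ES_URL}/{INDEX}/_search?size={size}"
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(url, data=data,
                                  headers={"Content-Type": "application/json"},
                                  method="POST")


def search(body: dict, size: int = 10) -> list:
    """Send the query and return the _source of each hit (needs a running node)."""
    req = make_search_request(body, size)
    with urllib.request.urlopen(req, timeout=5) as resp:
        result = json.load(resp)
    return [hit["_source"] for hit in result["hits"]["hits"]]


req = make_search_request({"query": {"match": {"text": "raptors"}}}, size=5)
print(req.get_method(), req.full_url)  # POST http://localhost:9200/tweets/_search?size=5
```

Separating request construction from sending keeps the query-building logic testable without a live cluster. Only `search()` touches the network.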
10. About Me
M.Math, University of Waterloo
◦ Field: Statistics and Machine Learning
B.S., University of Toronto
◦ Field: Applied Mathematics
Data Scientist Intern, Neon Inc., San Francisco
Back-end Model Developer, MetricAid Inc., Toronto
Experience in Deep Learning:
◦ Convolutional Networks, Recurrent Networks
• OS/161 (a simplified POSIX OS)