SEARCH YOUR TWEETS
SEARCH LIKE A PROFESSIONAL
Motivation
• Twitter represents a rich flow of information
• There is no effective way to query Twitter
• Topics of interest are hard to monitor in real time
Search Tweets Like a Professional
A real-time Twitter search engine that allows you to search based on:
• Keywords
◦ Country
◦ Language
◦ Negative words
Demo (http://searchyourtweet.info:5000/input)
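The four search criteria above map naturally onto an Elasticsearch bool query. The sketch below is only an illustration of that mapping, not the deck's actual code: the field names `text`, `country`, and `lang` are hypothetical, and it assumes "negative words" means terms that must not appear in a tweet.

```python
def build_tweet_query(keywords, country=None, language=None, negative_words=()):
    """Build an Elasticsearch bool query body for a tweet search.

    The field names ("text", "country", "lang") are hypothetical; the
    deck does not show the real index mapping.
    """
    bool_query = {
        # Every keyword must appear in the tweet text.
        "must": [{"match": {"text": word}} for word in keywords],
        # Exact-value restrictions that do not affect relevance scoring.
        "filter": [],
        # Assumed meaning of "negative words": terms that exclude a tweet.
        "must_not": [{"match": {"text": word}} for word in negative_words],
    }
    if country:
        bool_query["filter"].append({"term": {"country": country}})
    if language:
        bool_query["filter"].append({"term": {"lang": language}})
    return {"query": {"bool": bool_query}}


body = build_tweet_query(
    ["earthquake"], country="JP", language="en", negative_words=["drill"]
)
```

Country and language go in `filter` because they are yes/no restrictions, while keywords stay in `must` so they still contribute to ranking.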
Keep an Eye on Your Topics of Interest
• Express your interest, and we will keep you updated on the newest events
• Video (https://youtu.be/GdRmXNfukos)
Data pipeline
[Diagram. Component labels: Query Controller, Backend Database, percolator, Logic Layer, Frontend, Searching database, Data Backup, Pub/Sub. Arrow labels: Publish, Matching query, Register query, searching]
Real Time Monitor on Twitter
◦ Implemented using Elasticsearch Percolator
◦ Think of it as "search in reverse"
  ◦ Users register queries with the percolator
  ◦ The percolator matches incoming documents against the registered queries
◦ Challenges:
  ◦ How to design the percolator data pipeline?
  ◦ How to decouple the backend database from the frontend server?
  ◦ Use the publish/subscribe design pattern
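To make "search in reverse" concrete, here is a minimal in-memory sketch of the idea. It is a stand-in written for illustration, not the Elasticsearch Percolator API: the class `ToyPercolator`, its methods, and the query ids are all hypothetical, and matching is reduced to whole-word checks.

```python
class ToyPercolator:
    """Stores queries first, then matches each incoming document against them."""

    def __init__(self):
        # query_id -> (words that must appear, words that must not appear)
        self._queries = {}

    def register(self, query_id, required, excluded=()):
        """Register a standing query before any document arrives."""
        self._queries[query_id] = (
            {word.lower() for word in required},
            {word.lower() for word in excluded},
        )

    def percolate(self, text):
        """Return the ids of all registered queries that match this document."""
        words = set(text.lower().split())
        return sorted(
            query_id
            for query_id, (required, excluded) in self._queries.items()
            if required <= words and not (excluded & words)
        )


percolator = ToyPercolator()
percolator.register("q-storm", ["storm"])
percolator.register("q-storm-no-game", ["storm"], excluded=["game"])
percolator.register("q-election", ["election"])

matches = percolator.percolate("Storm warning issued for the coast")
# matches == ["q-storm", "q-storm-no-game"]
```

In an ordinary search the documents are stored and the query arrives later; here the roles are swapped, which is what lets each new tweet be routed to the users who asked about it.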
Real Time Monitor Data Flow
[Diagram. Component labels: Percolator, Query database, Twitter database, Controller, Pub/Sub. Arrow labels: subscribe, Open channel]
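The decoupling that publish/subscribe provides can be sketched with a tiny in-process broker. This is only an illustration under stated assumptions: the deck does not name the pub/sub system it uses, and the `Broker` class and the one-channel-per-query naming are hypothetical.

```python
from collections import defaultdict


class Broker:
    """Minimal in-process publish/subscribe broker (illustrative only)."""

    def __init__(self):
        # channel name -> list of subscriber callbacks
        self._subscribers = defaultdict(list)

    def subscribe(self, channel, callback):
        """Open a channel subscription, e.g. for one frontend session."""
        self._subscribers[channel].append(callback)

    def publish(self, channel, message):
        """Deliver a message to every subscriber; return how many received it."""
        callbacks = list(self._subscribers.get(channel, []))
        for callback in callbacks:
            callback(message)
        return len(callbacks)


broker = Broker()
inbox = []  # stands in for a frontend client's open connection

# Assumption: each registered query gets its own channel.
broker.subscribe("q-storm", inbox.append)

delivered = broker.publish("q-storm", "Storm warning issued for the coast")
ignored = broker.publish("q-election", "No subscriber is listening here")
```

The publisher (the backend side that sees a matching tweet) only knows a channel name, and the subscriber (the frontend side) only knows the same name, so neither needs a direct reference to the other.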
Challenge
How to build a high-throughput, real-time backend data pipeline?
• Use Logstash!
◦ Highly scalable
◦ Compatible with different sources and destinations
[Diagram panels: "A scalable high throughput pipeline" and "Current backend pipeline"]
Challenge
• Real-time updates on the frontend client:
  • Instead of using the JavaScript "setInterval()" function, I use "socketIO" to keep a socket open between the front-end client and the Flask server
• Constructing Elasticsearch queries
  • Use the Python requests library to query Elasticsearch
• Fine-tuning Elasticsearch
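Querying Elasticsearch with the requests library comes down to posting a JSON body to the index's `_search` endpoint. The sketch below shows one way this might look; the base URL, the index name `tweets`, and both helper names are assumptions, and nothing here is taken from the deck's own code.

```python
def build_search_request(base_url, index, body):
    """Return the (url, json_body) pair for an Elasticsearch search call."""
    url = f"{base_url.rstrip('/')}/{index}/_search"
    return url, body


def run_search(base_url, index, body, timeout=5.0):
    """Send the search and return the matching documents.

    Requires the third-party requests package; it is imported here so
    that build_search_request can be used without it.
    """
    import requests

    url, payload = build_search_request(base_url, index, body)
    response = requests.post(url, json=payload, timeout=timeout)
    response.raise_for_status()
    # Each hit carries the stored document under "_source".
    return [hit["_source"] for hit in response.json()["hits"]["hits"]]


# Hypothetical local cluster and index name.
url, payload = build_search_request(
    "http://localhost:9200/", "tweets", {"query": {"match_all": {}}}
)
```

Calling `run_search("http://localhost:9200", "tweets", payload)` would perform the request against a running cluster.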
About Me
M.Math, University of Waterloo
◦ Field: Statistics and Machine Learning
B.S., University of Toronto
◦ Field: Applied Mathematics
Data Scientist Intern, Neon Inc., San Francisco
Back-end Model Developer, MetricAid Inc., Toronto
Experience in Deep Learning:
◦ Convolutional Network, Recurrent Network
• OS/161 (a simplified POSIX OS)
Questions?
Thank you!
Parallelization of percolator
• Will consume a lot of hardware: O(mn)
• Another choice: Luwak + Samza

Jinchao demo v7
