SEARCH YOUR TWEETS
SEARCH LIKE A PROFESSIONAL

Motivation
• Twitter represents a rich flow of information
• There is no effective way to query Twitter
• It is hard to monitor topics of interest in real time

Search Tweets Like a Professional
A real-time Twitter search engine that allows you to search based on:
• Keywords
◦ Country
◦ Language
◦ Negative words
Demo: http://searchyourtweet.info:5000/input
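
The four search criteria above map naturally onto an Elasticsearch `bool` query. The following is a minimal sketch, not the code behind the demo: the function name `build_tweet_query` and the field names `text`, `country`, and `lang` are assumptions made for illustration, and "negative words" is assumed to mean words that must not appear in a tweet.

```python
def build_tweet_query(keywords, country=None, language=None, negative_words=()):
    """Build an Elasticsearch bool query body (hypothetical field names)."""
    # Keywords must match the tweet text (assumed field: "text").
    bool_query = {"must": [{"match": {"text": keywords}}]}

    # Country and language are exact-match filters (assumed fields).
    filters = []
    if country:
        filters.append({"term": {"country": country}})
    if language:
        filters.append({"term": {"lang": language}})
    if filters:
        bool_query["filter"] = filters

    # Negative words are excluded with must_not clauses.
    if negative_words:
        bool_query["must_not"] = [{"match": {"text": w}} for w in negative_words]

    return {"query": {"bool": bool_query}}


if __name__ == "__main__":
    import json
    body = build_tweet_query("election", country="CA", language="en",
                             negative_words=["spam"])
    print(json.dumps(body, indent=2))
```

Using `filter` for country and language keeps those clauses out of relevance scoring, which is the usual choice for yes/no criteria.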

Keep an eye on your topics of interest
• Express your interest, and we will keep you updated on the newest events
• Video: https://youtu.be/GdRmXNfukos

Data pipeline
[Diagram labels: Query Controller, Backend Database, Percolator, Logic Layer, Frontend, Searching database, Data Backup, Pub/Sub, Publish, Matching query, Register query, searching]

Real-Time Monitoring on Twitter
◦ Implemented using the Elasticsearch Percolator
◦ Think of it as “search in reverse”
  ◦ Users register queries with the percolator
  ◦ The percolator matches incoming documents against the registered queries
◦ Challenges:
  ◦ How to design the percolator data pipeline?
  ◦ How to decouple the backend database from the frontend server?
  ◦ Use the publish/subscribe design pattern
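
The "search in reverse" idea can be illustrated with a toy in-memory matcher. This is only a sketch of the concept, not the Elasticsearch Percolator API: the class `TinyPercolator` and its methods are hypothetical, and matching is reduced to whole-word checks on whitespace-split text.

```python
class TinyPercolator:
    """Toy illustration of percolation: queries are stored, documents are matched."""

    def __init__(self):
        self._queries = {}  # query_id -> (required words, excluded words)

    def register(self, query_id, required_words, excluded_words=()):
        # Store the query once, ahead of time, instead of storing documents.
        self._queries[query_id] = (
            frozenset(w.lower() for w in required_words),
            frozenset(w.lower() for w in excluded_words),
        )

    def percolate(self, text):
        # Match one incoming document against every registered query.
        tokens = set(text.lower().split())
        return sorted(
            query_id
            for query_id, (required, excluded) in self._queries.items()
            if required <= tokens and not (excluded & tokens)
        )


if __name__ == "__main__":
    p = TinyPercolator()
    p.register("q1", ["python"])
    p.register("q2", ["elasticsearch", "percolator"])
    p.register("q3", ["python"], excluded_words=["snake"])
    print(p.percolate("Learning Python with Elasticsearch"))
```

In a normal search the documents are indexed and the query arrives later; here the roles are swapped, which is what makes the approach suitable for monitoring a live stream.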

Real-Time Monitoring Data Flow
[Diagram labels: Percolator, Query database, Twitter database, Controller, Pub/Sub, subscribe, Open channel]
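
The decoupling role of publish/subscribe in this flow can be sketched with an in-memory broker. The slides do not say which pub/sub system is used, so the `Broker` class and the channel name below are purely hypothetical stand-ins.

```python
from collections import defaultdict


class Broker:
    """Minimal in-memory publish/subscribe broker (illustrative only)."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # channel -> list of callbacks

    def subscribe(self, channel, callback):
        # The frontend side opens a channel by registering a callback.
        self._subscribers[channel].append(callback)

    def publish(self, channel, message):
        # The backend side publishes without knowing who is listening.
        callbacks = self._subscribers.get(channel, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)  # number of subscribers that were notified


if __name__ == "__main__":
    broker = Broker()
    received = []
    broker.subscribe("query-42", received.append)
    broker.publish("query-42", {"tweet": "a matching tweet"})
    print(received)
```

Because the publisher only knows the channel name, the matching backend and the frontend server can be changed or scaled independently.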

Challenge
How to build a high-throughput, real-time backend data pipeline?
• Use Logstash!
◦ Highly scalable
◦ Compatible with different sources and destinations
[Figures: “A scalable high-throughput pipeline” and “Current backend pipeline”]

Challenge
• Real-time updates on the frontend client:
• Instead of using the JavaScript “setInterval()” function, I use “socketIO” to keep a socket open between the front-end client and the Flask server
• Constructing the Elasticsearch query
• Use the Python requests library to query Elasticsearch
• Fine-tuning Elasticsearch
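
Querying Elasticsearch with the requests library might look roughly like the sketch below. The index name, the `text` field, the base URL, and both function names are assumptions for illustration; the deck does not show the actual query or endpoint.

```python
def build_search_request(base_url, index, keyword, size=10):
    """Return the URL and JSON body for an Elasticsearch _search call."""
    url = f"{base_url.rstrip('/')}/{index}/_search"
    # Assumed mapping: the tweet body is stored in a field called "text".
    body = {"query": {"match": {"text": keyword}}, "size": size}
    return url, body


def search_tweets(base_url, index, keyword, size=10):
    """Send the search and return the matching documents."""
    # Third-party dependency; imported here so that the request builder
    # above can be used and tested without it.
    import requests

    url, body = build_search_request(base_url, index, keyword, size)
    response = requests.post(url, json=body, timeout=5)
    response.raise_for_status()
    # Elasticsearch returns matches under hits.hits, each with a _source.
    return [hit["_source"] for hit in response.json()["hits"]["hits"]]


if __name__ == "__main__":
    # Only builds the request; calling search_tweets needs a running cluster.
    print(build_search_request("http://localhost:9200", "tweets", "python"))
```

Separating request construction from the HTTP call keeps the query logic easy to check without a live cluster.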

About Me
M.Math, University of Waterloo
◦ Field: Statistics and Machine Learning
B.S., University of Toronto
◦ Field: Applied Mathematics
Data Scientist Intern, Neon Inc., San Francisco
Back-end Model Developer, MetricAid Inc., Toronto
Experience in Deep Learning:
◦ Convolutional Network, Recurrent Network
• OS/161 (a simplified POSIX OS)

Questions?
Thank you!

Parallelization of percolator
• Will consume a lot of hardware: O(mn)
• Another choice: Luwak + Samza
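
One way to read the O(mn) cost, assuming m is the number of registered queries and n the number of incoming documents (the slide does not define either symbol), is that every document is checked against every query. The sketch below shards the queries across workers so that each worker handles roughly m/k queries per document. It is a hypothetical illustration using whole-word matching, not the behaviour of Elasticsearch, Luwak, or Samza.

```python
from concurrent.futures import ThreadPoolExecutor


def shard(queries, num_shards):
    """Split a {query_id: set_of_words} dict into num_shards smaller dicts."""
    items = sorted(queries.items())
    return [dict(items[i::num_shards]) for i in range(num_shards)]


def match_shard(shard_queries, tokens):
    """Return the ids of queries in one shard whose words all occur in tokens."""
    return [qid for qid, words in shard_queries.items() if words <= tokens]


def percolate_parallel(queries, text, num_shards=2):
    """Match one document against all queries, one worker per shard."""
    tokens = set(text.lower().split())
    shards = shard(queries, num_shards)
    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        parts = list(pool.map(match_shard, shards, [tokens] * num_shards))
    return sorted(qid for part in parts for qid in part)


if __name__ == "__main__":
    queries = {"a": {"python"}, "b": {"java"}, "c": {"python", "flask"}}
    print(percolate_parallel(queries, "Python and Flask", num_shards=2))
```

The total work is unchanged; sharding only spreads it out, which is consistent with the slide's point that the approach consumes a lot of hardware.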
