The Anatomy of a Large-Scale Social Search Engine, WWW 2010
•  Damon Horowitz, Sepandar D. Kamvar
•  The Anatomy of a Large-Scale Social Search Engine
•  WWW 2010

•  Aardvark QA
•  web
•  QA
•  Google
•  Aardvark
•  Google
“Do you have any good babysitter recommendations in Palo Alto for my 6-year-old twins? I’m looking for somebody that won’t let them watch TV.”
•  Crawler and Indexer
•  Query Analyzer
•  Ranking Function
•  UI
$$s(u_i, u_j, q) = p(u_i \mid u_j) \cdot p(u_i \mid q) = p(u_i \mid u_j) \sum_{t \in T} p(u_i \mid t)\, p(t \mid q)$$


• p(ui|uj): quality score
• p(ui|q): relevance score

u: user, q: query, t: topic
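The scoring formula can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the dictionary layout, and all numbers are hypothetical and do not come from the slides or from Aardvark itself.

```python
from typing import Dict


def relevance(p_user_given_topic: Dict[str, float],
              p_topic_given_query: Dict[str, float]) -> float:
    """p(u_i | q) = sum over topics t of p(u_i | t) * p(t | q)."""
    return sum(p_user_given_topic.get(topic, 0.0) * p_tq
               for topic, p_tq in p_topic_given_query.items())


def score(quality: float,
          p_user_given_topic: Dict[str, float],
          p_topic_given_query: Dict[str, float]) -> float:
    """s(u_i, u_j, q) = p(u_i | u_j) * p(u_i | q)."""
    return quality * relevance(p_user_given_topic, p_topic_given_query)


# Hypothetical values, chosen only to make the arithmetic easy to follow.
p_t_q = {"babysitting": 0.7, "palo alto": 0.3}   # p(t | q)
p_u_t = {"babysitting": 0.2, "palo alto": 0.5}   # p(u_i | t)
s = score(0.4, p_u_t, p_t_q)                     # 0.4 * (0.14 + 0.15)
print(round(s, 3))
```

Topics missing from a candidate's profile contribute zero here, which is one possible convention rather than something the slides specify.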
  
P(ui|t)
$$p(u_i \mid t) = \frac{p(t \mid u_i)\, p(u_i)}{p(t)}$$
$$s(t \mid u_i) = p(t \mid u_i) + \gamma \sum_{u \in U} p(t \mid u)$$
$$\sum_{t \in T} p(t \mid u_i) = 1$$
• facebook
• blog
• twitter
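A minimal sketch of the two formulas for P(ui|t), assuming that U is some set of other users to smooth over (for example the user's connections) and that the smoothed scores are renormalized so they sum to 1 over topics. Both readings are assumptions, as are the function names and the toy profiles.

```python
from typing import Dict, Iterable

Dist = Dict[str, Dict[str, float]]  # user -> {topic: p(t | user)}


def smooth(p_t_given_u: Dist, user: str,
           neighbors: Iterable[str], gamma: float) -> Dict[str, float]:
    """s(t | u_i) = p(t | u_i) + gamma * sum_{u in U} p(t | u), then renormalized."""
    scores = dict(p_t_given_u.get(user, {}))
    for u in neighbors:
        for topic, p in p_t_given_u.get(u, {}).items():
            scores[topic] = scores.get(topic, 0.0) + gamma * p
    total = sum(scores.values())
    return {t: v / total for t, v in scores.items()} if total > 0 else scores


def invert(p_t_given_u: Dist, p_u: Dict[str, float]) -> Dist:
    """Bayes' rule: p(u | t) = p(t | u) p(u) / p(t), with p(t) = sum_u p(t | u) p(u)."""
    p_t: Dict[str, float] = {}
    for u, topics in p_t_given_u.items():
        for topic, p in topics.items():
            p_t[topic] = p_t.get(topic, 0.0) + p * p_u[u]
    result: Dist = {}
    for u, topics in p_t_given_u.items():
        for topic, p in topics.items():
            if p_t[topic] > 0:
                result.setdefault(topic, {})[u] = p * p_u[u] / p_t[topic]
    return result


# Hypothetical profiles and a uniform prior over two users.
profiles = {"alice": {"tennis": 1.0}, "bob": {"tennis": 0.5, "cooking": 0.5}}
prior = {"alice": 0.5, "bob": 0.5}
p_u_given_t = invert(profiles, prior)
alice_smoothed = smooth(profiles, "alice", ["bob"], gamma=0.5)
```

With these toy numbers, smoothing gives alice a small non-zero weight on "cooking" even though her own profile never mentions it.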
P(ui|uj)
P(t|q)
•  Non Question Classifier
•  Inappropriate Question Classifier
•  Trivial Question Classifier
•  Location Sensitive Classifier
P(t|q)
•  Keyword Match Topic Mapper
•  Taxonomy Topic Mapper
     –  SVM 3000
•  Salient Term Topic Mapper
     –  tf-idf
•  User Tag Topic Mapper
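As a rough illustration of how a keyword match combined with a tf-idf weight could yield a topic distribution p(t|q), here is a small sketch. It is not the mappers from the paper: the smoothed idf formula, the pre-tokenized input, the `known_topics` set, and the toy corpus are all assumptions made for the example.

```python
import math
from collections import Counter
from typing import Dict, List, Set


def tfidf(question: List[str], corpus: List[List[str]]) -> Dict[str, float]:
    """Weight each question term by term frequency times a smoothed idf."""
    counts = Counter(question)
    n_docs = len(corpus)
    weights = {}
    for term, count in counts.items():
        df = sum(1 for doc in corpus if term in doc)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0  # smoothing is an assumption
        weights[term] = (count / len(question)) * idf
    return weights


def topic_distribution(question: List[str], corpus: List[List[str]],
                       known_topics: Set[str]) -> Dict[str, float]:
    """Keep terms that string-match a known topic and normalize to p(t | q)."""
    weights = {t: w for t, w in tfidf(question, corpus).items()
               if t in known_topics}
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()} if total else {}


# Hypothetical corpus of earlier questions and a hypothetical topic list.
corpus = [["tennis", "in", "paris"],
          ["babysitter", "in", "boston"],
          ["hiking", "in", "alps"]]
topics = {"babysitter", "tennis", "hiking"}
p_t_q = topic_distribution(["babysitter", "who", "plays", "tennis"], corpus, topics)
```

Both matched terms occur in exactly one corpus document, so they end up with equal weight in this toy case.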
•  Topic Expertise: p(ui|q)
•  Connectedness: p(ui|uj)
•  Availability
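One way to picture how the three ranking factors might combine is the sketch below. Treating availability as a simple yes/no filter is a simplifying assumption, and the `Candidate` class, its field names, and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    name: str
    expertise: float      # p(u_i | q), topic expertise
    connectedness: float  # p(u_i | u_j), closeness to the asker
    available: bool       # assumed boolean; the slides give no detail


def rank(candidates: List[Candidate]) -> List[Candidate]:
    """Drop unavailable candidates, then sort by expertise * connectedness."""
    eligible = [c for c in candidates if c.available]
    return sorted(eligible,
                  key=lambda c: c.expertise * c.connectedness,
                  reverse=True)


# Hypothetical candidates for one question.
pool = [
    Candidate("ann", expertise=0.9, connectedness=0.2, available=True),
    Candidate("ben", expertise=0.5, connectedness=0.6, available=True),
    Candidate("cho", expertise=0.9, connectedness=0.9, available=False),
]
ordered = [c.name for c in rank(pool)]
```

Here "cho" has the highest product but is filtered out, and "ben" outranks "ann" because a closer connection outweighs the lower expertise.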
                                            	
  
•  Google PC
•  Mobile Google Aardvark
•  Google Aardvark
Aardvark    18.6 words       98.1%
            2.2-2.9 words    57-63%
•  fact
•  57.2% 10
     –  facebook 15.7% 15
•  6 37
•  87.7%
•  2.08
•  97.7% 3
•  174,605
•  1,199,323
•  Google
     –  200 Aardvark
     –  Aardvark google 5
     –  10

Aardvark    5    71.5%    3.93 ± 1.23
Google      2    70.5%    3.07 ± 1.46

Lab seminar20100604
