5. Web Pages & Links: 100+ PB
Logs: 100+ PB
UGC: 1 PB
Web, News, PostBar, Encyclopedia, Knows
Searches, Clicks, Posts, etc.
1 petabyte = 2x National Library of China
6. Logs: 100+ PB
UGC: 1+ PB
[Chart: data volume by year, 2005 to 2012]
• 95% of the data was created within the last 3 years
• 100 PB of new data is processed every day
Growth: 100%+ YoY
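The two growth figures on this slide can be cross-checked with a little arithmetic. The sketch below assumes steady year-over-year growth and that no data is ever deleted; both assumptions, and the function names, are illustrative and not taken from the slides.

```python
def share_created_in_last_years(growth_rate: float, years: int) -> float:
    """Fraction of all stored data that is newer than `years` years.

    Assumes the total volume grows by `growth_rate` each year
    (1.0 means 100% YoY) and that nothing is deleted.
    """
    factor = 1.0 + growth_rate
    return 1.0 - factor ** (-years)


def growth_rate_for_share(share: float, years: int) -> float:
    """Yearly growth rate needed so that `share` of the data is newer than `years` years."""
    return (1.0 / (1.0 - share)) ** (1.0 / years) - 1.0


if __name__ == "__main__":
    # Exactly 100% YoY growth: share of data from the last 3 years.
    print(f"{share_created_in_last_years(1.0, 3):.3f}")
    # Growth rate implied by "95% created within the last 3 years".
    print(f"{growth_rate_for_share(0.95, 3):.3f}")
```

Under these assumptions, exactly 100% YoY growth puts 87.5% of the data inside the last three years, while a 95% share needs roughly 171% YoY growth, which is consistent with the "100%+" wording.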
9. Software Innovations
• Global Optimization
• Multiple Replication
• Data Distribution
• Partial Update
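The slide names data distribution and multiple replication without saying how either is done. As a generic illustration only, the sketch below places each key on several nodes by hashing it; the function name `replica_nodes`, the node names, and the replica count of 3 are all assumptions.

```python
import hashlib
from typing import List


def replica_nodes(key: str, nodes: List[str], replicas: int = 3) -> List[str]:
    """Pick `replicas` distinct nodes for `key`.

    The key is hashed to a starting position and copies go to the
    following nodes in list order. This is a simple placement scheme
    chosen for illustration, not the scheme described in the slides.
    """
    if not 1 <= replicas <= len(nodes):
        raise ValueError("replicas must be between 1 and the number of nodes")
    digest = hashlib.md5(key.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replicas)]


if __name__ == "__main__":
    cluster = ["node-0", "node-1", "node-2", "node-3", "node-4"]
    print(replica_nodes("page:example", cluster))
```

Hashing spreads keys evenly across nodes, and writing each key to several nodes means a single node failure does not lose data.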
Traditional server stack (monolithic hardware): traditional relational database; direct record access or queries
New server stack (distributed hardware): MapReduce, NoSQL database, parallel relational database, Hadoop
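To make the MapReduce part of the new stack concrete, here is a minimal single-process sketch of the map, shuffle, and reduce phases. It counts clicks per query from log lines; the tab-separated `query<TAB>url` log format and all function names are assumptions, and a real Hadoop job would run these phases across many machines.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_clicks(line: str) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (query, 1) for each well-formed click log line."""
    query, _, url = line.partition("\t")
    if query and url:
        yield query, 1


def reduce_sum(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce phase: add up all counts emitted for one key."""
    return key, sum(values)


def run_mapreduce(
    records: Iterable[str],
    mapper: Callable[[str], Iterator[Tuple[str, int]]],
    reducer: Callable[[str, List[int]], Tuple[str, int]],
) -> Dict[str, int]:
    """Run map, group values by key (the shuffle), then reduce each group."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in groups.items())


if __name__ == "__main__":
    logs = ["weather\turl-a", "weather\turl-b", "news\turl-c", "malformed line"]
    print(run_mapreduce(logs, map_clicks, reduce_sum))
```

Because the mapper handles one record at a time and the reducer handles one key at a time, both phases can be split across distributed hardware without changing the logic.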
10. Big Data + Web Search
• Real-time online learning
• Tens of billions of training samples
• Billions of complex features
Offline: Logs, Feature extraction, Model Training, Models
Online: Query, Advanced Search Module, CTR-server
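The slide does not say which model sits behind the CTR-server. As one common way to learn online over a very large feature space, the sketch below uses logistic regression with hashed features, updated one sample at a time. The class name, feature strings, bucket count, and learning rate are all assumptions.

```python
import math
import zlib
from typing import List


class OnlineCTRModel:
    """Logistic regression over hashed features, trained one sample at a time."""

    def __init__(self, num_buckets: int = 2 ** 18, learning_rate: float = 0.1):
        self.num_buckets = num_buckets
        self.learning_rate = learning_rate
        self.weights = [0.0] * num_buckets

    def _indices(self, features: List[str]) -> List[int]:
        # Hashing trick: map each feature string to a fixed-size weight table.
        return [zlib.crc32(f.encode("utf-8")) % self.num_buckets for f in features]

    def predict(self, features: List[str]) -> float:
        """Estimated click probability for one impression."""
        score = sum(self.weights[i] for i in self._indices(features))
        score = max(-35.0, min(35.0, score))  # keep exp() in a safe range
        return 1.0 / (1.0 + math.exp(-score))

    def update(self, features: List[str], clicked: bool) -> float:
        """One SGD step on the log loss; returns the prediction made before the step."""
        p = self.predict(features)
        gradient = p - (1.0 if clicked else 0.0)
        for i in self._indices(features):
            self.weights[i] -= self.learning_rate * gradient
        return p


if __name__ == "__main__":
    model = OnlineCTRModel()
    for _ in range(200):
        model.update(["query=flowers", "ad=roses"], clicked=True)
        model.update(["query=flowers", "ad=tires"], clicked=False)
    print(model.predict(["query=flowers", "ad=roses"]))
    print(model.predict(["query=flowers", "ad=tires"]))
```

Each log record updates the weights as soon as it arrives, so the model follows new click behavior without a full retraining pass, and the hashed weight table keeps memory fixed however many distinct features appear.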
11. Big Data + IME
• Real-time Dictionary Updates
• Dynamic Result Modeling
• High-frequency Inputs Recommendation
Diagram labels: User Input, NLP Module, On-Device Quick Search, Device-based Dictionary, Cloud-based Dictionary, Consolidated Search Result, Output
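The diagram shows candidates from a device-based and a cloud-based dictionary merged into one consolidated result. The sketch below is one hypothetical way to do that merge: prefix lookup in both dictionaries, keeping the higher score for words found in both. The scores, the prefix matching, and the function names are assumptions, not details from the slides.

```python
from typing import Dict, List


def lookup(dictionary: Dict[str, float], prefix: str) -> Dict[str, float]:
    """Return the entries whose word starts with the typed prefix."""
    return {word: score for word, score in dictionary.items() if word.startswith(prefix)}


def consolidate(
    prefix: str,
    device_dict: Dict[str, float],
    cloud_dict: Dict[str, float],
    limit: int = 5,
) -> List[str]:
    """Merge device and cloud candidates and rank them by score.

    When a word appears in both dictionaries the higher score wins;
    ties are broken alphabetically so the output is deterministic.
    """
    merged = lookup(device_dict, prefix)
    for word, score in lookup(cloud_dict, prefix).items():
        merged[word] = max(score, merged.get(word, 0.0))
    ranked = sorted(merged.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:limit]]


if __name__ == "__main__":
    device = {"hello": 0.6, "help": 0.4, "world": 0.9}
    cloud = {"helium": 0.7, "hello": 0.8}
    print(consolidate("hel", device, cloud))  # ['hello', 'helium', 'help']
```

Keeping a small dictionary on the device allows a quick answer even when offline, while the cloud dictionary can be updated in real time and can raise the rank of newly popular inputs.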