SlideShare a Scribd company logo
What does it take
to make Google
work at scale?
Malte Schwarzkopf [@ms705]
Cambridge Systems at Scale
http://camsas.org – @CamSysAtScale
CamSa
S
CamSa
S
2http://camsas.org @CamSysAtScale
CamSa
S
3http://camsas.org @CamSysAtScale
CamSa
S
4http://camsas.org @CamSysAtScale
What happens here?
CamSa
S
5http://camsas.org @CamSysAtScale
What happens in those 139ms?
Internet
Google
datacenter
Your
computer
Front-end web
server
G+ idx
places
idx
web idx
? ?
?
? ?
?
CamSa
S
6http://camsas.org @CamSysAtScale
What we’ll chat about
1.Datacenter hardware
2.Scalable software stacks
a.Google
b.Facebook
3.Differences compared to HPC
4.Current research directions
Please ask questions - any
time! :)
CamSa
S
7http://camsas.org @CamSysAtScale
CamSa
S
8http://camsas.org @CamSysAtScale
CamSa
S
9http://camsas.org @CamSysAtScale
CamSa
S
10http://camsas.org @CamSysAtScale
CamSa
S
11http://camsas.org @CamSysAtScale
10,000+ machines
“Machine”
no chassis!
DC battery
~6 disk drives
mostly custom-made
Network
ToR switch
multi-path core
e.g., 3-stage fat tree or
CamSa
S
12http://camsas.org @CamSysAtScale
How’s this different to HPC?
Emphasis on commodity hardware
No expensive interconnect
Mid-range machines
Energy/performance/cost trade-off essential
Massive automation
Very small number of on-site staff
Automated software bootstrap
Fault tolerant design
Each component can fail
Software must be aware and compensate
CamSa
S
13http://camsas.org @CamSysAtScale
The software side
Machine MachineMachine
Linux kernel
(customized)
Linux kernel
(customized)
Linux kernel
(customized)
Transparent distributed systems
Containers
Containers
We’ll look at what goes here!
CamSa
S
14http://camsas.org @CamSysAtScale
The Google Stack
Figure from M. Schwarzkopf, “Operating system support for warehouse-scale
computing”, PhD thesis, University of Cambridge, 2015 (to appear).
CamSa
S
15http://camsas.org @CamSysAtScale
The Google Stack
Figure from M. Schwarzkopf, “Operating system support for warehouse-scale
computing”, PhD thesis, University of Cambridge, 2015 (to appear).
CamSa
S
16http://camsas.org @CamSysAtScale
GFS/Colossus
Bulk data block storage system
Optimized for large files (GB-size)
Supports small files, but not common case
Read, write, append modes
Colossus = GFSv2, adds some improvements
e.g., Reed-Solomon-based erasure coding
better support for latency-sensitive applications
sharded meta-data layer, rather than single master
CamSa
S
17http://camsas.org @CamSysAtScale
Application
GFS/Colossus: architecture
Chunkserver
GFS library
Chunkserver
ChunkserverGFS Master
Namespace
Metadata
CamSa
S
18http://camsas.org @CamSysAtScale
GFS/Colossus: writes
Application
GFS Master
Chunkserver
GFS library
Chunkserver
Chunkserver
Namespace
Metadata
Chain replication
CamSa
S
19http://camsas.org @CamSysAtScale
The Google Stack
Figure from M. Schwarzkopf, “Operating system support for warehouse-scale
computing”, PhD thesis, University of Cambridge, 2015 (to appear).
Details & Bibliography: http://malteschwarzkopf.de/research/assets/google-stack.pdf
CamSa
S
20http://camsas.org @CamSysAtScale
BigTable (2006)
• ‘Three-dimensional‘ key-value store:
– <row key, column key, timestamp> → value
• Effectively a distributed, sorted, sparse map
row
(key: string)
column
(key: string)
timestamp
(key: int64)
cell
<“larry.page“, “websearch“, 133746428> → “cat pictures“
CamSa
S
21http://camsas.org @CamSysAtScale
BigTable (2006)
• Distributed tablets (~1 GB) hold subsets of map
• On top of Colossus, which handles replication and
fault tolerance: only one (active) server per tablet!
• Reads & writes within a row are transactional
– Independently of the number of columns touched
– But: no cross-row transactions possible
– Turns out users find this hard to deal with
CamSa
S
22http://camsas.org @CamSysAtScale
BigTable (2006)
• META0 tablet is “root“ for name resolution
– Like a coordinator/leader
– Nested META1 tablets improve meta-data lookup
• Elect master (META0 tablet server) using
Chubby, and to maintain list of tablet servers &
schemas
– 5-way replicated Paxos consensus on data in Chubby
• Built atop GFS; building block for MegaStore
– “Next gen” with transactions: Spanner
CamSa
S
23http://camsas.org @CamSysAtScale
The Google Stack
Figure from M. Schwarzkopf, “Operating system support for warehouse-scale
computing”, PhD thesis, University of Cambridge, 2015 (to appear).
CamSa
S
24http://camsas.org @CamSysAtScale
Spanner (2012)
• BigTable insufficient for some consistency needs
• Often have transactions across >1 data centres
– May buy app on Play Store while travelling in the U.S.
– Hit U.S. server, but customer billing data is in U.K.
– Or may need to update several replicas for fault tolerance
• Wide-area consistency is hard
– due to long delays and clock skew
– no global, universal notion of time
– NTP not accurate enough, PTP doesn’t work (jittery links)
CamSa
S
25http://camsas.org @CamSysAtScale
Spanner (2012)
• Spanner offers transactional consistency: full RDBMS
power, ACID properties, at global scale!
• Secret sauce: hardware-assisted clock sync
– Using GPS and atomic clocks in data centres
• Use global timestamps and Paxos to reach consensus
– Still have a period of uncertainty for write TX: wait it out!
– Each timestamp is an interval:
Definitely in
the future
tt.latest
Definitely in
the past
tt.earliest
tabs
CamSa
S
26http://camsas.org @CamSysAtScale
The Google Stack
Figure from M. Schwarzkopf, “Operating system support for warehouse-scale
computing”, PhD thesis, University of Cambridge, 2015 (to appear).
Details & Bibliography: http://malteschwarzkopf.de/research/assets/google-stack.pdf
CamSa
S
27http://camsas.org @CamSysAtScale
MapReduce (2004)
• Parallel programming framework for scale
– Run a program on 100’s to 10,000’s machines
• Framework takes care of:
– Parallelization, distribution, load-balancing, scaling up
(or down) & fault-tolerance
• Accessible: programmer provides two methods ;-)
– map(key, value) → list of <key’, value’> pairs
– reduce(key’, value’) → result
– Inspired by functional programming
CamSa
S
28http://camsas.org @CamSysAtScale
MapReduce
Input
Map
Reduce
Output
Shuffle
X: 5 X: 3 Y: 1 Y: 7
Y: 8X: 8
Results: X: 8, Y: 8
Perform Map() query against local data
matching input specification
Aggregate gathered results for each
intermediate key using Reduce()
End user can query results via
distributed key/value store
Slide originally due to S. Hand’s distributed systems lecture course at Cambridge:
http://www.cl.cam.ac.uk/teaching/1112/ConcDisSys/DistributedSystems-1B-H4.pdf
CamSa
S
29http://camsas.org @CamSysAtScale
MapReduce: Pros & Cons
• Extremely simple, and:
– Can auto-parallelize (since operations on every
element in input are independent)
– Can auto-distribute (since rely on underlying
Colossus/BigTable distributed storage)
– Gets fault-tolerance (since tasks are idempotent, i.e.
can just re-execute if a machine crashes)
• Doesn’t really use any sophisticated distributed
systems algorithms (except storage replication)
• However, not a panacea:
– Limited to batch jobs, and computations which are
expressible as a map() followed by a reduce()
CamSa
S
30http://camsas.org @CamSysAtScale
The Google Stack
Figure from M. Schwarzkopf, “Operating system support for warehouse-scale
computing”, PhD thesis, University of Cambridge, 2015 (to appear).
Details & Bibliography: http://malteschwarzkopf.de/research/assets/google-stack.pdf
CamSa
S
31http://camsas.org @CamSysAtScale
Dremel (2010)
Column-oriented store
For quick, interactive queries
...
Row-oriented storage
Column-oriented storage
CamSa
S
32http://camsas.org @CamSysAtScale
Dremel (2010)
Stores protocol buffers
Google’s universal serialization format
Nested messages → nested columns
Repeated fields → repeated records
Efficient encoding
Many sparse records: don’t store NULL fields
Record re-assembly
Need to put results back together into records
Use a Finite State Machine (FSM) defined by
protocol buffer structure
CamSa
S
33http://camsas.org @CamSysAtScale
The Google Stack
Figure from M. Schwarzkopf, “Operating system support for warehouse-scale
computing”, PhD thesis, University of Cambridge, 2015 (to appear).
Details & Bibliography: http://malteschwarzkopf.de/research/assets/google-stack.pdf
CamSa
S
34http://camsas.org @CamSysAtScale
Borg/Omega
Cluster manager and scheduler
Tracks machine and task liveness
Decides where to run what
Consolidates workloads onto machines
Efficiency gain, cost savings
Need fewer clusters
CamSa
S
35http://camsas.org @CamSysAtScale
BorgMaster
Borg/Omega
BorgMasterBorgMaster
Link shard
Borglet Borglet Borglet
Paxos-replicated
persistent store
Scheduler
Figure reproduced after A. Verma et al., “Large-scale cluster management at
Google with Borg”, Proceedings of EuroSys 2015.
CamSa
S
36http://camsas.org @CamSysAtScale
Borg/Omega: workloads
Jobs/tasks: counts
CPU/RAM: resource seconds [i.e. resource * job runtime in sec.]
Cluster A
Medium size
Medium utilization
Cluster B
Large size
Medium utilization
Cluster C
Medium (12k mach.)
High utilization
Public trace
Figure from M. Schwarzkopf et al., “Omega: flexible, scalable schedulers for large
compute clusters”, Proceedings of EuroSys 2013.
CamSa
S
37http://camsas.org @CamSysAtScale
Borg/Omega: workloads
FractionofjobsrunningforlessthanX
Figure from M. Schwarzkopf et al., “Omega: flexible, scalable schedulers for large
compute clusters”, Proceedings of EuroSys 2013.
Service jobs run for much longer than batch jobs:
long-term user-facing services vs. one-off analytics.
CamSa
S
38http://camsas.org @CamSysAtScale
Borg/Omega: workloads
Fractionofinter-arrivalgapslessthanX
Figure from M. Schwarzkopf et al., “Omega: flexible, scalable schedulers for large
compute clusters”, Proceedings of EuroSys 2013.
Batch jobs arrive more frequently than service jobs:
more numerous, shorter duration, fail more.
CamSa
S
39http://camsas.org @CamSysAtScale
Borg/Omega: workloads
Batch jobs have a longer-tailed CPI distribution:
lower scheduling priority in kernel scheduler.
Figures from M. Schwarzkopf, “Operating system support for warehouse-scale
computing”, PhD thesis, University of Cambridge, 2015 (to appear).
CamSa
S
40http://camsas.org @CamSysAtScale
Borg/Omega: workloads
Service workloads access memory more frequently:
larger working sets, less I/O.
Figures from M. Schwarzkopf, “Operating system support for warehouse-scale
computing”, PhD thesis, University of Cambridge, 2015 (to appear).
CamSa
S
41http://camsas.org @CamSysAtScale
How’s this different to HPC?
Generally avoid tight synchronization
No barrier-syncs!
Asynchronous replication, eventual consistency
Much more data-intensive
Lower compute-to-data ratio
Aggressive resource sharing for utilization
CamSa
S
42http://camsas.org @CamSysAtScale
The facebook Stack
Figure from M. Schwarzkopf, “Operating system support for warehouse-scale
computing”, PhD thesis, University of Cambridge, 2015 (to appear).
Details & Bibliography: http://malteschwarzkopf.de/research/assets/facebook-stack.pdf
CamSa
S
43http://camsas.org @CamSysAtScale
The facebook Stack
Figure from M. Schwarzkopf, “Operating system support for warehouse-scale
computing”, PhD thesis, University of Cambridge, 2015 (to appear).
Details & Bibliography: http://malteschwarzkopf.de/research/assets/facebook-stack.pdf
CamSa
S
44http://camsas.org @CamSysAtScale
The facebook Stack
Figure from M. Schwarzkopf, “Operating system support for warehouse-scale
computing”, PhD thesis, University of Cambridge, 2015 (to appear).
Details & Bibliography: http://malteschwarzkopf.de/research/assets/facebook-stack.pdf
CamSa
S
45http://camsas.org @CamSysAtScale
Haystack & f4
Blob stores, hold photos, videos
not: status updates, messages, like counts
Items have a level of hotness
How many users are currently accessing this?
Baseline “cold” storage: MySQL
Want to cache close to users
Reduces network traffic
Reduces latency
But cache capacity is limited!
Replicate for performance, not resilience
CamSa
S
46http://camsas.org @CamSysAtScale
What about
other companies’ stacks?
CamSa
S
47http://camsas.org @CamSysAtScale
How about other companies?
Very similar stacks.
Microsoft, Yahoo, Twitter all similar in principle.
Typical set-up:
Front-end serving systems and fast back-ends.
Batch data processing systems.
Multi-tier structured/unstructured storage hierarchy.
Coordination system and cluster scheduler.
Minor differences owed to business focus
e.g., Amazon focused on inventory/shopping cart.
CamSa
S
48http://camsas.org @CamSysAtScale
Open source software
Lots of open reimplementations!
MapReduce → Hadoop, Spark, Metis
GFS → HDFS
BigTable → HBase, Cassandra
Borg/Omega → Mesos, Firmament
But also some releases from companies…
Presto (Facebook)
Kubernetes (Google)
CamSa
S
49http://camsas.org @CamSysAtScale
Current research
Many hot topics!
This distributed infrastructure is still new.
Make many-core machines go further
Increase utilization, schedule better.
Deal with co-location interference.
Rapid insights on huge datasets
Fast, incremental stream processing, e.g. Naiad.
Approximate analytics, e.g. BlinkDB.
Distributed algorithms
New consensus systems, e.g. RAFT.
CamSa
S
50http://camsas.org @CamSysAtScale
Conclusions
1.Running at huge (10k+ machines) scale
requires different software stacks.
2.Pretty interesting systems
a.Do read the papers!
b.(Partial) Bibliography:
http://malteschwarzkopf.de/research/assets/google-stack.pdf
http://malteschwarzkopf.de/research/assets/facebook-stack.pdf
1.Lots of activity in this area!
a.Open source software
b.Research initiatives
Cambridge
Systems at Scale
@CamSysAtScale
camsas.org

More Related Content

What's hot

Impala presentation ahad rana
Impala presentation ahad ranaImpala presentation ahad rana
Impala presentation ahad rana
Data Con LA
 
Qubole Overview at the Fifth Elephant Conference
Qubole Overview at the Fifth Elephant ConferenceQubole Overview at the Fifth Elephant Conference
Qubole Overview at the Fifth Elephant ConferenceJoydeep Sen Sarma
 
Cloud Optimized Big Data
Cloud Optimized Big DataCloud Optimized Big Data
Cloud Optimized Big Data
Joydeep Sen Sarma
 
Hadoop and Spark for the SAS Developer
Hadoop and Spark for the SAS DeveloperHadoop and Spark for the SAS Developer
Hadoop and Spark for the SAS Developer
DataWorks Summit
 
Steve Watt, Chief Architect, Hadoop and Big Data, Red Hat - 21st BDL meetup
Steve Watt, Chief Architect, Hadoop and Big Data, Red Hat - 21st BDL meetupSteve Watt, Chief Architect, Hadoop and Big Data, Red Hat - 21st BDL meetup
Steve Watt, Chief Architect, Hadoop and Big Data, Red Hat - 21st BDL meetup
bigdatalondon
 
Xldb2011 tue 0940_facebook_realtimeanalytics
Xldb2011 tue 0940_facebook_realtimeanalyticsXldb2011 tue 0940_facebook_realtimeanalytics
Xldb2011 tue 0940_facebook_realtimeanalyticsliqiang xu
 
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...Hadoop User Group
 
Building a Scalable Web Crawler with Hadoop
Building a Scalable Web Crawler with HadoopBuilding a Scalable Web Crawler with Hadoop
Building a Scalable Web Crawler with Hadoop
Hadoop User Group
 
Transformation Processing Smackdown; Spark vs Hive vs Pig
Transformation Processing Smackdown; Spark vs Hive vs PigTransformation Processing Smackdown; Spark vs Hive vs Pig
Transformation Processing Smackdown; Spark vs Hive vs Pig
Lester Martin
 
HBaseCon 2015: HBase Operations in a Flurry
HBaseCon 2015: HBase Operations in a FlurryHBaseCon 2015: HBase Operations in a Flurry
HBaseCon 2015: HBase Operations in a Flurry
HBaseCon
 
How Adobe Does 2 Million Records Per Second Using Apache Spark!
How Adobe Does 2 Million Records Per Second Using Apache Spark!How Adobe Does 2 Million Records Per Second Using Apache Spark!
How Adobe Does 2 Million Records Per Second Using Apache Spark!
Databricks
 
Adaptive Query Execution: Speeding Up Spark SQL at Runtime
Adaptive Query Execution: Speeding Up Spark SQL at RuntimeAdaptive Query Execution: Speeding Up Spark SQL at Runtime
Adaptive Query Execution: Speeding Up Spark SQL at Runtime
Databricks
 
Voldemort
VoldemortVoldemort
Voldemort
fasiha ikram
 
March 2011 HUG: Scaling Hadoop
March 2011 HUG: Scaling HadoopMarch 2011 HUG: Scaling Hadoop
March 2011 HUG: Scaling Hadoop
Yahoo Developer Network
 
Datacenter Computing with Apache Mesos - BigData DC
Datacenter Computing with Apache Mesos - BigData DCDatacenter Computing with Apache Mesos - BigData DC
Datacenter Computing with Apache Mesos - BigData DC
Paco Nathan
 
Prediction as a service with ensemble model in SparkML and Python ScikitLearn
Prediction as a service with ensemble model in SparkML and Python ScikitLearnPrediction as a service with ensemble model in SparkML and Python ScikitLearn
Prediction as a service with ensemble model in SparkML and Python ScikitLearn
Josef A. Habdank
 
The Meta of Hadoop - COMAD 2012
The Meta of Hadoop - COMAD 2012The Meta of Hadoop - COMAD 2012
The Meta of Hadoop - COMAD 2012
Joydeep Sen Sarma
 
HBaseConEast2016: Splice machine open source rdbms
HBaseConEast2016: Splice machine open source rdbmsHBaseConEast2016: Splice machine open source rdbms
HBaseConEast2016: Splice machine open source rdbms
Michael Stack
 
HBaseCon2017 Community-Driven Graphs with JanusGraph
HBaseCon2017 Community-Driven Graphs with JanusGraphHBaseCon2017 Community-Driven Graphs with JanusGraph
HBaseCon2017 Community-Driven Graphs with JanusGraph
HBaseCon
 

What's hot (20)

Impala presentation ahad rana
Impala presentation ahad ranaImpala presentation ahad rana
Impala presentation ahad rana
 
Qubole Overview at the Fifth Elephant Conference
Qubole Overview at the Fifth Elephant ConferenceQubole Overview at the Fifth Elephant Conference
Qubole Overview at the Fifth Elephant Conference
 
Cloud Optimized Big Data
Cloud Optimized Big DataCloud Optimized Big Data
Cloud Optimized Big Data
 
Hadoop and Spark for the SAS Developer
Hadoop and Spark for the SAS DeveloperHadoop and Spark for the SAS Developer
Hadoop and Spark for the SAS Developer
 
Steve Watt, Chief Architect, Hadoop and Big Data, Red Hat - 21st BDL meetup
Steve Watt, Chief Architect, Hadoop and Big Data, Red Hat - 21st BDL meetupSteve Watt, Chief Architect, Hadoop and Big Data, Red Hat - 21st BDL meetup
Steve Watt, Chief Architect, Hadoop and Big Data, Red Hat - 21st BDL meetup
 
Xldb2011 tue 0940_facebook_realtimeanalytics
Xldb2011 tue 0940_facebook_realtimeanalyticsXldb2011 tue 0940_facebook_realtimeanalytics
Xldb2011 tue 0940_facebook_realtimeanalytics
 
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
 
Building a Scalable Web Crawler with Hadoop
Building a Scalable Web Crawler with HadoopBuilding a Scalable Web Crawler with Hadoop
Building a Scalable Web Crawler with Hadoop
 
Transformation Processing Smackdown; Spark vs Hive vs Pig
Transformation Processing Smackdown; Spark vs Hive vs PigTransformation Processing Smackdown; Spark vs Hive vs Pig
Transformation Processing Smackdown; Spark vs Hive vs Pig
 
HBaseCon 2015: HBase Operations in a Flurry
HBaseCon 2015: HBase Operations in a FlurryHBaseCon 2015: HBase Operations in a Flurry
HBaseCon 2015: HBase Operations in a Flurry
 
How Adobe Does 2 Million Records Per Second Using Apache Spark!
How Adobe Does 2 Million Records Per Second Using Apache Spark!How Adobe Does 2 Million Records Per Second Using Apache Spark!
How Adobe Does 2 Million Records Per Second Using Apache Spark!
 
Adaptive Query Execution: Speeding Up Spark SQL at Runtime
Adaptive Query Execution: Speeding Up Spark SQL at RuntimeAdaptive Query Execution: Speeding Up Spark SQL at Runtime
Adaptive Query Execution: Speeding Up Spark SQL at Runtime
 
Voldemort
VoldemortVoldemort
Voldemort
 
March 2011 HUG: Scaling Hadoop
March 2011 HUG: Scaling HadoopMarch 2011 HUG: Scaling Hadoop
March 2011 HUG: Scaling Hadoop
 
Datacenter Computing with Apache Mesos - BigData DC
Datacenter Computing with Apache Mesos - BigData DCDatacenter Computing with Apache Mesos - BigData DC
Datacenter Computing with Apache Mesos - BigData DC
 
Prediction as a service with ensemble model in SparkML and Python ScikitLearn
Prediction as a service with ensemble model in SparkML and Python ScikitLearnPrediction as a service with ensemble model in SparkML and Python ScikitLearn
Prediction as a service with ensemble model in SparkML and Python ScikitLearn
 
The Meta of Hadoop - COMAD 2012
The Meta of Hadoop - COMAD 2012The Meta of Hadoop - COMAD 2012
The Meta of Hadoop - COMAD 2012
 
HBaseConEast2016: Splice machine open source rdbms
HBaseConEast2016: Splice machine open source rdbmsHBaseConEast2016: Splice machine open source rdbms
HBaseConEast2016: Splice machine open source rdbms
 
Polyalgebra
PolyalgebraPolyalgebra
Polyalgebra
 
HBaseCon2017 Community-Driven Graphs with JanusGraph
HBaseCon2017 Community-Driven Graphs with JanusGraphHBaseCon2017 Community-Driven Graphs with JanusGraph
HBaseCon2017 Community-Driven Graphs with JanusGraph
 

Similar to What does it take to make google work at scale

Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...
Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...
Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...
DataStax Academy
 
Ga4 gh meeting at the the sanger institute
Ga4 gh meeting at the the sanger instituteGa4 gh meeting at the the sanger institute
Ga4 gh meeting at the the sanger instituteMatt Massie
 
Big Data on EC2: Mashing Technology in the Cloud
Big Data on EC2: Mashing Technology in the CloudBig Data on EC2: Mashing Technology in the Cloud
Big Data on EC2: Mashing Technology in the CloudGeorge Ang
 
Gcp data engineer
Gcp data engineerGcp data engineer
Gcp data engineer
Narendranath Reddy T
 
Cassandra - A decentralized storage system
Cassandra - A decentralized storage systemCassandra - A decentralized storage system
Cassandra - A decentralized storage system
Arunit Gupta
 
Introduction to Apache Kudu
Introduction to Apache KuduIntroduction to Apache Kudu
Introduction to Apache Kudu
Jeff Holoman
 
SMACK Stack 1.1
SMACK Stack 1.1SMACK Stack 1.1
SMACK Stack 1.1
Joe Stein
 
Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?
Anton Nazaruk
 
GCP Data Engineer cheatsheet
GCP Data Engineer cheatsheetGCP Data Engineer cheatsheet
GCP Data Engineer cheatsheet
Guang Xu
 
Loading Data into Redshift
Loading Data into RedshiftLoading Data into Redshift
Loading Data into Redshift
Amazon Web Services
 
Chicago Kafka Meetup
Chicago Kafka MeetupChicago Kafka Meetup
Chicago Kafka Meetup
Cliff Gilmore
 
The Next Generation of Data Processing and Open Source
The Next Generation of Data Processing and Open SourceThe Next Generation of Data Processing and Open Source
The Next Generation of Data Processing and Open Source
DataWorks Summit/Hadoop Summit
 
Introduction To Spark - Durham LUG 20150916
Introduction To Spark - Durham LUG 20150916Introduction To Spark - Durham LUG 20150916
Introduction To Spark - Durham LUG 20150916
Ian Pointer
 
Interactively Querying Large-scale Datasets on Amazon S3
Interactively Querying Large-scale Datasets on Amazon S3Interactively Querying Large-scale Datasets on Amazon S3
Interactively Querying Large-scale Datasets on Amazon S3
Amazon Web Services
 
Cloud Lambda Architecture Patterns
Cloud Lambda Architecture PatternsCloud Lambda Architecture Patterns
Cloud Lambda Architecture Patterns
Asis Mohanty
 
Loading Data into Redshift: Data Analytics Week at the SF Loft
Loading Data into Redshift: Data Analytics Week at the SF LoftLoading Data into Redshift: Data Analytics Week at the SF Loft
Loading Data into Redshift: Data Analytics Week at the SF Loft
Amazon Web Services
 
From PoCs to Production
From PoCs to ProductionFrom PoCs to Production
From PoCs to Production
DataStax
 
Intro to SnappyData Webinar
Intro to SnappyData WebinarIntro to SnappyData Webinar
Intro to SnappyData Webinar
SnappyData
 
Loading Data into Redshift
Loading Data into RedshiftLoading Data into Redshift
Loading Data into Redshift
Amazon Web Services
 
Introduciton to Apache Cassandra for Java Developers (JavaOne)
Introduciton to Apache Cassandra for Java Developers (JavaOne)Introduciton to Apache Cassandra for Java Developers (JavaOne)
Introduciton to Apache Cassandra for Java Developers (JavaOne)
zznate
 

Similar to What does it take to make google work at scale (20)

Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...
Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...
Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...
 
Ga4 gh meeting at the the sanger institute
Ga4 gh meeting at the the sanger instituteGa4 gh meeting at the the sanger institute
Ga4 gh meeting at the the sanger institute
 
Big Data on EC2: Mashing Technology in the Cloud
Big Data on EC2: Mashing Technology in the CloudBig Data on EC2: Mashing Technology in the Cloud
Big Data on EC2: Mashing Technology in the Cloud
 
Gcp data engineer
Gcp data engineerGcp data engineer
Gcp data engineer
 
Cassandra - A decentralized storage system
Cassandra - A decentralized storage systemCassandra - A decentralized storage system
Cassandra - A decentralized storage system
 
Introduction to Apache Kudu
Introduction to Apache KuduIntroduction to Apache Kudu
Introduction to Apache Kudu
 
SMACK Stack 1.1
SMACK Stack 1.1SMACK Stack 1.1
SMACK Stack 1.1
 
Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?
 
GCP Data Engineer cheatsheet
GCP Data Engineer cheatsheetGCP Data Engineer cheatsheet
GCP Data Engineer cheatsheet
 
Loading Data into Redshift
Loading Data into RedshiftLoading Data into Redshift
Loading Data into Redshift
 
Chicago Kafka Meetup
Chicago Kafka MeetupChicago Kafka Meetup
Chicago Kafka Meetup
 
The Next Generation of Data Processing and Open Source
The Next Generation of Data Processing and Open SourceThe Next Generation of Data Processing and Open Source
The Next Generation of Data Processing and Open Source
 
Introduction To Spark - Durham LUG 20150916
Introduction To Spark - Durham LUG 20150916Introduction To Spark - Durham LUG 20150916
Introduction To Spark - Durham LUG 20150916
 
Interactively Querying Large-scale Datasets on Amazon S3
Interactively Querying Large-scale Datasets on Amazon S3Interactively Querying Large-scale Datasets on Amazon S3
Interactively Querying Large-scale Datasets on Amazon S3
 
Cloud Lambda Architecture Patterns
Cloud Lambda Architecture PatternsCloud Lambda Architecture Patterns
Cloud Lambda Architecture Patterns
 
Loading Data into Redshift: Data Analytics Week at the SF Loft
Loading Data into Redshift: Data Analytics Week at the SF LoftLoading Data into Redshift: Data Analytics Week at the SF Loft
Loading Data into Redshift: Data Analytics Week at the SF Loft
 
From PoCs to Production
From PoCs to ProductionFrom PoCs to Production
From PoCs to Production
 
Intro to SnappyData Webinar
Intro to SnappyData WebinarIntro to SnappyData Webinar
Intro to SnappyData Webinar
 
Loading Data into Redshift
Loading Data into RedshiftLoading Data into Redshift
Loading Data into Redshift
 
Introduciton to Apache Cassandra for Java Developers (JavaOne)
Introduciton to Apache Cassandra for Java Developers (JavaOne)Introduciton to Apache Cassandra for Java Developers (JavaOne)
Introduciton to Apache Cassandra for Java Developers (JavaOne)
 

More from xlight

淘宝无线电子商务数据报告
淘宝无线电子商务数据报告淘宝无线电子商务数据报告
淘宝无线电子商务数据报告
xlight
 
New zealand bloom filter
New zealand bloom filterNew zealand bloom filter
New zealand bloom filterxlight
 
Product manager-chrissyuan v1.0
Product manager-chrissyuan v1.0Product manager-chrissyuan v1.0
Product manager-chrissyuan v1.0xlight
 
Oracle ha
Oracle haOracle ha
Oracle haxlight
 
Oracle 高可用概述
Oracle 高可用概述Oracle 高可用概述
Oracle 高可用概述xlight
 
Stats partitioned table
Stats partitioned tableStats partitioned table
Stats partitioned table
xlight
 
Optimizing Drupal Performance Zend Acquia Whitepaper Feb2010
Optimizing Drupal Performance Zend Acquia Whitepaper Feb2010Optimizing Drupal Performance Zend Acquia Whitepaper Feb2010
Optimizing Drupal Performance Zend Acquia Whitepaper Feb2010
xlight
 
C/C++与Lua混合编程
C/C++与Lua混合编程C/C++与Lua混合编程
C/C++与Lua混合编程
xlight
 
Google: The Chubby Lock Service for Loosely-Coupled Distributed Systems
Google: The Chubby Lock Service for Loosely-Coupled Distributed SystemsGoogle: The Chubby Lock Service for Loosely-Coupled Distributed Systems
Google: The Chubby Lock Service for Loosely-Coupled Distributed Systemsxlight
 
Google: The Chubby Lock Service for Loosely-Coupled Distributed Systems
Google: The Chubby Lock Service for Loosely-Coupled Distributed SystemsGoogle: The Chubby Lock Service for Loosely-Coupled Distributed Systems
Google: The Chubby Lock Service for Loosely-Coupled Distributed Systemsxlight
 
High Availability MySQL with DRBD and Heartbeat MTV Japan Mobile Service
High Availability MySQL with DRBD and Heartbeat MTV Japan Mobile ServiceHigh Availability MySQL with DRBD and Heartbeat MTV Japan Mobile Service
High Availability MySQL with DRBD and Heartbeat MTV Japan Mobile Service
xlight
 
PgSQL vs MySQL
PgSQL vs MySQLPgSQL vs MySQL
PgSQL vs MySQL
xlight
 
SpeedGeeks
SpeedGeeksSpeedGeeks
SpeedGeeksxlight
 
GOOGLE: Designs, Lessons and Advice from Building Large Distributed Systems
GOOGLE: Designs, Lessons and Advice from Building Large   Distributed Systems GOOGLE: Designs, Lessons and Advice from Building Large   Distributed Systems
GOOGLE: Designs, Lessons and Advice from Building Large Distributed Systems
xlight
 
sector-sphere
sector-spheresector-sphere
sector-spherexlight
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...xlight
 
Gfarm Fs Tatebe Tip2004
Gfarm Fs Tatebe Tip2004Gfarm Fs Tatebe Tip2004
Gfarm Fs Tatebe Tip2004xlight
 
Make Your web Work
Make Your web WorkMake Your web Work
Make Your web Workxlight
 
Capacity Management from Flickr
Capacity Management from FlickrCapacity Management from Flickr
Capacity Management from Flickrxlight
 

More from xlight (20)

淘宝无线电子商务数据报告
淘宝无线电子商务数据报告淘宝无线电子商务数据报告
淘宝无线电子商务数据报告
 
New zealand bloom filter
New zealand bloom filterNew zealand bloom filter
New zealand bloom filter
 
Product manager-chrissyuan v1.0
Product manager-chrissyuan v1.0Product manager-chrissyuan v1.0
Product manager-chrissyuan v1.0
 
Oracle ha
Oracle haOracle ha
Oracle ha
 
Oracle 高可用概述
Oracle 高可用概述Oracle 高可用概述
Oracle 高可用概述
 
Stats partitioned table
Stats partitioned tableStats partitioned table
Stats partitioned table
 
Optimizing Drupal Performance Zend Acquia Whitepaper Feb2010
Optimizing Drupal Performance Zend Acquia Whitepaper Feb2010Optimizing Drupal Performance Zend Acquia Whitepaper Feb2010
Optimizing Drupal Performance Zend Acquia Whitepaper Feb2010
 
C/C++与Lua混合编程
C/C++与Lua混合编程C/C++与Lua混合编程
C/C++与Lua混合编程
 
Google: The Chubby Lock Service for Loosely-Coupled Distributed Systems
Google: The Chubby Lock Service for Loosely-Coupled Distributed SystemsGoogle: The Chubby Lock Service for Loosely-Coupled Distributed Systems
Google: The Chubby Lock Service for Loosely-Coupled Distributed Systems
 
Google: The Chubby Lock Service for Loosely-Coupled Distributed Systems
Google: The Chubby Lock Service for Loosely-Coupled Distributed SystemsGoogle: The Chubby Lock Service for Loosely-Coupled Distributed Systems
Google: The Chubby Lock Service for Loosely-Coupled Distributed Systems
 
High Availability MySQL with DRBD and Heartbeat MTV Japan Mobile Service
High Availability MySQL with DRBD and Heartbeat MTV Japan Mobile ServiceHigh Availability MySQL with DRBD and Heartbeat MTV Japan Mobile Service
High Availability MySQL with DRBD and Heartbeat MTV Japan Mobile Service
 
PgSQL vs MySQL
PgSQL vs MySQLPgSQL vs MySQL
PgSQL vs MySQL
 
SpeedGeeks
SpeedGeeksSpeedGeeks
SpeedGeeks
 
GOOGLE: Designs, Lessons and Advice from Building Large Distributed Systems
GOOGLE: Designs, Lessons and Advice from Building Large   Distributed Systems GOOGLE: Designs, Lessons and Advice from Building Large   Distributed Systems
GOOGLE: Designs, Lessons and Advice from Building Large Distributed Systems
 
UDT
UDTUDT
UDT
 
sector-sphere
sector-spheresector-sphere
sector-sphere
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
 
Gfarm Fs Tatebe Tip2004
Gfarm Fs Tatebe Tip2004Gfarm Fs Tatebe Tip2004
Gfarm Fs Tatebe Tip2004
 
Make Your web Work
Make Your web WorkMake Your web Work
Make Your web Work
 
Capacity Management from Flickr
Capacity Management from FlickrCapacity Management from Flickr
Capacity Management from Flickr
 

Recently uploaded

在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
obonagu
 
Gen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdfGen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdf
gdsczhcet
 
ML for identifying fraud using open blockchain data.pptx
ML for identifying fraud using open blockchain data.pptxML for identifying fraud using open blockchain data.pptx
ML for identifying fraud using open blockchain data.pptx
Vijay Dialani, PhD
 
road safety engineering r s e unit 3.pdf
road safety engineering  r s e unit 3.pdfroad safety engineering  r s e unit 3.pdf
road safety engineering r s e unit 3.pdf
VENKATESHvenky89705
 
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
Amil Baba Dawood bangali
 
AP LAB PPT.pdf ap lab ppt no title specific
AP LAB PPT.pdf ap lab ppt no title specificAP LAB PPT.pdf ap lab ppt no title specific
AP LAB PPT.pdf ap lab ppt no title specific
BrazilAccount1
 
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
MdTanvirMahtab2
 
Standard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - NeometrixStandard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - Neometrix
Neometrix_Engineering_Pvt_Ltd
 
Student information management system project report ii.pdf
Student information management system project report ii.pdfStudent information management system project report ii.pdf
Student information management system project report ii.pdf
Kamal Acharya
 
MCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdfMCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdf
Osamah Alsalih
 
Railway Signalling Principles Edition 3.pdf
Railway Signalling Principles Edition 3.pdfRailway Signalling Principles Edition 3.pdf
Railway Signalling Principles Edition 3.pdf
TeeVichai
 
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdfGoverning Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
WENKENLI1
 
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
AJAYKUMARPUND1
 
Planning Of Procurement o different goods and services
Planning Of Procurement o different goods and servicesPlanning Of Procurement o different goods and services
Planning Of Procurement o different goods and services
JoytuBarua2
 
English lab ppt no titlespecENG PPTt.pdf
English lab ppt no titlespecENG PPTt.pdfEnglish lab ppt no titlespecENG PPTt.pdf
English lab ppt no titlespecENG PPTt.pdf
BrazilAccount1
 
power quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptxpower quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptx
ViniHema
 
H.Seo, ICLR 2024, MLILAB, KAIST AI.pdf
H.Seo,  ICLR 2024, MLILAB,  KAIST AI.pdfH.Seo,  ICLR 2024, MLILAB,  KAIST AI.pdf
H.Seo, ICLR 2024, MLILAB, KAIST AI.pdf
MLILAB
 
ethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.pptethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.ppt
Jayaprasanna4
 
The Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdfThe Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdf
Pipe Restoration Solutions
 
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdfHybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
fxintegritypublishin
 

Recently uploaded (20)

在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
 
Gen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdfGen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdf
 
ML for identifying fraud using open blockchain data.pptx
ML for identifying fraud using open blockchain data.pptxML for identifying fraud using open blockchain data.pptx
ML for identifying fraud using open blockchain data.pptx
 
road safety engineering r s e unit 3.pdf
road safety engineering  r s e unit 3.pdfroad safety engineering  r s e unit 3.pdf
road safety engineering r s e unit 3.pdf
 
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
 
AP LAB PPT.pdf ap lab ppt no title specific
AP LAB PPT.pdf ap lab ppt no title specificAP LAB PPT.pdf ap lab ppt no title specific
AP LAB PPT.pdf ap lab ppt no title specific
 
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
 
Standard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - NeometrixStandard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - Neometrix
 
Student information management system project report ii.pdf
Student information management system project report ii.pdfStudent information management system project report ii.pdf
Student information management system project report ii.pdf
 
MCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdfMCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdf
 
Railway Signalling Principles Edition 3.pdf
Railway Signalling Principles Edition 3.pdfRailway Signalling Principles Edition 3.pdf
Railway Signalling Principles Edition 3.pdf
 
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdfGoverning Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
 
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
 
Planning Of Procurement o different goods and services
Planning Of Procurement o different goods and servicesPlanning Of Procurement o different goods and services
Planning Of Procurement o different goods and services
 
English lab ppt no titlespecENG PPTt.pdf
English lab ppt no titlespecENG PPTt.pdfEnglish lab ppt no titlespecENG PPTt.pdf
English lab ppt no titlespecENG PPTt.pdf
 
power quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptxpower quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptx
 
H.Seo, ICLR 2024, MLILAB, KAIST AI.pdf
H.Seo,  ICLR 2024, MLILAB,  KAIST AI.pdfH.Seo,  ICLR 2024, MLILAB,  KAIST AI.pdf
H.Seo, ICLR 2024, MLILAB, KAIST AI.pdf
 
ethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.pptethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.ppt
 
The Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdfThe Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdf
 
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdfHybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
 

What does it take to make google work at scale