Distributed “Web Scale” Systems



                      Ricardo Vice Santos
                            @ricardovice
Who am I?
•  I’m Ricardo!
•  Lead Engineer at Spotify
•  ricardovice on twitter, spotify, about.me, kiva, slideshare, github,
   bitbucket, delicious…
•  Portuguese
•  Previously working in the video streaming industry
•  (only) Discovered Spotify late 2009
•  Joined in 2010
spotifiera
spo·ti·fie·ra   Verb   to use Spotify;
                       to provide a service free of cost;
What’s Spotify all about?
•  A big catalogue, tons of music
•  Available everywhere
•  Great user experience
•  More convenient than piracy
•  Reliable, high availability
•  Scalable for many, many users
But what really got me hooked:
•  Free, legal ad-supported service
•  Very fast
The importance of being fast
•  High latency can be a problem, not only in First
   Person Shooters
•  Slow performance is a major user experience killer
•  At Velocity 2009, Eric Schurman (Bing) and Jake
   Brutlag (Google Search) showed that increased
   latency directly hurt usage and revenue per user[1].
•  Latency leads to users leaving; many won't ever
   come back
•  Users will share their experience with friends


          [1] http://radar.oreilly.com/2009/07/velocity-making-your-site-fast.html
So how fast is Spotify?
•  We monitor playback latency on the client side
•  Current median latency to play any track is 265ms
•  On average, the human notion of “instant” is
   anything under 200ms
•  Due to disk lookup, at times it's actually faster to
   start playing a track from network than from disk
•  Fewer than 1% of playbacks experience stutter
“Spotify is fast due to P2P”
•  This is something I read a lot around the web
•  P2P does play a crucial role in the picture, but…
•  Experience at Spotify showed me that most latency issues are
   directly linked to backend problems
•  It’s a mistake to think that we could be this fast without a smart and
   scalable backend architecture

So let’s give credit where credit is due.
Going web scale!!1




“Scaling Twitter”
Blaine Cook, 2007
http://www.slideshare.net/Blaine/scaling-twitter
Handling growth
Things to keep in mind:
•  Scaling is not an exact science
•  There is no such thing as a magic formula
•  Usage patterns differ
•  There is always a limit to what you can handle
•  Fail gracefully
•  Continuous evolution process
Scaling horizontally
•    You can always add more machines!
•    Stateless services
•    Several processes can share memcached
•    Possible to run in “the cloud” (EC2, Rackspace)
•    Need some kind of load balancer
•    Data sharing/synchronization can be hard
•    Complexity: many pieces, maybe hidden SPOFs
•    Fundamental to the application’s design
Usage patterns
Typically, some services are more demanding than
others. This can be due to:
•  Higher popularity
•  Higher complexity
•  Low latency expectation
•  All combined
Decoupling
•    Divide and conquer!
•    The Unix way
•    Resources assigned individually
•    Using the right tools to address each problem
•    Organization and delegation
•    Problems are isolated
•    Easier to handle growth
Read only services
•    The easiest to scale
•    Stateless
•    Use indices, large read-optimized data containers
•    Each node has its local copy
•    Data structured according to service
•    Updated periodically, during off-peak hours
•    Take advantage of OS page cache
Read-write services
•  User generated content, e.g. playlists
•  Hard to ensure consistency of data across instances

Solutions:
•  Eventual consistency:
   •  Reads of just written data not guaranteed to be up-to-date
•  Locking, atomic operations
    •  Creating globally unique keys, e.g. usernames
    •  Transactions, e.g. billing
Decoupling at Spotify
Finding a service via DNS
Each service has an SRV DNS record:
•  One record with same name for each service instance
•  Clients (AP) resolve to find servers providing that service
•  Among the records with the lowest priority value, one is chosen by weighted shuffle
•  Clients retry other instances in case of failures

Example SRV record
_frobnicator._http.example.com.  3600  SRV   10    50      8081  frob1.example.com.
name                             TTL   type  prio  weight  port  host
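
The selection rule above can be sketched in a few lines of Python. This is only an illustration, not Spotify's client code: the record values are typed in by hand instead of being resolved from DNS, `SrvRecord`, `weighted_shuffle` and `connection_order` are hypothetical names, and the hosts `frob2` and `frob3` are invented for the example. Trying higher priority values only after the preferred group is exhausted is an assumption borrowed from the usual SRV convention.

```python
import random
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class SrvRecord:
    priority: int
    weight: int
    port: int
    host: str


def weighted_shuffle(records, rng):
    """Order records randomly, favouring higher weights (hypothetical helper)."""
    pool = list(records)
    ordered = []
    while pool:
        weights = [r.weight for r in pool]
        if sum(weights) == 0:
            pick = rng.choice(pool)  # all weights are zero: pick uniformly
        else:
            pick = rng.choices(pool, weights=weights, k=1)[0]
        pool.remove(pick)
        ordered.append(pick)
    return ordered


def connection_order(records, rng=None):
    """Return instances in the order a client would try them.

    Lowest priority value first; within one priority, weighted shuffle.
    Later entries are the retry candidates if earlier ones fail.
    """
    rng = rng or random.Random()
    by_priority = sorted(records, key=lambda r: r.priority)
    ordered = []
    for _, group in groupby(by_priority, key=lambda r: r.priority):
        ordered.extend(weighted_shuffle(group, rng))
    return ordered


records = [
    SrvRecord(10, 50, 8081, "frob1.example.com."),
    SrvRecord(10, 50, 8081, "frob2.example.com."),  # hypothetical second instance
    SrvRecord(20, 10, 8081, "frob3.example.com."),  # hypothetical fallback instance
]
for candidate in connection_order(records):
    print(candidate.host, candidate.port)
```

A client would connect to the first entry and move down the list on failure, which matches the retry bullet above.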
Request assignment
•    Hardware load balancers
•    Round-robin DNS
•    Proxy servers
•    Sharding:
      •  Each server/instance responsible for subset of data
      •  Directs client to instance that has its data
      •  Easy if nothing is shared
      •  Hard if you require replication
Sharding using a DHT
Some Spotify services use Dynamo-inspired DHTs[1]:
•  Each request has a key
•  Each service node is responsible for a range of hash keys
•  Data is distributed among service nodes
•  Redundancy is ensured by re-hashing and writing to a replica node
•  Data must be transitioned when ring changes




         [1] http://dl.acm.org/citation.cfm?id=1294281
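
A minimal sketch of the ring idea, under stated assumptions: MD5 is used as the hash only because it yields 128-bit keys, each node owns the range ending at its "last key", and replicas are found by hashing the hash again until a different node comes up. The `Ring` class, `key_hash`, `MAX_REHASHES` and the node names are hypothetical; the slides do not say which hash function or replica placement Spotify actually uses.

```python
import bisect
import hashlib

MAX_REHASHES = 64  # arbitrary bound chosen for this sketch


def key_hash(data: bytes) -> int:
    """Map a request key to a 128-bit position on the ring (MD5 is an assumption)."""
    return int.from_bytes(hashlib.md5(data).digest(), "big")


class Ring:
    def __init__(self, last_keys):
        # last_keys maps node name -> last hash key of the segment it owns.
        entries = sorted((token, node) for node, token in last_keys.items())
        self._tokens = [token for token, _ in entries]
        self._nodes = [node for _, node in entries]

    def owner(self, h: int) -> str:
        """Node whose segment contains h; wraps past the highest token."""
        i = bisect.bisect_left(self._tokens, h) % len(self._tokens)
        return self._nodes[i]

    def nodes_for(self, key: bytes, replicas: int):
        """Primary node first, then up to `replicas` distinct replica nodes."""
        h = key_hash(key)
        chosen = [self.owner(h)]
        wanted = min(replicas + 1, len(self._nodes))
        for _ in range(MAX_REHASHES):
            if len(chosen) == wanted:
                break
            h = key_hash(h.to_bytes(16, "big"))  # re-hash to find a replica
            node = self.owner(h)
            if node not in chosen:
                chosen.append(node)
        # If re-hashing kept landing on the same nodes, fill up in ring order.
        for node in self._nodes:
            if len(chosen) == wanted:
                break
            if node not in chosen:
                chosen.append(node)
        return chosen


ring = Ring({"frob1": 2**126, "frob2": 2**127, "frob3": 2**128 - 1})
print(ring.nodes_for(b"playlist:42", replicas=2))
```

Adding or removing a node changes which segment some keys fall into, which is why data has to be moved when the ring changes.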
DHT example
Spotify’s DNS-powered DHT
Configuration of DHT
config._frobnicator._http.example.com.  3600  TXT   "slaves=0"
config.srv_name.                        TTL   type  no replication

config._frobnicator._http.example.com.  3600  TXT   "slaves=2 redundancy=host"
config.srv_name.                        TTL   type  three replicas on separate hosts

Ring segment, one per node
tokens.8081.frob1.example.com.  3600  TXT   "00112233445566778899aabbccddeeff"
tokens.port.host.               TTL   type  last key
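
To show how a client might turn such records into a lookup table, here is a small sketch. It assumes the TXT payloads have already been fetched and are passed in as plain strings (no DNS library is involved), that the record name follows the `tokens.port.host.` layout shown above, and that a node owns every hash up to and including its "last key". The function names and the `frob2` record are hypothetical.

```python
import bisect


def parse_config(txt):
    """Turn 'slaves=2 redundancy=host' into {'slaves': '2', 'redundancy': 'host'}."""
    return dict(item.split("=", 1) for item in txt.split())


def parse_token_record(name, txt):
    """Split 'tokens.<port>.<host>.' and its hex payload into (last_key, (host, port))."""
    label, port, host = name.rstrip(".").split(".", 2)
    if label != "tokens":
        raise ValueError(f"not a tokens record: {name}")
    return int(txt, 16), (host, int(port))


def build_ring(token_records):
    """Sort all ring segments by their last key."""
    return sorted(parse_token_record(n, t) for n, t in token_records.items())


def lookup(ring, h):
    """Return the (host, port) whose segment contains hash h, wrapping at the end."""
    tokens = [token for token, _ in ring]
    return ring[bisect.bisect_left(tokens, h) % len(ring)][1]


config = parse_config("slaves=2 redundancy=host")
ring = build_ring({
    "tokens.8081.frob1.example.com.": "00112233445566778899aabbccddeeff",
    "tokens.8081.frob2.example.com.": "ffffffffffffffffffffffffffffffff",  # hypothetical
})
print(int(config["slaves"]) + 1, "copies of each item")
print(lookup(ring, 0x42))
```

Keeping the ring in DNS means the same resolver path that finds service instances also tells clients which instance owns a given key.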
And if none of this works for you
Remember
/dev/null is
web scale!!




          http://www.xtranormal.com/watch/6995033/
Questions?
                     get in touch!
                    @ricardovice
             ricardo@spotify.com
Thank you.

                    @ricardovice
             ricardo@spotify.com
