Distributed "Web Scale" Systems
Slides from my keynotes at ReaktorDevDay 2011 & Codebits 2011

Presentation Transcript

  • Distributed “Web Scale” Systems Ricardo Vice Santos @ricardovice
  • Who am I?
      • I’m Ricardo!
      • Lead Engineer at Spotify
      • ricardovice on twitter, spotify, about.me, kiva, slideshare, github, bitbucket, delicious…
      • Portuguese
      • Previously working in the video streaming industry
      • (only) Discovered Spotify late 2009
      • Joined in 2010
  • spotifiera (spo·ti·fie·ra), verb: to use Spotify; to provide a service free of cost
  • What’s Spotify all about?
      • A big catalogue, tons of music
      • Available everywhere
      • Great user experience
      • More convenient than piracy
      • Reliable, high availability
      • Scalable for many, many users
    But what really got me hooked:
      • Free, legal ad-supported service
      • Very fast
  • The importance of being fast
      • High latency can be a problem, not only in First Person Shooters
      • Slow performance is a major user experience killer
      • At Velocity 2009, Eric Schurman (Bing) and Jake Brutlag (Google Search) showed that increased latency directly hurt usage and revenue per user [1]
      • Latency leads to users leaving, and many won’t ever come back
      • Users will share their experience with friends
    [1] http://radar.oreilly.com/2009/07/velocity-making-your-site-fast.html
  • So how fast is Spotify?
      • We monitor playback latency on the client side
      • Current median latency to play any track is 265ms
      • On average, the human notion of “instant” is anything under 200ms
      • Due to disk lookup, at times it’s actually faster to start playing a track from network than from disk
      • Fewer than 1% of playbacks experience stutter
  • “Spotify is fast due to P2P”
      • This is something I read a lot around the web
      • P2P does play a crucial role in the picture, but…
      • Experience at Spotify showed me that most latency issues are directly linked to backend problems
      • It’s a mistake to think that we could be this fast without a smart and scalable backend architecture
    So let’s give credit where credit is due.
  • Going web scale!!1
    “Scaling Twitter”, Blaine Cook, 2007
    http://www.slideshare.net/Blaine/scaling-twitter
  • Handling growth
    Things to keep in mind:
      • Scaling is not an exact science
      • There is no such thing as a magic formula
      • Usage patterns differ
      • There is always a limit to what you can handle
      • Fail gracefully
      • Continuous evolution process
  • Scaling horizontally
      • You can always add more machines!
      • Stateless services
      • Several processes can share memcached
      • Possible to run in “the cloud” (EC2, Rackspace)
      • Need some kind of load balancer
      • Data sharing/synchronization can be hard
      • Complexity: many pieces, maybe hidden SPOFs
      • Fundamental to the application’s design
  • Usage patterns
    Typically, some services are more demanding than others. This can be due to:
      • Higher popularity
      • Higher complexity
      • Low latency expectation
      • All combined
  • Decoupling
      • Divide and conquer!
      • The Unix way
      • Resources assigned individually
      • Using the right tools to address each problem
      • Organization and delegation
      • Problems are isolated
      • Easier to handle growth
  • Read-only services
      • The easiest to scale
      • Stateless
      • Use indices, large read-optimized data containers
      • Each node has its local copy
      • Data structured according to service
      • Updated periodically, during off-peak hours
      • Take advantage of OS page cache
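As a rough illustration of a local, read-optimized index that leans on the OS page cache, the sketch below memory-maps a sorted file of fixed-width records and binary-searches it. The record layout (a track id and an offset), the file name, and all function names are hypothetical and not taken from the talk.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical record layout: two unsigned 64-bit integers (track_id, offset).
RECORD = struct.Struct(">QQ")


def build_index(path, entries):
    """Write entries sorted by track_id, e.g. during an off-peak rebuild."""
    with open(path, "wb") as f:
        for track_id, offset in sorted(entries):
            f.write(RECORD.pack(track_id, offset))


def open_index(path):
    """Map the index read-only; pages are served from the OS page cache."""
    with open(path, "rb") as f:
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


def lookup(index, track_id):
    """Binary search over the fixed-width records; None if absent."""
    lo, hi = 0, len(index) // RECORD.size
    while lo < hi:
        mid = (lo + hi) // 2
        key, value = RECORD.unpack_from(index, mid * RECORD.size)
        if key == track_id:
            return value
        if key < track_id:
            lo = mid + 1
        else:
            hi = mid
    return None


def demo():
    """Build a tiny index in a temp directory and query it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "tracks.idx")
        build_index(path, [(42, 1000), (7, 0), (99, 2500)])
        index = open_index(path)
        try:
            return lookup(index, 42), lookup(index, 8)
        finally:
            index.close()
```

Because the file is never written by the serving process, every instance can hold its own copy and no coordination between nodes is needed.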
  • Read-write services
      • User generated content, e.g. playlists
      • Hard to ensure consistency of data across instances
    Solutions:
      • Eventual consistency:
          • Reads of just written data not guaranteed to be up-to-date
      • Locking, atomic operations:
          • Creating globally unique keys, e.g. usernames
          • Transactions, e.g. billing
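To make the "atomic operations" case concrete, here is a minimal in-process sketch of claiming a globally unique username. The class and method names are hypothetical; in a real distributed deployment the lock would be replaced by a single authoritative store or a conditional write, which the talk does not describe.

```python
import threading


class UsernameRegistry:
    """Hypothetical stand-in for an authoritative username store."""

    def __init__(self):
        self._lock = threading.Lock()
        self._owners = {}

    def claim(self, username, user_id):
        """Atomically claim a username; return False if already taken."""
        # Check and write happen under one lock, so two concurrent
        # claims for the same name cannot both succeed.
        with self._lock:
            if username in self._owners:
                return False
            self._owners[username] = user_id
            return True
```

The point of the sketch is that the existence check and the write form one indivisible step; splitting them across two requests would reintroduce the race.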
  • Decoupling at Spotify
  • Finding a service via DNS
    Each service has an SRV DNS record:
      • One record with the same name for each service instance
      • Clients (AP) resolve the name to find servers providing that service
      • Records with the lowest priority value are preferred; among those, an instance is chosen by weighted shuffle
      • Clients retry other instances in case of failures
    Example SRV record:
      _frobnicator._http.example.com. 3600 SRV  10   50     8081 frob1.example.com.
      name                            TTL  type prio weight port host
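The selection rule on this slide can be sketched as follows. This is an illustration under assumptions, not Spotify's client code: the `SrvRecord` type and `order_candidates` function are hypothetical, the records are assumed to be already resolved, and zero weights are handled with a simple +1 rather than the exact procedure in the SRV specification.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SrvRecord:
    priority: int
    weight: int
    port: int
    host: str


def order_candidates(records, rng=random):
    """Order records for connection attempts.

    Lower priority values come first; records sharing a priority are
    arranged by weighted shuffle. The first element is the instance to
    try, and the rest are fallbacks to retry on failure.
    """
    ordered = []
    for priority in sorted({r.priority for r in records}):
        pool = [r for r in records if r.priority == priority]
        while pool:
            # Pick with probability proportional to weight. The +1 keeps
            # zero-weight records selectable (a simplification).
            weights = [r.weight + 1 for r in pool]
            choice = rng.choices(pool, weights=weights, k=1)[0]
            pool.remove(choice)
            ordered.append(choice)
    return ordered
```

A caller would walk the returned list in order, moving to the next entry whenever a connection attempt fails.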
  • Request assignment
      • Hardware load balancers
      • Round-robin DNS
      • Proxy servers
      • Sharding:
          • Each server/instance responsible for subset of data
          • Directs client to instance that has its data
          • Easy if nothing is shared
          • Hard if you require replication
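The simplest form of sharding, where nothing is shared and each key maps to exactly one instance, might look like the sketch below. The function name, key format, and choice of MD5 are assumptions made for illustration only.

```python
import hashlib


def shard_for(key, shards):
    """Map a key to one shard with a stable hash (hypothetical helper)."""
    if not shards:
        raise ValueError("at least one shard is required")
    # A stable digest is used instead of the built-in hash(), which is
    # randomized per process for strings.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]
```

With this modulo scheme, changing the number of shards remaps most keys, which is one reason to prefer a hash ring when instances come and go.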
  • Sharding using a DHT
    Some Spotify services use Dynamo-inspired DHTs [1]:
      • Each request has a key
      • Each service node is responsible for a range of hash keys
      • Data is distributed among service nodes
      • Redundancy is ensured by re-hashing and writing to replica node
      • Data must be transitioned when ring changes
    [1] http://dl.acm.org/citation.cfm?id=1294281
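A minimal sketch of such a ring follows, assuming each node is identified by the last hash value it owns and that replicas are found by hashing the key's hash again until a different node turns up. The `Ring` class, the MD5 hash, and the retry cap are illustrative choices, not details confirmed by the talk.

```python
import bisect
import hashlib


def key_hash(data: bytes) -> int:
    """128-bit hash of the input (MD5 chosen only for illustration)."""
    return int.from_bytes(hashlib.md5(data).digest(), "big")


class Ring:
    def __init__(self, tokens):
        # tokens: mapping of node name -> last hash value that node owns.
        self._entries = sorted((token, node) for node, token in tokens.items())
        self._tokens = [token for token, _ in self._entries]

    def owner(self, h: int) -> str:
        """Node whose segment contains h, wrapping past the last token."""
        i = bisect.bisect_left(self._tokens, h)
        return self._entries[i % len(self._entries)][1]

    def nodes_for(self, key: str, replicas: int = 0):
        """Primary node first, then up to `replicas` distinct replica nodes."""
        h = key_hash(key.encode("utf-8"))
        nodes = [self.owner(h)]
        wanted = min(replicas + 1, len(self._entries))
        attempts = 0
        # Re-hash until enough distinct nodes are found; the cap guards
        # against rings where some node owns a vanishingly small range.
        while len(nodes) < wanted and attempts < 1000:
            h = key_hash(h.to_bytes(16, "big"))
            node = self.owner(h)
            if node not in nodes:
                nodes.append(node)
            attempts += 1
        return nodes
```

A write would go to every node returned by `nodes_for`, while a read can start with the first. Moving data when the ring changes is not covered by this sketch.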
  • DHT example
  • Spotify’s DNS powered DHT
    Configuration of DHT:
      config._frobnicator._http.example.com. 3600 TXT  “slaves=0”
      config.srv_name.                       TTL  type no replication

      config._frobnicator._http.example.com. 3600 TXT  “slaves=2 redundancy=host”
      config.srv_name.                       TTL  type three replicas on separate hosts
    Ring segment, one per node:
      tokens.8081.frob1.example.com. 3600 TXT  “00112233445566778899aabbccddeeff”
      tokens.port.host.              TTL  type last key
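Assuming the TXT payloads have already been fetched by some resolver, the records above could be decoded roughly like this. Both function names are hypothetical, and the reading of `slaves` as "replicas in addition to the primary" is inferred from the slide's "slaves=2" meaning three replicas.

```python
def parse_config(txt: str) -> dict:
    """Parse a config TXT payload such as 'slaves=2 redundancy=host'."""
    config = {}
    for field in txt.split():
        name, _, value = field.partition("=")
        config[name] = value
    # Assumption: a missing 'slaves' field means no replication.
    config["slaves"] = int(config.get("slaves", "0"))
    return config


def parse_token_record(name: str, txt: str):
    """Decode 'tokens.<port>.<host>' and its hex token payload.

    Returns (host, port, last_key), where last_key is the last hash
    value of the ring segment owned by that instance.
    """
    label, port, host = name.split(".", 2)
    if label != "tokens":
        raise ValueError("not a tokens record: " + name)
    return host, int(port), int(txt, 16)
```

The decoded `(host, port, last_key)` tuples are enough to build a sorted ring and route a key to the node whose last key is the first one at or above the key's hash.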
  • And if none of this works for you
    Remember: /dev/null is web scale!
    http://www.xtranormal.com/watch/6995033/
  • Questions? get in touch! @ricardovice ricardo@spotify.com
  • Thank you. @ricardovice ricardo@spotify.com