High Scalability by Example – How can Web-Architecture scale like Facebook, Twitter, YouTube, etc.
Scalability means coping with high volumes of traffic, data, users, I/O, parallel processing and concurrency, but how does this work on the well-known Web 2.0 platforms? Do they scale horizontally or vertically, in the client layer, the service layer or the backend layer? What role do caching, NoSQL, clustering and MapReduce play in scalability? How does scalability affect the trade-off between consistency, availability and partition tolerance? The talk compares different concepts of scalability and uses examples to show how a scalable architecture can be achieved with pragmatic means.


  • High Scalability by Example – How can Web Architecture scale like Facebook, Twitter?
    JAX 2011, 03.05.2011
    Robert Mederer, Markus Bukowski
    Copyright © 2011 Accenture All Rights Reserved. Accenture, its logo, and High Performance Delivered are trademarks of Accenture.
  • Speaker
    Robert Mederer, Lead Technology Architect (robert.mederer@accenture.com)
      • 2000 - 2005: Technology Architect and Software Engineer in several projects
      • 2006: Technical Architecture Lead, Integration and Execution Architecture for Location-Based Service Provider
      • 2009: Technical Architecture Lead, Frontend and Execution Architecture
      • 2009/2010: Technical Architecture and front-office integration build lead, Integration and Execution Architecture Financial Services Agency
      • 2011: Architect and Performance Engineer for Location Based Services Platform
    Markus Bukowski, Senior Software Engineer (markus.bukowski@accenture.com)
      • 2006 - 2009: Software Engineer in several projects (during studies) for mainly telecommunication companies
      • 2009 - 2011: Senior Software Engineer in several projects
      • 2009/2010: Coach and Software Engineer for a Government Agency Health Insurance Fund
      • 2011: Technical Architecture Lead during the development of a Document Management Solution for a Government Agency
    Offices: Anni-Albers-Straße 11, 80807 München; Kaistraße 20, 40221 Düsseldorf
    Mobile: +49-175-57-64217, +49-175-57-68012
  • Accenture: High performance achieved
    Company Profile
      • Global management consulting, technology services and outsourcing company
      • 215,000 employees
      • Rank 47 among the “Best Global Brands 2008”
      • Top 100 Employer
      • 28 of the DAX-30-Companies
      • 96 of the Fortune-Global-100
      • More than three-quarters of the Fortune-Global-500
      • 87 of our Top 100-clients have been with us for 10 or more years
    Worldwide Revenues: $21.6 billion (in US$ billion, as of August 31, 2010)
      [Chart: revenues by operating group: Communications & High Tech, Resources, Financial Services, Products, Public Service; values shown: 3.88, 4.83, 4.32, 2.98, 5.53]
  • Local Accenture … ???
    Geographic unit
      • Austria
      • Switzerland
      • Germany
    Employees
      • Ca. 6,000
      • We are hiring!
    Exciting Technology work
      • Large scale projects (100+ people / multiple years)
      • Most challenging requirements
          – Stock Exchange / Banking / Trading Systems
          – AEMS Mobility Platform
          – Large Scale Web Applications (> 1M page views / day)
          – Batch Architectures
    [Map: Berlin, Düsseldorf, Frankfurt, Munich, Vienna, Zurich]
  • Agenda
      • Introduction
          – Scalability & CAP Theorem
          – Traditional websites & Social media websites
          – High Scalability in Numbers
      • Accenture – High Scalability by Example
      • Architecture at Internet Scale
      • Common concepts of Scalability
      • Conclusion
  • Scalability
    A system’s capacity to uphold the same performance under heavier volumes.
    (Patterns for Performance and Operability: Building and Testing Enterprise Software, Chris Ford et al., 2008)
  • Vertical Scalability
      • Is achieved by increasing the capacity of a single node
          – CPU
          – Memory
          – Bandwidth, …
      • Simple process
          – The application is generally not affected by those changes
      • Classic examples are supercomputers like
          – HP Integrity Superdome
          – IBM Mainframe
    Source: Hewlett-Packard
  • Horizontal Scalability
      • The application is spread over a cluster with several nodes
      • Nodes can be added to scale out
      • Produces overhead
          – Keeping the cluster consistent
          – Node error detection and handling
          – Communication between nodes
      • May be used to increase reliability and availability
      • Distributed systems and programs like
          – SETI@Home
          – World Wide Web
          – Domain Name Service
    Source: Space Sciences Laboratory, U.C. Berkeley
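The "nodes can be added to scale out" point can be illustrated with a consistent-hash ring. This is only a sketch under assumptions: the slides do not name consistent hashing, and the class `HashRing`, the node names and the replica count are hypothetical choices made for this example. It shows that adding a fourth node relocates only part of the keys, and only onto the new node.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a stable integer position on the ring."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring (hypothetical helper, not from the talk)."""

    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas
        self._positions = []   # sorted ring positions
        self._owner = {}       # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Place each node at several virtual positions to even out the load.
        for i in range(self.replicas):
            pos = _hash(f"{node}#{i}")
            self._owner[pos] = node
            bisect.insort(self._positions, pos)

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first position at or after the key's hash.
        idx = bisect.bisect_left(self._positions, _hash(key))
        if idx == len(self._positions):
            idx = 0  # wrap around the ring
        return self._owner[self._positions[idx]]


keys = [f"user:{n}" for n in range(1000)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-d")  # scale out by one node
after = {k: ring.node_for(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved to a different node")
```

With plain modulo hashing (`hash(key) % node_count`) most keys would change owner when the node count changes, which is part of the "keeping the cluster consistent" overhead the slide mentions.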
  • CAP Theorem (Brewer’s Theorem)
      • Consistency – all clients see the same data at the same time
      • Availability – all clients can find all data even in the presence of failures
      • Partition Tolerance – the system keeps working even when nodes fail or cannot reach each other
    [Diagram: triangle of Consistency, Availability and Partition Tolerance; having all three at once is marked "Impossible"]
    Source: PODC keynote, Towards Robust Distributed Systems, Dr. Eric A. Brewer, 2000
  • CAP Theorem
    At most two of these properties can hold for any shared-data system.
    Consistency + Availability
      • High data integrity
      • Single site, cluster database, LDAP, etc.
      • 2-phase commit, data replication, etc.
    Consistency + Partition Tolerance
      • Distributed database, distributed locking, etc.
      • Pessimistic locking, etc.
    Availability + Partition Tolerance
      • High scalability
      • Distributed cache, DNS, etc.
      • Optimistic locking, expiration/leases (timeout), etc.
    Source: “Architecting Cloudy Applications”, David Chou
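The trade-off between the "Consistency + Partition" and "Availability + Partition" combinations can be made concrete with a toy two-replica register. This is a minimal sketch, not a real replication protocol: `Replica`, `ReplicatedRegister` and the `mode` flag are names invented for this example.

```python
class Replica:
    """One copy of the data."""

    def __init__(self, name):
        self.name = name
        self.value = None


class ReplicatedRegister:
    """Two-replica register; mode is 'CP' or 'AP' (illustrative only)."""

    def __init__(self, mode):
        self.mode = mode
        self.a = Replica("a")
        self.b = Replica("b")
        self.partitioned = False  # True when a and b cannot reach each other

    def write(self, replica, value):
        other = self.b if replica is self.a else self.a
        if not self.partitioned:
            replica.value = value
            other.value = value   # synchronous replication to the peer
            return True
        if self.mode == "CP":
            return False          # refuse the write: consistent, not available
        replica.value = value     # AP: accept locally, replicas now diverge
        return True

    def read(self, replica):
        return replica.value


cp = ReplicatedRegister("CP")
cp.write(cp.a, "v1")
cp.partitioned = True
cp_accepted = cp.write(cp.a, "v2")

ap = ReplicatedRegister("AP")
ap.write(ap.a, "v1")
ap.partitioned = True
ap_accepted = ap.write(ap.a, "v2")

print("CP:", cp_accepted, cp.read(cp.a), cp.read(cp.b))
print("AP:", ap_accepted, ap.read(ap.a), ap.read(ap.b))
```

During the partition the CP register rejects the write and both replicas still agree, while the AP register accepts it and the two replicas return different values until they are reconciled.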
  • Agenda
      • Introduction
          – Scalability & CAP Theorem
          – Traditional websites & Social media websites
          – High Scalability in Numbers
      • Accenture – High Scalability by Example
      • Architecture at Internet Scale
      • Common concepts of Scalability
      • Conclusion
  • Traditional Websites
    [Figure from: Scaling the Social Graph: Infrastructure at Facebook, Jason Sobel, QCon 2011]
  • Social Media Websites
    [Figure from: Scaling the Social Graph: Infrastructure at Facebook, Jason Sobel, QCon 2011]
  • Agenda
      • Introduction
          – Scalability & CAP Theorem
          – Traditional websites & Social media websites
          – High Scalability in Numbers
      • Accenture – High Scalability by Example
      • Architecture at Internet Scale
      • Common concepts of Scalability
      • Conclusion
  • High Scalability in Numbers
    The World Is Obsessed With Facebook (Video)
    Source: http://www.youtube.com/watch?v=xJXOavGwAW8
  • High Scalability in Numbers
    Facebook
      • Active users: 500,000,000
      • Active users logging in daily: 250,000,000
      • Active mobile users: 250,000,000
      • In 20 minutes …
          – Links shared: 1,000,000
          – Status updates: 1,851,000
          – Photos uploaded: 2,716,000
          – Comments made: 10,208,000
    Twitter
      • New accounts per day: 500,000
      • Tweets per day: 140,000,000 (a year ago: 50,000,000)
      • Max tweets per second: 6,939
    Source: http://www.facebook.com/press/info.php?statistics, http://www.allfacebook.com/
  • High Scalability in Numbers
    LinkedIn
      • Members: >100,000,000
      • Connections between members: 1,300,000,000
      • Visitors per month: 128,000,000
      • Share of global internet users who visit LinkedIn daily: 4%
    Source: http://blog.linkedin.com, http://press.linkedin.com/about/, LinkedIn Demographics 2011
  • Agenda
      • Introduction
      • Accenture – High Scalability by Example
          – The Royal Wedding Website
          – US Based Professional Sport League .com Website
      • Architecture at Internet Scale
      • Common concepts of Scalability
      • Conclusion
  • Agenda
      • Introduction
      • Accenture – High Scalability by Example
      • Architecture at Internet Scale
          – Facebook
          – Twitter
          – LinkedIn
          – Challenges
      • Common concepts of Scalability
      • Conclusion
  • Facebook – Key Facts
      • Biggest social network
      • People are linked together
      • People share and exchange information in real time
      • Facebook provides
          – A personalized News Feed containing news about friends
          – A picture service including people tagging / linking
          – A messaging service to stay in touch with friends
          – A platform to play games with friends
  • Facebook – Architecture (1)
      • Facebook’s “main” architecture is based on a LAMP stack
      • The application logic resides in the web server layer (PHP based)
      • The application layer takes care of the data distribution
      • Some services do not fit (C++, Java)
          – Search
          – Ads
          – People You May Know
          – Multifeed
    [Diagram: Web Server, Memcache, MySQL]
    Invalidation logic is implemented in the application!
  • Facebook – Architecture (2)
    Multifeed / Newsfeed
      • Distributed system
      • Every user gets
          – the 45 best rated updates
          – on every reload
      • Every DB update results in a Scribe notification
          – Leaf nodes / servers are notified
    Source: “High Performance at Massive Scale – Lessons learned at Facebook”, Jeff Rothschild, 2009
    Inspired by: “Architecting Cloudy Applications”, David Chou, 2010
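The note on the Architecture (1) slide, that invalidation logic lives in the application, corresponds to the cache-aside pattern. The sketch below is an assumption-laden illustration, not Facebook's code: plain dictionaries stand in for the Memcache and MySQL tiers, and `UserStore` is a hypothetical class.

```python
class UserStore:
    """Cache-aside access; dicts stand in for memcached and MySQL."""

    def __init__(self):
        self.db = {}       # stand-in for the MySQL tier
        self.cache = {}    # stand-in for the Memcache tier
        self.db_reads = 0  # counts how often a read had to hit the database

    def get(self, user_id):
        if user_id in self.cache:
            return self.cache[user_id]   # cache hit
        self.db_reads += 1               # cache miss: go to the database
        value = self.db.get(user_id)
        if value is not None:
            self.cache[user_id] = value  # populate the cache for later reads
        return value

    def update(self, user_id, value):
        self.db[user_id] = value
        # Invalidation is the application's job: drop the stale entry.
        self.cache.pop(user_id, None)


store = UserStore()
store.update(42, "Alice")
first = store.get(42)    # miss, loaded from the database
second = store.get(42)   # hit, served from the cache
store.update(42, "Alicia")
latest = store.get(42)   # miss again, because the update invalidated the entry
print(first, second, latest, store.db_reads)
```

Deleting the cached entry on write, rather than overwriting it, keeps the write path simple; the next read repopulates the cache from the database.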
  • Agenda
      • Introduction
      • Accenture – High Scalability by Example
      • Architecture at Internet Scale
          – Facebook
          – Twitter
          – LinkedIn
          – Challenges
      • Common concepts of Scalability
      • Conclusion
  • Twitter – Key Facts
      • Twitter is a real-time information network
      • Twitter’s key characteristics
          – Tweets are used for sharing information
          – A timeline of tweets must be kept
          – A social graph is maintained to deliver information
  • Twitter – Architecture
    [Diagram, layers as far as recoverable from the transcript]
      • Web Frontend: Load Balancers, Apache mod_proxy
      • Apps & Services: Rails (Unicorn), Flock
      • Distributed Cache: memcached
      • Queues: Kestrel
      • Daemons
      • Partitioned Data: MySQL, Cassandra
      • Distributed Storage
    Source: “Architecting Cloudy Applications”, David Chou
  • Agenda
      • Introduction
      • Accenture – High Scalability by Example
      • Architecture at Internet Scale
          – Facebook
          – Twitter
          – LinkedIn
          – Challenges
      • Common concepts of Scalability
      • Conclusion
  • LinkedIn – Key Facts
      • A social media networking platform to find connections to recommended job candidates, industry experts and business partners
      • 99% pure Java
    LinkedIn – Applications
      • Groups
      • Job market
      • Mailing
      • Profiles (Friends and Companies)
      • Mobile Applications (Android, iPhone, Blackberry, Palm)
  • LinkedIn – Architecture
    [Diagram, layers and call-outs as far as recoverable from the transcript]
      • Web Frontend: LinkedIn Spring, Spring MVC, Grails, DWR
      • Apps & Services: Services
      • Search: Lucene, Zoie, Bobo, Sensei
      • Caching: Voldemort, ehcache, Memcache
      • Queuing / Messaging: ActiveMQ
      • Storage: MySQL, Sensei, Hadoop, Voldemort
    Component notes
      • Spring MVC
          – Spring Framework (IoC, AOP, DI, Separation of Concerns, ...)
          – Model-View-Controller
      • LinkedIn Spring
          – LinkedIn extensions
          – XML vocabulary (lin:bean, lin:component, …)
          – Property editors
          – New kind of application context for life-cycle management
      • DWR
          – Direct Web Remoting (AJAX for Java)
          – RPC library to call Java functions from JavaScript and to call JavaScript functions from Java
      • Grails
          – Open-source web application framework
          – Aims to bring the "coding by convention" paradigm to Groovy
          – Why Grails? A more productive web framework was needed
          – Rapid prototyping to demonstrate feasibility
          – Advantage: major productivity gain
          – Integration with existing Java/Spring-based solutions
      • Lucene
          – Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java
      • Zoie
          – Realtime indexing/search system
      • Bobo
          – Used as a faceted search engine
      • Sensei
          – Distributed database that handles semi-structured and structured data
      • Voldemort
          – Open source distributed key-value storage system
          – Used by LinkedIn for the social graph
          – Combines memory caching with the storage system so that a separate caching tier is not required
          – Horizontally scaled, with relaxed consistency guarantees
      • ehcache
          – Ehcache is an open source, standards-based cache used to boost performance, offload the database and simplify scalability
      • Memcache
          – Distributed cache
      • Hadoop
          – Software framework that supports data-intensive distributed applications (map/reduce, distributed file system)
          – Used for index building
      • ActiveMQ
          – Asynchronous communication for asynchronously triggered writes
          – Push operations with high durability
      • MySQL
          – Bottleneck: read data
    Principles
      • Keep core data ACID
      • Keep replicated and cached data BASE
      • Cache data in cheap memory (memcached)
      • Use a hint to route clients to their ACID data
    Source: Project Voldemort, Jay Kreps
    Source: http://sna-projects.com
  • Agenda
      • Introduction
      • Accenture – High Scalability by Example
      • Architecture at Internet Scale
      • Common concepts of Scalability
      • Conclusion
  • Common concepts of Scalability
    7 Habits of Good Distributed Systems
      • parallelization
      • asynchronous
      • idempotent operations
      • partitioned data
      • fault-tolerance
      • optimistic concurrency
      • shared nothing
    Source: “Architecting Cloudy Applications”, David Chou
    Source: highscalability.com
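Of the seven habits, "idempotent operations" is perhaps the easiest to show in a few lines. The sketch below assumes that every request carries a caller-supplied request id, so a message delivered twice (for example after a retry) takes effect only once. `Account` and `request_id` are hypothetical names used for illustration.

```python
class Account:
    """Applies each deposit at most once, keyed by a request id."""

    def __init__(self):
        self.balance = 0
        self._seen = set()  # request ids that have already been applied

    def deposit(self, request_id, amount):
        if request_id in self._seen:
            return self.balance      # duplicate delivery: no further effect
        self._seen.add(request_id)
        self.balance += amount
        return self.balance


account = Account()
account.deposit("req-1", 50)
account.deposit("req-1", 50)   # retry of the same request
account.deposit("req-2", 25)
print(account.balance)
```

Because a repeated request is harmless, senders can retry freely on timeouts, which in turn makes asynchronous and fault-tolerant designs much simpler.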
  • Common concepts of Scalability
    Hybrid architectures
      • Scale-out (horizontal)
          – BASE
          – availability first
          – optimistic locking
          – shared nothing
          – maximize scalability
      • Scale-up (vertical)
          – ACID
          – focus on “commit”
          – pessimistic locking
          – transactional
          – favor accuracy/consistency
    Most distributed systems realize both.
  • Agenda
      • Introduction
      • Accenture – High Scalability by Example
      • Architecture at Internet Scale
      • Common concepts of Scalability
      • Conclusion
          – Questionnaire
          – Déjà vu - Common concepts
          – Don’t forget the Infrastructure
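The optimistic locking listed under scale-out can be sketched as a version check on write: nobody holds a lock while reading, and a writer only succeeds if the record still has the version it read. `VersionedStore` and `VersionConflict` are hypothetical names for this example; real stores expose the idea differently (for instance as compare-and-set operations or version columns).

```python
class VersionConflict(Exception):
    """Raised when a write is based on an outdated version."""


class VersionedStore:
    """In-memory store with a version number per key (illustrative only)."""

    def __init__(self):
        self._data = {}  # key -> (version, value)

    def read(self, key):
        # Unknown keys are reported as version 0 with no value.
        return self._data.get(key, (0, None))

    def write(self, key, expected_version, value):
        current_version, _ = self.read(key)
        if current_version != expected_version:
            raise VersionConflict(
                f"{key}: expected v{expected_version}, found v{current_version}"
            )
        self._data[key] = (current_version + 1, value)
        return current_version + 1


store = VersionedStore()
store.write("profile:1", 0, {"name": "Ann"})     # creates version 1

v_a, _ = store.read("profile:1")   # client A reads version 1
v_b, _ = store.read("profile:1")   # client B reads version 1

store.write("profile:1", v_a, {"name": "Anna"})  # A wins, version 2
try:
    store.write("profile:1", v_b, {"name": "Anne"})
    conflict = False
except VersionConflict:
    conflict = True                # B must re-read and retry

print(conflict, store.read("profile:1"))
```

A pessimistic variant would instead lock the record before reading, which avoids the retry but blocks other writers for the duration of the lock.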
  • Conclusion
    Questionnaire
      • Is there a need to scale my application?
          – Vertical scaling is easier to achieve
          – Use horizontal scaling only when required
      • Is there a plan to prove your designed solution?
          – Plan to do a lot of realistic proofs of concept
      • Is there a one-size-fits-all solution?
          – NO!
      • How important is ACID?
          – Is BASE enough?
          – Can a NoSQL solution be used?
  • Conclusion
    Déjà vu - Common concepts
      • Asynchronous Processing
          – Keep Facebook, Twitter and LinkedIn in mind
          – Asynchronous writes and even UI
      • Data Partitioning
          – You must understand your data
          – Is a hybrid solution as used by LinkedIn applicable?
      • Design for Failure
          – Lessons learned from the outage of Amazon’s cloud solution
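The "asynchronous writes" point can be sketched with a queue between the request handler and a background writer, using Python's standard `queue` and `threading` modules. The names `handle_request`, `worker` and the list `stored` (a stand-in for a persistent store) are assumptions for this example; the platforms discussed in the talk use dedicated message brokers such as Kestrel or ActiveMQ for this role.

```python
import queue
import threading

write_queue = queue.Queue()
stored = []  # stand-in for the persistent store


def worker():
    """Drain the queue and perform the slow writes off the request path."""
    while True:
        item = write_queue.get()
        if item is None:           # sentinel: shut down the worker
            write_queue.task_done()
            break
        stored.append(item)        # the actual (slow) write would happen here
        write_queue.task_done()


def handle_request(message):
    """Enqueue the write and answer immediately."""
    write_queue.put(message)
    return "accepted"


thread = threading.Thread(target=worker, daemon=True)
thread.start()

responses = [handle_request(f"status update {i}") for i in range(3)]

write_queue.put(None)   # ask the worker to stop once the queue is drained
write_queue.join()      # wait until every queued item has been processed
thread.join()

print(responses, stored)
```

The caller gets its answer before the write has happened, so the user interface must tolerate data that becomes visible slightly later, which is the BASE side of the trade-off.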
  • Conclusion
    Don’t forget the Infrastructure
      • Disk IO compared to RAM is slow
          – Avoid “slow” IO access whenever possible
          – Caching, Caching, Caching
          – Bulk updates (batch) over small updates
      • Network latency
          – Keep in mind that there is a network
          – Where are the users located?
          – How are your users connected? Mobile application?
          – Is there a need for a datacenter distribution?
      • Hardware configuration
          – Is the CPU power saving working right?
  • Questions?
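The advice to prefer bulk updates over many small ones can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch: the table `events` and its columns are made up for the example, and an in-memory database is used so the snippet is self-contained. The same idea applies to any store where each round trip or commit has a fixed cost.

```python
import sqlite3

rows = [(i, f"event-{i}") for i in range(1000)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")

# One transaction and one executemany call instead of 1000 separate commits.
with conn:
    conn.executemany("INSERT INTO events (id, payload) VALUES (?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(count)
```

On a disk-backed database the difference matters more, because every separate commit normally forces a synchronous write to disk.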