SlideShare a Scribd company logo
1 of 61
Download to read offline
Scalable
Internet
Architectures


         / Operating at Scale



                                1
Who am I? @postwait on twitter


       Author of “Scalable Internet Architectures”
       Pearson, ISBN: 067232699X

       CEO of OmniTI
       We build scalable and secure web applications

       I am an Engineer
       A practitioner of academic computing.
       IEEE member and Senior ACM member.
       On the Editorial Board of ACM’s Queue magazine.

       I work on/with a lot of Open Source software:
       Apache, perl, Linux, Solaris, PostgreSQL,
       Varnish, Spread, Reconnoiter, etc.

       I have experience.
       I’ve had the unique opportunity to watch a great many catastrophes.
       I enjoy immersing myself in the pathology of architecture failures.




                                                                             2
Topic Progression



       What is an architecture?

          What does it mean to run a (scalable) architecture?

       Scalability Patterns for

          Dynamic Content

          Databases

          Complex Systems

       Scaling Techniques

       Bad Ideas




                                                                3
Architecture


         / the whole enchilada



                                 4
Architecture / what it is




       architecture (n.):
       the complex or carefully designed structure of something.

       specifically in computing:
       the conceptual structure and logical organization of a computer or a
       computer-based system.




                                               - Oxford American Dictionary




                                                                              5
Architecture / more than meets the eye



       An architecture is all encompassing.

          space, power, cooling

          servers, switches, routers

          load balancers, firewalls

          databases, non-database storage

          dynamic applications

          the architecture you export to the user (javascript, etc.)




                                                                       6
Architecture / awareness is key



         Not all people do all things.

         However...

            lack of awareness of the other disciplines is bad

            leads to isolated decisions

            which leads to unreasonable requirements elsewhere

            which lead to over engineered products

            stupid decisions

            catastrophic failures




                                                                 7
Architecture / running it all




       Running Operations is serious stuff

       It takes knowledge, tools...

       but that is not enough.

       It takes experience.

       And perhaps even more importantly...

       It takes discipline.




                                              8
Architecture / knowledge



       Read.

       Study.

       Leverage User Groups (SAGE,LUGs,OSUGs,PUGs,etc.)

       Participate in the community.

       Go to conferences:

          Velocity (now, you’re here - good job!)

          Structure (now, you’re not here - sorry)

          Surge (Baltimore, Sept 30 - Oct 1)




                                                          9
Architecture / tools




       Collaborate with colleagues.

       Try new tools.

       Write new tools.

       Know and practice your tools during the “good times”
       in order to make their use effortless during the “bad times”




                                                                      10
Architecture / tool theories



      “One only needs two tools in life:
       WD-40 to make things go,
       and duct tape to make them stop.”
                                                        - George Weilacher



      “Men have become the tools of their tools.”
                                                    - Henry David Thoreau



      “All the tools and engines on earth
       are only extensions of man's limbs and senses.”
                                                      - Ralph Waldo Emerson




                                                                              11
Architecture / my take on tools




       Tools are just tools.

       They are absolutely essential to doing your job.

       They will never do your job for you.

       Tools will never replace experience and discipline.

       But tools can help you maintain discipline.




                                                             12
Architecture / experience




      “Good judgment comes from experience.
       Experience comes from bad judgment.”
                                                    - Proverb



      “Judge people on the poise and integrity
       with which they remediate their failures.”
                                                         - me




                                                                13
Architecture / discipline


       Discipline is important in any job.

       Discipline is

       “controlled behavior resulting from training, study and practice.”

       In my experience discipline is the most frequently missing ingredient
       in the field of web operations.

       I believe this to be caused by a lack of focus, laziness, and the view
       that it is a job instead of an art.

       As in any trade

          To be truly excellent one must treat it as a craft.
          One must become a craftsman.
          Through experience learn discipline.
          And through practice achieve excellence.




                                                                                14
Architecture / actually running it all




       Okay, I get it.

       From day to day, what do I need to know?




                                                  15
Architecture / version control


       Switch configurations should be in version control.

       Router configurations should be in version control.

       Firewall configurations should be in version control.

       System configurations should be in version control.

       Application configurations should be in version control.

       Monitoring configurations should be in version control.

       Documentation should be in version control.

       Application code should be in version control.

       Database schema should be in version control.

       Everything you do should be in version control.




                                                                  16
Architecture / version control



      And no... it doesn’t matter which tool.

      It’s not about the tool, it’s about the discipline to always use it.




                               (today, we use subversion)




                                                                             17
Architecture / know your systems


       To know when something looks unhealthy,
       one must know what healthy looks like.

       Monitor everything.

       Collect as much system and process information as possible.

       Look at your systems and use your diagnostic tools
       when things are healthy.




                                                                     18
Architecture / management


         Package roll out?

         Machine management?

         Provisioning?



      They tell me I should use Puppet.

      They tell me I should use Chef.

      well... I stick to my theory on tools:

         A master craftsman chooses or builds the tools he likes.

         A tool does not the master craftsman make.




                                                                    19
Dynamic Content


        / keeping users interested



                                     20
Techniques / Dynamic Content




     “We should forget about small efficiencies,
      say about 97% of the time:
      premature optimization is the root of all evil.”
                                                           - Donald Knuth



     “Knowing when optimization is premature defines the difference
      between the master engineer and the apprentice.”
                                                                      - me




                                                                             21
Techniques / optimization



       Optimization comes down to a simple concept:
       “don’t do work you don’t have to.”

       It can take the form of:

          computational reuse

          caching in a more general sense



          and my personal favorite:

             ... avoid the problem, and do no work at all.




                                                             22
Techniques / optimization applied




     Optimization in dynamic content simply means:

       Don’t pay to generate the same content twice

       Only generate content when things change

       Break the system into components so that you can isolate the costs of
       things that change rapidly from those that change infrequently.

     There is a simple truth:

       your content isn’t as dynamic as you think it is




                                                                               23
Techniques / optimization applied


     Javascript, CSS and images are only referentially linked

     They should all be consolidated and optimized.

     They should be publicly cacheable and expire 10 years from now.

     RewriteRule (.*).([0-9]+).css $1.css

          Means that /s/app.23412.css is just /s/app.css

          different URL means new cached copy

          any time the CSS is changed, just bump the number the application
          references from HTML.

          Same applies for Javascript.

     Images... you should just deploy a new one at a new URI.




                                                                              24
Techniques / per user info



      If you could have a distributed database that:

           when a node fails, you can guarantee no one needs the info on it

           it is always located near the user accessing it

           it can easily grow as your user base grows

      Introducing CookieDB:

           it’s been here all along

           it’s up in your browser

           use it




                                                                              25
Techniques / data caching




     Asking hard questions of database can be “expensive”

     You have two options:

          cache the results

             best when you can’t afford to be accurate

          materialize a view on the changes

             best when you need to be accurate




                                                            26
Techniques / choosing technologies



     Understand how you will be writing data into the system.

     Understand how you will be retrieving data from the system.

     WAIT... don’t stop.

     Understand how everyone else in your organization will be retrieving
     data from the system.

     Research technologies and attempt find a good fit for your requirements:
     data access patterns, consistency, availability, recoverability,
     performance, stability

     This is not as easy as it sounds. It requires true technology agnosticism.




                                                                                  27
Data Management


       / remembering something useful



                                        28
Techniques / Databases




      Rule 1: shard your database

      Rule 2: shoot yourself




                                    29
Databases / second try

       Horizontally scaling your databases via sharding/federating requires
       that you make concessions that should make you cry.

       shard (n.)
       a piece of broken ceramic, metal, glass, or rock typically having
       sharp edges.

       sharding (v.)
       dunno... but you will likely wound yourself and you get to keep all
       the pieces.

       But seriously...

          databases (other than MySQL) scale vertically to a greater degree
          than many people admit.

          if you must fragment your data, you will throw away relational
          constraints. this should make you cry. cry. cry hard. cry some
          more. then move on and shard your database.




                                                                              30
Databases / vertical scaling


       Many times relational constraints are not needed on data.

       If this is the case, a traditional relational database is unnecessary.

       There are cool technologies out there to do this:

          “files”

          “noSQL”

          cookies

       Non-ACID databases can be easier to scale

       Vertical scaling is achieved via two mechanisms:

          doing only what is absolutely necessary in the database

          running a good database that can scale well vertically




                                                                                31
Databases / horizontal scaling




       Okay... so you really need to scale horizontally.

       understand the questions you intend to ask.

       make sure that you partition in a fashion that doesn’t require more
       than a single shard to answer OLTP-style questions.

       If that is not possible, consider data duplication.




                                                                             32
Databases / an example



      private messages all stored on the server side

         individuals sends messages to their friends

         an individual should see all messages sent to them



      Easy! partition by recipient.

         either by hash

         range partitions

         whatever




                                                              33
Databases / an example complicated



       now users must be able to review all sent messages.



       Crap!

          our recipient-based partitioning causes us to map the request
          across all shards to answer messages by sender.

       In this case:

          store messages twice... once by recipient and once by sender

          twice the storage, but queries only hit a single node now




                                                                          34
Databases / an example unwound



      Partitioning data allows one to reduce the dataset size on each node.

      You might just cause more problems than you’ve solved.

      Complicated (or even simple) queries become a pain
      if they don’t align with your partitioning strategy.

      Partitioning like this is really a commitment. You lose much of the
      power of your relational database and complicate what were once easy
      problems.

      Sometimes you have to do what you have to do.
      Don’t make the concession until you have to.




                                                                              35
Databases / take care




       Multi-master replication (in ACID databases) is simply not ready
       these days.

          getting closer every year.

       When partitioning/federating/sharding data, take the step to model
       what you are doing.

       Prototype several different schemes and make sure you truly
       understand your intended use patterns before deciding.




                                                                            36
Databases / Stepping outside of ACID

       There are some alternatives to traditional RDBMS systems.

       Key-Value stores and document databases offer interesting
       alternatives.

       Without an imposed relational model federating/sharding is much
       easier to bake in.

       By relaxing consistency requirements, one can increase availability by
       adopting a paradigm of eventual consistency.

          MongoDB

          Cassandra

          Voldemort

          Redis

          CouchDB




                                                                                37
Databases / noSQL


      noSQL systems aren't a cure-all data storage paradigm.

      A lot of data has relationships that are important.

      Referential integrity is quite important in many situations.

      A lot of datasets do not need to scale past a single instance.

      "Vertical scaling is not a strategy" is a faulty argument.

      Not every component of the architecture needs to scale past the limits
      of vertical scaling.

      If you can segregate your components, you can adhere to a right tool
      for the job paradigm. Use SQL where it is the best tool for the job and
      use distributed key-value stores and document databases in stations
      where they shine.




                                                                                38
Databases / when



      break the problems down into small pieces and decouple them

      determine how large the problem is and can grow

      fit the solution to the problem

       avoid: “shiny is good”

       avoid: “over engineering”

       embrace: “K.I.S.S.”

       embrace: “good is good”




                                                                    39
Databases / reality or “unpopular opinion”




       noSQL is the solution to today’s Web 2.0 problems: not really

       traditional RDBMS patterns will take you to finish line: nope

       I can just replace my DBMS with a key-value store: not exactly

        you must map your RPO and RTO and ACID requirements

        good luck (again: break down the problems)




                                                                        40
Service Decoupling


        / controlling experience by removing ‘the suck’



                                                      41
Techniques / Service Decoupling



       One of the most fundamental techniques for building scalable systems

       Asynchrony...

         Why do now what you can postpone until later?

           This mantra often doesn’t break a user’s experience.

       Break down the user transaction into parts.

       Isolate those that could occur asynchronously.

       Queue the information needed to complete the task.

       Process the queues “behind the scenes.”




                                                                              42
Techniques / Service Decoupling


       Asynchrony... that’s not really what it means.

        It isn’t exactly about postponing work (though that can happen).

        It is about service isolation.

       By breaking the system in to small parts we gain:

        problem simplification,

        fault isolation,

        decoupling of approach, strategy and tactics,

        simplified design,

        models for performance that are more likely to be accurate, and

        simplified overall capacity planning.




                                                                           43
Decoupling / concept


       If I don’t want to do something now...

       I must tell someone to do it later.



       This is “messaging”



       There are a lot of solutions:

          JMS (Java message service)

          Spread (extended virtual synchrony messaging bus)

          AMQP (advanced message queueing protocol)




                                                              44
Decoupling / tools



       Message Queueing is a critical tool in the stack...
       durable message queueing:

          ActiveMQ (Java)

          OpenAMQ (C)

          RabbitMQ (erlang)



       Most common protocol is STOMP

          STOMP kinda sucks... but it is universal

          Clients exist for every language




                                                             45
Decoupling / control




     “Moderation in all things, including moderation.”
                                                         - Titus Petronius
                                                                 AD 27-66




                                                                             46
Design & Implementation
Techniques


        / some say architecture != implementation



                                                    47
Architecture vs. Implementation




       Architecture is without specification of the vendor,
       make model of components.

       Implementation is the adaptation of an architecture
       to embrace available technologies.

       They are intrinsically tied.
       Insisting on separation is a metaphysical argument
       (with no winners)




                                                              48
Respect Engineering Math



      Engineering math:

           19 + 89 = 110

      “Precise Math”:

           19 + 89 = 10.8




                           Ok. Ok. I must have, I must have put a decimal
                           point in the wrong place or something. Shit. I
                           always do that. I always mess up some mundane
                           detail.

                                             - Michael Bolton in Office Space



                                                                                49
Insure the gods aren’t angry.

      Bob: We need to grow our cluster of web servers.

      Alice: How many requests per second do they do, how many do you
      have and what is there current resource utilization?

      Bob: About 200 req/second, 8 servers and they have no headroom.

      Alice: How many req/second do you need?

      Bob: 800 req/second would be good.

      Alice: Um, Bob, 200 x 8 = 1600... you have 50% headroom on your
      goal.

      Bob: No... 200 / 8 = 25 req/second per server.

      Alice: Bob... the gods are angry.




                                                                        50
Why you’ve pissed of the gods.



      Most web apps are CPU bound (as I/O happens on a different layer)

      Typical box today:
       8 cores are 2.8GHz or
       22.4 BILLION instructions per second.

      22x109 instr/s / 25 req/s = 880 MILLION instructions per request.

      This same effort (per-request) provided me with approximately
      15 minutes enjoying “Might & Magic 2” on my Apple IIe
      - you’ve certainly pissed me off.

      No wonder the gods are angry.




                                                                          51
Develop a model


     Queue theoretic models are for “other people.”

     Sorta, not really.

     Problems:

         very hard to develop a complete and accurate model for solving

     Benefits:

         provides insight on architecture capacitance dependencies

         relatively easy to understand

         illustrates opportunities to further isolate work




                                                                          52
Rationalize your model

        Draw your model out

        Take measurements and walk through the model to rationalize it
        i.e. prove it to be empirically correct

        You should be able to map actions to consequences:

        A user signs up ➙
          4 synchronous DB inserts (1 synch IOPS + 4 asynch writes)
          1 AMQP durable, persistent message
          1 asynch DB read ➙ 1/10 IOPS writing new Lucene indexes

        In a dev environment, simulate traffic and rationalize you model

        I call this a “data flow causality map”




                                                                           53
Complexity will eat your lunch


        there will always be empirical variance from your model

        explaining the phantoms leads to enlightenment

        service decoupling in complex systems give:

         simplified modeling and capacity planning

         slight inefficiencies

         promotes lower contention

         requires design of systems with less coherency requirements

         each isolated service is simpler and safer

         SCALES.




                                                                       54
WTF


      / most scalability problems are due to idiocy



                                                      55
WTF / don’t be an idiot



       most acute scalability disasters are due to idiots

       don’t be an idiot

       scaling is hard

       performance is easier

       extremely high-performance systems tend to be easier to scale

          because they don’t have to

                                       SCALE
                                                            as much.




                                                                       56
WTF / sample 1




      Hey! let’s send a marketing campaign to:

                  http://example.com/landing/page



      GET /landing/page HTTP/1.0
      Host: example.com

      HTTP/1.0 302 FOUND
      Location: /landing/page/




                                                    57
WTF / sample 2

     I have 100k rows in my users table...

     I’m going to have 10MM...

     I should split it into 100 buckets,
     with 1MM per bucket so I can scale to 100MM.

     The fundamental problem is that I don’t understand my problem.



     I know what my problems are with 100k users... or do I?

     There is some margin for error...
     you design for 10x...
     as you actualize 10x growth you will (painfully) understand that margin.

     Designing for 100x let alone 1000x
     requires a profound understanding of their problem.

     Very few have that.



                                                                                58
WTF / sample 3

     I plan to have a traffic spike from (link on MSN.com)

     I expect 3000 new visitors per second.

     My page http://example.com/coolstuff is 14k
     2 css files each at 4k
     1 js file at 23k
     17 images each at ~16k
     (everything’s compressed)

     /coolstuff is CPU bound (for the sake of this argument)
     I’ve tuned to 8ms services times...
     8 core machines at 90% means 7200ms of CPU time/second...
     900 req/second per machine...
     3000 v/s / 900 r/s/machine / 70% goal at peak rounded up is...
     5 machines (6 allowing a failure)

     the other files I can serve faster... say 30k requests/second from my
     Varnish instances... 3000 v/s * 20 assets / 30k r/s/varnish / 70% is...
     3 machines (4 allowing a failure).




                                                                               59
WTF / sample 3, the forgotten part

     14k + 2 * 4k + 1 * 23k + 17 * 16k = 21 requests with 317k response

     (317k is 2596864 bits/visit) * 3000 visits/second = 7790592000 b/s

     just under 8 gigabits per second.

     even naively, this is 500 packets per visitor * 3000 visitors/second

     1.5MM packets/second.



     This is no paltry task...

     20 assets/visit are static content, we know how to solve that.

     the rest? ~350 megabits per second and ~75k packets/second

     perfectly manageable, right?

     a bad landing link that 302’s adds ~30k packets/second... Crap.



                                                                            60
Thank You

      Thank you OmniTI

        We’re always looking for a few good engineers!

      Come see me speak at Surge 2010 - http://omniti.com/surge

      Thank you!
                             S32699X_Scalable_Internet.qxd      6/23/06   3:31 PM    Page 1




                                                                                                                                                     Scalable Internet Architectures
                                                                                                                                                                                       Theo Schlossnagle

                                Scalable Internet Architectures
                                  With an estimated one billion users worldwide, the Internet today is nothing less than a
                                  global subculture with immense diversity, incredible size, and wide geographic reach. With a
                                  relatively low barrier to entry, almost anyone can register a domain name today and potentially
                                  provide services to people around the entire world tomorrow. But easy entry to web-based
                                  commerce and services can be a double-edged sword. In such a market, it is typically much
                                  harder to gauge interest in advance, and the negative impact of unexpected customer traffic
                                  can turn out to be devastating for the unprepared.
                                  In Scalable Internet Architectures, renowned software engineer and architect Theo
                                  Schlossnagle outlines the steps and processes organizations can follow to build online
                                  services that can scale well with demand—both quickly and economically. By making
                                  intelligent decisions throughout the evolution of an architecture, scalability can be a matter




                                                                                                                                                                                       Scalable Internet
                                  of engineering rather than redesign, costly purchasing, or black magic.
                                  Filled with numerous examples, anecdotes, and lessons gleaned from the author’s years
                                  of experience building large-scale Internet services, Scalable Internet Architectures is both
                                  thought-provoking and instructional. Readers are challenged to understand first, before they




                                                                                                                                                                                       Architectures
                                  start a large project, how what they are building will be used, so that from the beginning
                                  they can design for scalability those parts which need to scale. With the right approach, it
                                  should take no more effort to design and implement a solution that scales than it takes
                                  to build something that will not—and if this is the case, Schlossnagle writes, respect
                                  yourself and build it right.


                                  Theo Schlossnagle is a principal at OmniTI Computer Consulting, where he provides
                                  expert consulting services related to scalable Internet architectures, database replication,
                                  and email infrastructure. He is the creator of the Backhand Project and the Ecelerity MTA,
                                  and spends most of his time solving the scalability problems that arise in high performance
                                  and highly distributed systems.




                                Internet/Programming                                                     Cover image © Digital Vision/Getty Images


                                               Scalability                                                                                            Schlossnagle
                                              Performance                                        $49.99 USA / $61.99 CAN / £35.99 Net UK
                                                Security
                                              www.omniti.com

                                DEVELOPER’S
                                LIBRARY
                                                                                                                                                      DEVELOPER’S
                                www.developers-library.com                                                                                            LIBRARY




                                                                                                                                                                                                           61

More Related Content

Viewers also liked

Past, Present, and Pachyderm - All Things Open - 2013
Past, Present, and Pachyderm - All Things Open - 2013Past, Present, and Pachyderm - All Things Open - 2013
Past, Present, and Pachyderm - All Things Open - 2013Robert Treat
 
Applying operations culture to everything
Applying operations culture to everythingApplying operations culture to everything
Applying operations culture to everythingTheo Schlossnagle
 
Big Bad PostgreSQL @ Percona
Big Bad PostgreSQL @ PerconaBig Bad PostgreSQL @ Percona
Big Bad PostgreSQL @ PerconaTheo Schlossnagle
 
Postgres 9.4 First Look
Postgres 9.4 First LookPostgres 9.4 First Look
Postgres 9.4 First LookRobert Treat
 
Monitoring is easy, why are we so bad at it presentation
Monitoring is easy, why are we so bad at it  presentationMonitoring is easy, why are we so bad at it  presentation
Monitoring is easy, why are we so bad at it presentationTheo Schlossnagle
 
OmniOS Motivation and Design ~ LISA 2012
OmniOS Motivation and Design ~ LISA 2012OmniOS Motivation and Design ~ LISA 2012
OmniOS Motivation and Design ~ LISA 2012Theo Schlossnagle
 
Scalable Internet Architecture
Scalable Internet ArchitectureScalable Internet Architecture
Scalable Internet ArchitectureTheo Schlossnagle
 
Managing Databases In A DevOps Environment 2016
Managing Databases In A DevOps Environment 2016Managing Databases In A DevOps Environment 2016
Managing Databases In A DevOps Environment 2016Robert Treat
 
Monitoring and observability
Monitoring and observabilityMonitoring and observability
Monitoring and observabilityTheo Schlossnagle
 
Lost art of troubleshooting
Lost art of troubleshootingLost art of troubleshooting
Lost art of troubleshootingLeon Fayer
 

Viewers also liked (16)

Past, Present, and Pachyderm - All Things Open - 2013
Past, Present, and Pachyderm - All Things Open - 2013Past, Present, and Pachyderm - All Things Open - 2013
Past, Present, and Pachyderm - All Things Open - 2013
 
PostgreSQL on Solaris
PostgreSQL on SolarisPostgreSQL on Solaris
PostgreSQL on Solaris
 
Web Operations Career
Web Operations CareerWeb Operations Career
Web Operations Career
 
Applying operations culture to everything
Applying operations culture to everythingApplying operations culture to everything
Applying operations culture to everything
 
Big Bad PostgreSQL @ Percona
Big Bad PostgreSQL @ PerconaBig Bad PostgreSQL @ Percona
Big Bad PostgreSQL @ Percona
 
Omnios and unix
Omnios and unixOmnios and unix
Omnios and unix
 
Postgres 9.4 First Look
Postgres 9.4 First LookPostgres 9.4 First Look
Postgres 9.4 First Look
 
Craftsmanship
CraftsmanshipCraftsmanship
Craftsmanship
 
Monitoring is easy, why are we so bad at it presentation
Monitoring is easy, why are we so bad at it  presentationMonitoring is easy, why are we so bad at it  presentation
Monitoring is easy, why are we so bad at it presentation
 
Project reality
Project realityProject reality
Project reality
 
OmniOS Motivation and Design ~ LISA 2012
OmniOS Motivation and Design ~ LISA 2012OmniOS Motivation and Design ~ LISA 2012
OmniOS Motivation and Design ~ LISA 2012
 
Scalable Internet Architecture
Scalable Internet ArchitectureScalable Internet Architecture
Scalable Internet Architecture
 
Managing Databases In A DevOps Environment 2016
Managing Databases In A DevOps Environment 2016Managing Databases In A DevOps Environment 2016
Managing Databases In A DevOps Environment 2016
 
Monitoring and observability
Monitoring and observabilityMonitoring and observability
Monitoring and observability
 
Lost art of troubleshooting
Lost art of troubleshootingLost art of troubleshooting
Lost art of troubleshooting
 
Esperwhispering
EsperwhisperingEsperwhispering
Esperwhispering
 

Similar to Velocity 2010: Scalable Internet Architectures

Software Architectures, Week 1 - Monolithic Architectures
Software Architectures, Week 1 - Monolithic ArchitecturesSoftware Architectures, Week 1 - Monolithic Architectures
Software Architectures, Week 1 - Monolithic ArchitecturesAngelos Kapsimanis
 
Creating An Incremental Architecture For Your System
Creating An Incremental Architecture For Your SystemCreating An Incremental Architecture For Your System
Creating An Incremental Architecture For Your SystemGiovanni Asproni
 
Excavating the knowledge of our ancestors
Excavating the knowledge of our ancestorsExcavating the knowledge of our ancestors
Excavating the knowledge of our ancestorsUwe Friedrichsen
 
ADC-BSC EAST 2013 Keynote: Worse Is Better—For Better or for Worse
ADC-BSC EAST 2013 Keynote: Worse Is Better—For Better or for WorseADC-BSC EAST 2013 Keynote: Worse Is Better—For Better or for Worse
ADC-BSC EAST 2013 Keynote: Worse Is Better—For Better or for WorseTechWell
 
Agile Architecture: Ideals, History, and a New Hope
Agile Architecture: Ideals, History, and a New HopeAgile Architecture: Ideals, History, and a New Hope
Agile Architecture: Ideals, History, and a New HopeGary Pedretti
 
Worse Is Better, for Better or for Worse
Worse Is Better, for Better or for WorseWorse Is Better, for Better or for Worse
Worse Is Better, for Better or for WorseKevlin Henney
 
Agile leadership practices for PIONEERS
 Agile leadership practices for PIONEERS Agile leadership practices for PIONEERS
Agile leadership practices for PIONEERSStefan Haas
 
Worse Is Better, for Better or for Worse
Worse Is Better, for Better or for WorseWorse Is Better, for Better or for Worse
Worse Is Better, for Better or for WorseKevlin Henney
 
Agile Architecture and Modeling - Where are we Today
Agile Architecture and Modeling - Where are we TodayAgile Architecture and Modeling - Where are we Today
Agile Architecture and Modeling - Where are we TodayGary Pedretti
 
Architecture In An Agile World
Architecture In An Agile WorldArchitecture In An Agile World
Architecture In An Agile WorldJames Cooper
 
Usability Presentation - IIS Brownbag 2013
Usability Presentation - IIS Brownbag 2013Usability Presentation - IIS Brownbag 2013
Usability Presentation - IIS Brownbag 2013Patrick Hays
 
Creating An Incremental Architecture For Your System
Creating An Incremental Architecture For Your SystemCreating An Incremental Architecture For Your System
Creating An Incremental Architecture For Your SystemGiovanni Asproni
 
Sustaining Your Career
Sustaining Your CareerSustaining Your Career
Sustaining Your CareerScott Lowe
 
Complexity versus Lean
Complexity versus LeanComplexity versus Lean
Complexity versus LeanJurgen Appelo
 
Codemash 2.0.1.4: Tech Trends and Pwning Your Pwn Career
Codemash 2.0.1.4: Tech Trends and Pwning Your Pwn CareerCodemash 2.0.1.4: Tech Trends and Pwning Your Pwn Career
Codemash 2.0.1.4: Tech Trends and Pwning Your Pwn CareerKevin Davis
 
MCC Technology Class (April 2012)
MCC Technology Class (April 2012) MCC Technology Class (April 2012)
MCC Technology Class (April 2012) Michael Rawlins
 
MaLeNe2021-Evolving_Autonomous_Networks-L_Ciavaglia.pdf
MaLeNe2021-Evolving_Autonomous_Networks-L_Ciavaglia.pdfMaLeNe2021-Evolving_Autonomous_Networks-L_Ciavaglia.pdf
MaLeNe2021-Evolving_Autonomous_Networks-L_Ciavaglia.pdfAhmed Mohamed
 
No silver-bullllet-1
No silver-bullllet-1No silver-bullllet-1
No silver-bullllet-1Maria Riaz
 

Similar to Velocity 2010: Scalable Internet Architectures (20)

Software Architectures, Week 1 - Monolithic Architectures
Software Architectures, Week 1 - Monolithic ArchitecturesSoftware Architectures, Week 1 - Monolithic Architectures
Software Architectures, Week 1 - Monolithic Architectures
 
Creating An Incremental Architecture For Your System
Creating An Incremental Architecture For Your SystemCreating An Incremental Architecture For Your System
Creating An Incremental Architecture For Your System
 
Excavating the knowledge of our ancestors
Excavating the knowledge of our ancestorsExcavating the knowledge of our ancestors
Excavating the knowledge of our ancestors
 
ADC-BSC EAST 2013 Keynote: Worse Is Better—For Better or for Worse
ADC-BSC EAST 2013 Keynote: Worse Is Better—For Better or for WorseADC-BSC EAST 2013 Keynote: Worse Is Better—For Better or for Worse
ADC-BSC EAST 2013 Keynote: Worse Is Better—For Better or for Worse
 
Agile Architecture: Ideals, History, and a New Hope
Agile Architecture: Ideals, History, and a New HopeAgile Architecture: Ideals, History, and a New Hope
Agile Architecture: Ideals, History, and a New Hope
 
Worse Is Better, for Better or for Worse
Worse Is Better, for Better or for WorseWorse Is Better, for Better or for Worse
Worse Is Better, for Better or for Worse
 
SDLC Smashup
SDLC SmashupSDLC Smashup
SDLC Smashup
 
Agile leadership practices for PIONEERS
 Agile leadership practices for PIONEERS Agile leadership practices for PIONEERS
Agile leadership practices for PIONEERS
 
Worse Is Better, for Better or for Worse
Worse Is Better, for Better or for WorseWorse Is Better, for Better or for Worse
Worse Is Better, for Better or for Worse
 
Agile Architecture and Modeling - Where are we Today
Agile Architecture and Modeling - Where are we TodayAgile Architecture and Modeling - Where are we Today
Agile Architecture and Modeling - Where are we Today
 
Architecture In An Agile World
Architecture In An Agile WorldArchitecture In An Agile World
Architecture In An Agile World
 
Usability Presentation - IIS Brownbag 2013
Usability Presentation - IIS Brownbag 2013Usability Presentation - IIS Brownbag 2013
Usability Presentation - IIS Brownbag 2013
 
Creating An Incremental Architecture For Your System
Creating An Incremental Architecture For Your SystemCreating An Incremental Architecture For Your System
Creating An Incremental Architecture For Your System
 
Sustaining Your Career
Sustaining Your CareerSustaining Your Career
Sustaining Your Career
 
Complexity versus Lean
Complexity versus LeanComplexity versus Lean
Complexity versus Lean
 
Codemash 2.0.1.4: Tech Trends and Pwning Your Pwn Career
Codemash 2.0.1.4: Tech Trends and Pwning Your Pwn CareerCodemash 2.0.1.4: Tech Trends and Pwning Your Pwn Career
Codemash 2.0.1.4: Tech Trends and Pwning Your Pwn Career
 
MCC Technology Class (April 2012)
MCC Technology Class (April 2012) MCC Technology Class (April 2012)
MCC Technology Class (April 2012)
 
MaLeNe2021-Evolving_Autonomous_Networks-L_Ciavaglia.pdf
MaLeNe2021-Evolving_Autonomous_Networks-L_Ciavaglia.pdfMaLeNe2021-Evolving_Autonomous_Networks-L_Ciavaglia.pdf
MaLeNe2021-Evolving_Autonomous_Networks-L_Ciavaglia.pdf
 
Unleash The Monkeys
Unleash The MonkeysUnleash The Monkeys
Unleash The Monkeys
 
No silver-bullllet-1
No silver-bullllet-1No silver-bullllet-1
No silver-bullllet-1
 

More from Theo Schlossnagle

More from Theo Schlossnagle (20)

Adding Simplicity to Complexity
Adding Simplicity to ComplexityAdding Simplicity to Complexity
Adding Simplicity to Complexity
 
Put Some SRE in Your Shipped Software
Put Some SRE in Your Shipped SoftwarePut Some SRE in Your Shipped Software
Put Some SRE in Your Shipped Software
 
Monitoring 101
Monitoring 101Monitoring 101
Monitoring 101
 
Distributed Systems - Like It Or Not
Distributed Systems - Like It Or NotDistributed Systems - Like It Or Not
Distributed Systems - Like It Or Not
 
Applying SRE techniques to micro service design
Applying SRE techniques to micro service designApplying SRE techniques to micro service design
Applying SRE techniques to micro service design
 
SRECon Coherent Performance
SRECon Coherent PerformanceSRECon Coherent Performance
SRECon Coherent Performance
 
Commandments of scale
Commandments of scaleCommandments of scale
Commandments of scale
 
Adaptive availability
Adaptive availabilityAdaptive availability
Adaptive availability
 
Monitoring the #DevOps way
Monitoring the #DevOps wayMonitoring the #DevOps way
Monitoring the #DevOps way
 
Operational Software Design
Operational Software DesignOperational Software Design
Operational Software Design
 
A Coherent Discussion About Performance
A Coherent Discussion About PerformanceA Coherent Discussion About Performance
A Coherent Discussion About Performance
 
The math behind big systems analysis.
The math behind big systems analysis.The math behind big systems analysis.
The math behind big systems analysis.
 
Understanding Slowness
Understanding SlownessUnderstanding Slowness
Understanding Slowness
 
Monitoring and observability
Monitoring and observabilityMonitoring and observability
Monitoring and observability
 
Xtreme Deployment
Xtreme DeploymentXtreme Deployment
Xtreme Deployment
 
Atldevops
AtldevopsAtldevops
Atldevops
 
It's all about telemetry
It's all about telemetryIt's all about telemetry
It's all about telemetry
 
Is this normal?
Is this normal?Is this normal?
Is this normal?
 
Social improvements in monitoring
Social improvements in monitoringSocial improvements in monitoring
Social improvements in monitoring
 
What's in a number?
What's in a number?What's in a number?
What's in a number?
 

Recently uploaded

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 

Recently uploaded (20)

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 

Velocity 2010: Scalable Internet Architectures

  • 1. Scalable Internet Architectures / Operating at Scale 1
  • 2. Who am I? @postwait on twitter Author of “Scalable Internet Architectures” Pearson, ISBN: 067232699X CEO of OmniTI We build scalable and secure web applications I am an Engineer A practitioner of academic computing. IEEE member and Senior ACM member. On the Editorial Board of ACM’s Queue magazine. I work on/with a lot of Open Source software: Apache, perl, Linux, Solaris, PostgreSQL, Varnish, Spread, Reconnoiter, etc. I have experience. I’ve had the unique opportunity to watch a great many catastrophes. I enjoy immersing myself in the pathology of architecture failures. 2
  • 3. Topic Progression What is an architecture? What does it mean to run a (scalable) architecture? Scalability Patterns for Dynamic Content Databases Complex Systems Scaling Techniques Bad Ideas 3
  • 4. Architecture / the whole enchilada 4
  • 5. Architecture / what it is architecture (n.): the complex or carefully designed structure of something. specifically in computing: the conceptual structure and logical organization of a computer or a computer-based system. - Oxford American Dictionary 5
  • 6. Architecture / more than meets the eye An architecture is all encompassing. space, power, cooling servers, switches, routers load balancers, firewalls databases, non-database storage dynamic applications the architecture you export to the user (javascript, etc.) 6
  • 7. Architecture / awareness is key Not all people do all things. However... lack of awareness of the other disciplines is bad leads to isolated decisions which leads to unreasonable requirements elsewhere which lead to over engineered products stupid decisions catastrophic failures 7
  • 8. Architecture / running it all Running Operations is serious stuff It takes knowledge, tools... but that is not enough. It takes experience. And perhaps even more importantly... It takes discipline. 8
  • 9. Architecture / knowledge Read. Study. Leverage User Groups (SAGE,LUGs,OSUGs,PUGs,etc.) Participate in the community. Go to conferences: Velocity (now, you’re here - good job!) Structure (now, you’re not here - sorry) Surge (Baltimore, Sept 30 - Oct 1) 9
  • 10. Architecture / tools Collaborate with colleagues. Try new tools. Write new tools. Know and practice your tools during the “good times” in order to make their use effortless during the “bad times” 10
  • 11. Architecture / tool theories “One only needs two tools in life: WD-40 to make things go, and duct tape to make them stop.” - George Weilacher “Men have become the tools of their tools.” - Henry David Thoreau “All the tools and engines on earth are only extensions of man's limbs and senses.” - Ralph Waldo Emerson 11
  • 12. Architecture / my take on tools Tools are just tools. They are absolutely essential to doing your job. They will never do your job for you. Tools will never replace experience and discipline. But tools can help you maintain discipline. 12
  • 13. Architecture / experience “Good judgment comes from experience. Experience comes from bad judgment.” - Proverb “Judge people on the poise and integrity with which they remediate their failures.” - me 13
  • 14. Architecture / discipline Discipline is important in any job. Discipline is “controlled behavior resulting from training, study and practice.” In my experience discipline is the most frequently missing ingredient in the field of web operations. I believe this to be caused by a lack of focus, laziness, and the view that it is a job instead of an art. As in any trade To be truly excellent one must treat it as a craft. One must become a craftsman. Through experience learn discipline. And through practice achieve excellence. 14
  • 15. Architecture / actually running it all Okay, I get it. From day to day, what do I need to know? 15
  • 16. Architecture / version control Switch configurations should be in version control. Router configurations should be in version control. Firewall configurations should be in version control. System configurations should be in version control. Application configurations should be in version control. Monitoring configurations should be in version control. Documentation should be in version control. Application code should be in version control. Database schema should be in version control. Everything you do should be in version control. 16
  • 17. Architecture / version control And no... it doesn’t matter which tool. It’s not about the tool, it’s about the discipline to always use it. (today, we use subversion) 17
  • 18. Architecture / know your systems To know when something looks unhealthy, one must know what healthy looks like. Monitor everything. Collect as much system and process information as possible. Look at your systems and use your diagnostic tools when things are healthy. 18
  • 19. Architecture / management Package roll out? Machine management? Provisioning? They tell me I should use Puppet. They tell me I should use Chef. well... I stick to my theory on tools: A master craftsman chooses or builds the tools he likes. A tool does not the master craftsman make. 19
  • 20. Dynamic Content / keeping users interested 20
  • 21. Techniques / Dynamic Content “We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.” - Donald Knuth “Knowing when optimization is premature defines the difference between the master engineer and the apprentice.” - me 21
  • 22. Techniques / optimization Optimization comes down to a simple concept: “don’t do work you don’t have to.” It can take the form of: computational reuse caching in a more general sense and my personal favorite: ... avoid the problem, and do no work at all. 22
  • 23. Techniques / optimization applied Optimization in dynamic content simply means: Don’t pay to generate the same content twice Only generate content when things change Break the system into components so that you can isolate the costs of things that change rapidly from those that change infrequently. There is a simple truth: your content isn’t as dynamic as you think it is 23
  • 24. Techniques / optimization applied Javascript, CSS and images are only referentially linked They should all be consolidated and optimized. They should be publicly cacheable and expire 10 years from now. RewriteRule (.*).([0-9]+).css $1.css Means that /s/app.23412.css is just /s/app.css different URL means new cached copy any time the CSS is changed, just bump the number the application references from HTML. Same applies for Javascript. Images... you should just deploy a new one at a new URI. 24
  • 25. Techniques / per user info If you could have a distributed database that: when a node fails, you can guarantee no one needs the info on it it is always located near the user accessing it it can easily grow as your user base grows Introducing CookieDB: it’s been here all along it’s up in your browser use it 25
  • 26. Techniques / data caching Asking hard questions of database can be “expensive” You have two options: cache the results best when you can’t afford to be accurate materialize a view on the changes best when you need to be accurate 26
  • 27. Techniques / choosing technologies Understand how you will be writing data into the system. Understand how you will be retrieving data from the system. WAIT... don’t stop. Understand how everyone else in your organization will be retrieving data from the system. Research technologies and attempt find a good fit for your requirements: data access patterns, consistency, availability, recoverability, performance, stability This is not as easy as it sounds. It requires true technology agnosticism. 27
  • 28. Data Management / remembering something useful 28
  • 29. Techniques / Databases Rule 1: shard your database Rule 2: shoot yourself 29
  • 30. Databases / second try Horizontally scaling your databases via sharding/federating requires that you make concessions that should make you cry. shard (n.) a piece of broken ceramic, metal, glass, or rock typically having sharp edges. sharding (v.) dunno... but you will likely wound yourself and you get to keep all the pieces. But seriously... databases (other than MySQL) scale vertically to a greater degree than many people admit. if you must fragment your data, you will throw away relational constraints. this should make you cry. cry. cry hard. cry some more. then move on and shard your database. 30
  • 31. Databases / vertical scaling Many times relational constraints are not needed on data. If this is the case, a traditional relational database is unnecessary. There are cool technologies out there to do this: “files” “noSQL” cookies Non-ACID databases can be easier to scale Vertical scaling is achieved via two mechanisms: doing only what is absolutely necessary in the database running a good database that can scale well vertically 31
  • 32. Databases / horizontal scaling Okay... so you really need to scale horizontally. understand the questions you intend to ask. make sure that you partition in a fashion that doesn’t require more than a single shard to answer OLTP-style questions. If that is not possible, consider data duplication. 32
  • 33. Databases / an example private messages all stored on the server side individuals sends messages to their friends an individual should see all messages sent to them Easy! partition by recipient. either by hash range partitions whatever 33
  • 34. Databases / an example complicated now users must be able to review all sent messages. Crap! our recipient-based partitioning causes us to map the request across all shards to answer messages by sender. In this case: store messages twice... once by recipient and once by sender twice the storage, but queries only hit a single node now 34
  • 35. Databases / an example unwound Partitioning data allows one to reduce the dataset size on each node. You might just cause more problems than you’ve solved. Complicated (or even simple) queries become a pain if they don’t align with your partitioning strategy. Partitioning like this is really a commitment. You lose much of the power of your relational database and complicate what were once easy problems. Sometimes you have to do what you have to do. Don’t make the concession until you have to. 35
  • 36. Databases / take care Multi-master replication (in ACID databases) is simply not ready these days. getting closer every year. When partitioning/federating/sharding data, take the step to model what you are doing. Prototype several different schemes and make sure you truly understand your intended use patterns before deciding. 36
  • 37. Databases / Stepping outside of ACID There are some alternatives to traditional RDBMS systems. Key-Value stores and document databases offer interesting alternatives. Without an imposed relational model federating/sharding is much easier to bake in. By relaxing consistency requirements, one can increase availability by adopting a paradigm of eventual consistency. MongoDB Cassandra Voldemort Redis CouchDB 37
  • 38. Databases / noSQL noSQL systems aren't a cure-all data storage paradigm. A lot of data has relationships that are important. Referential integrity is quite important in many situations. A lot of datasets do not need to scale past a single instance. "Vertical scaling is not a strategy" is a faulty argument. Not every component of the architecture needs to scale past the limits of vertical scaling. If you can segregate your components, you can adhere to a right tool for the job paradigm. Use SQL where it is the best tool for the job and use distributed key-value stores and document databases in stations where they shine. 38
  • 39. Databases / when break the problems down into small pieces and decouple them determine how large the problem is and can grow fit the solution to the problem avoid: “shiny is good” avoid: “over engineering” embrace: “K.I.S.S.” embrace: “good is good” 39
  • 40. Databases / reality or “unpopular opinion” noSQL is the solution to today’s Web 2.0 problems: not really traditional RDBMS patterns will take you to finish line: nope I can just replace my DBMS with a key-value store: not exactly you must map your RPO and RTO and ACID requirements good luck (again: break down the problems) 40
  • 41. Service Decoupling / controlling experience by removing ‘the suck’ 41
  • 42. Techniques / Service Decoupling One of the most fundamental techniques for building scalable systems Asynchrony... Why do now what you can postpone until later? This mantra often doesn’t break a user’s experience. Break down the user transaction into parts. Isolate those that could occur asynchronously. Queue the information needed to complete the task. Process the queues “behind the scenes.” 42
  • 43. Techniques / Service Decoupling Asynchrony... that’s not really what it means. It isn’t exactly about postponing work (though that can happen). It is about service isolation. By breaking the system in to small parts we gain: problem simplification, fault isolation, decoupling of approach, strategy and tactics, simplified design, models for performance that are more likely to be accurate, and simplified overall capacity planning. 43
  • 44. Decoupling / concept If I don’t want to do something now... I must tell someone to do it later. This is “messaging” There are a lot of solutions: JMS (Java message service) Spread (extended virtual synchrony messaging bus) AMQP (advanced message queueing protocol) 44
  • 45. Decoupling / tools Message Queueing is a critical tool in the stack... durable message queueing: ActiveMQ (Java) OpenAMQ (C) RabbitMQ (erlang) Most common protocol is STOMP STOMP kinda sucks... but it is universal Clients exist for every language 45
  • 46. Decoupling / control “Moderation in all things, including moderation.” - Titus Petronius AD 27-66 46
  • 47. Design & Implementation Techniques / some say architecture != implementation 47
  • 48. Architecture vs. Implementation Architecture is without specification of the vendor, make model of components. Implementation is the adaptation of an architecture to embrace available technologies. They are intrinsically tied. Insisting on separation is a metaphysical argument (with no winners) 48
  • 49. Respect Engineering Math Engineering math: 19 + 89 = 110 “Precise Math”: 19 + 89 = 10.8 Ok. Ok. I must have, I must have put a decimal point in the wrong place or something. Shit. I always do that. I always mess up some mundane detail. - Michael Bolton in Office Space 49
  • 50. Insure the gods aren’t angry. Bob: We need to grow our cluster of web servers. Alice: How many requests per second do they do, how many do you have and what is there current resource utilization? Bob: About 200 req/second, 8 servers and they have no headroom. Alice: How many req/second do you need? Bob: 800 req/second would be good. Alice: Um, Bob, 200 x 8 = 1600... you have 50% headroom on your goal. Bob: No... 200 / 8 = 25 req/second per server. Alice: Bob... the gods are angry. 50
  • 51. Why you’ve pissed of the gods. Most web apps are CPU bound (as I/O happens on a different layer) Typical box today: 8 cores are 2.8GHz or 22.4 BILLION instructions per second. 22x109 instr/s / 25 req/s = 880 MILLION instructions per request. This same effort (per-request) provided me with approximately 15 minutes enjoying “Might & Magic 2” on my Apple IIe - you’ve certainly pissed me off. No wonder the gods are angry. 51
  • 52. Develop a model Queue theoretic models are for “other people.” Sorta, not really. Problems: very hard to develop a complete and accurate model for solving Benefits: provides insight on architecture capacitance dependencies relatively easy to understand illustrates opportunities to further isolate work 52
  • 53. Rationalize your model Draw your model out Take measurements and walk through the model to rationalize it i.e. prove it to be empirically correct You should be able to map actions to consequences: A user signs up ➙ 4 synchronous DB inserts (1 synch IOPS + 4 asynch writes) 1 AMQP durable, persistent message 1 asynch DB read ➙ 1/10 IOPS writing new Lucene indexes In a dev environment, simulate traffic and rationalize you model I call this a “data flow causality map” 53
  • 54. Complexity will eat your lunch there will always be empirical variance from your model explaining the phantoms leads to enlightenment service decoupling in complex systems give: simplified modeling and capacity planning slight inefficiencies promotes lower contention requires design of systems with less coherency requirements each isolated service is simpler and safer SCALES. 54
  • 55. WTF / most scalability problems are due to idiocy 55
  • 56. WTF / don’t be an idiot most acute scalability disasters are due to idiots don’t be an idiot scaling is hard performance is easier extremely high-performance systems tend to be easier to scale because they don’t have to SCALE as much. 56
  • 57. WTF / sample 1 Hey! let’s send a marketing campaign to: http://example.com/landing/page GET /landing/page HTTP/1.0 Host: example.com HTTP/1.0 302 FOUND Location: /landing/page/ 57
  • 58. WTF / sample 2 I have 100k rows in my users table... I’m going to have 10MM... I should split it into 100 buckets, with 1MM per bucket so I can scale to 100MM. The fundamental problem is that I don’t understand my problem. I know what my problems are with 100k users... or do I? There is some margin for error... you design for 10x... as you actualize 10x growth you will (painfully) understand that margin. Designing for 100x let alone 1000x requires a profound understanding of their problem. Very few have that. 58
  • 59. WTF / sample 3 I plan to have a traffic spike from (link on MSN.com) I expect 3000 new visitors per second. My page http://example.com/coolstuff is 14k 2 css files each at 4k 1 js file at 23k 17 images each at ~16k (everything’s compressed) /coolstuff is CPU bound (for the sake of this argument) I’ve tuned to 8ms services times... 8 core machines at 90% means 7200ms of CPU time/second... 900 req/second per machine... 3000 v/s / 900 r/s/machine / 70% goal at peak rounded up is... 5 machines (6 allowing a failure) the other files I can serve faster... say 30k requests/second from my Varnish instances... 3000 v/s * 20 assets / 30k r/s/varnish / 70% is... 3 machines (4 allowing a failure). 59
  • 60. WTF / sample 3, the forgotten part 14k + 2 * 4k + 1 * 23k + 17 * 16k = 21 requests with 317k response (317k is 2596864 bits/visit) * 3000 visits/second = 7790592000 b/s just under 8 gigabits per second. even naively, this is 500 packets per visitor * 3000 visitors/second 1.5MM packets/second. This is no paltry task... 20 assets/visit are static content, we know how to solve that. the rest? ~350 megabits per second and ~75k packets/second perfectly manageable, right? a bad landing link that 302’s adds ~30k packets/second... Crap. 60
  • 61. Thank You Thank you OmniTI We’re always looking for a few good engineers! Come see me speak at Surge 2010 - http://omniti.com/surge Thank you! S32699X_Scalable_Internet.qxd 6/23/06 3:31 PM Page 1 Scalable Internet Architectures Theo Schlossnagle Scalable Internet Architectures With an estimated one billion users worldwide, the Internet today is nothing less than a global subculture with immense diversity, incredible size, and wide geographic reach. With a relatively low barrier to entry, almost anyone can register a domain name today and potentially provide services to people around the entire world tomorrow. But easy entry to web-based commerce and services can be a double-edged sword. In such a market, it is typically much harder to gauge interest in advance, and the negative impact of unexpected customer traffic can turn out to be devastating for the unprepared. In Scalable Internet Architectures, renowned software engineer and architect Theo Schlossnagle outlines the steps and processes organizations can follow to build online services that can scale well with demand—both quickly and economically. By making intelligent decisions throughout the evolution of an architecture, scalability can be a matter Scalable Internet of engineering rather than redesign, costly purchasing, or black magic. Filled with numerous examples, anecdotes, and lessons gleaned from the author’s years of experience building large-scale Internet services, Scalable Internet Architectures is both thought-provoking and instructional. Readers are challenged to understand first, before they Architectures start a large project, how what they are building will be used, so that from the beginning they can design for scalability those parts which need to scale. With the right approach, it should take no more effort to design and implement a solution that scales than it takes to build something that will not—and if this is the case, Schlossnagle writes, respect yourself and build it right. Theo Schlossnagle is a principal at OmniTI Computer Consulting, where he provides expert consulting services related to scalable Internet architectures, database replication, and email infrastructure. He is the creator of the Backhand Project and the Ecelerity MTA, and spends most of his time solving the scalability problems that arise in high performance and highly distributed systems. Internet/Programming Cover image © Digital Vision/Getty Images Scalability Schlossnagle Performance $49.99 USA / $61.99 CAN / £35.99 Net UK Security www.omniti.com DEVELOPER’S LIBRARY DEVELOPER’S www.developers-library.com LIBRARY 61