eBay From Ground Level to the Clouds


  • 3.
    200M live listings; 22B page views/day; 9 PB of data; 6,000 application servers; 250M queries/day; 94M active users; 23M SLOC; $62B gross merchandise volume (2010); 75B database calls/day
  • 4.
    Beta; PCI Compliant; PCI Compliant; Production; Research; Skunkworks; QA
  • 7.
    DR; number of servers required based on utilization for 8 pools
  • 8.
    Callout: Even at 4x the internal cost, public cloud would save money. Chart labels: cloud cost to internal cost ratio; internal cost is dominant; external cost is dominant.
  • 9.
    Private or Public? Hybrid. Build or Buy? Build + OSS.
  • 11.
    Service Catalog                            REST APIs
    Ticket driven run book automation          Model driven closed loop automation
    Configuration Management Database (CMDB)   Distributed state management
    Chargeback                                 Pay as you go
    Server Virtualization                      Multitenant infrastructure with secure isolation
  • 12.
    Cannot be automated: The task requires human involvement (e.g. racking and wiring)
    No support for automation: Component lacks API or requires UI based actions (e.g. checkpoint)
    Limited rate of change: Configuration requires restart, reload, file sync (e.g. BIND, ISC DHCP)
    No permission: Configuration requires special credential/role (e.g. firewall, network)
  • 14.
    Left: four stacks (Application, App, App, App), each with its own Spare and Infra layer. Right: the same four applications on top of a single Global resource pool and Shared infrastructure.
  • 15.
    Top flow: request {nb servers, model, app}; order; receive & rack & wire, label (app); deliver. Times shown: 2-3 w, 1w, “several” weeks. Loop: repurpose.
    Bottom flow: request {nb servers, model}, quarterly; order; receive pre-racked, pre-wired; deliver to cache; request {nb servers, model, app}; deliver. Times shown: 2-3 w, 1 day, 45 min. Loop: repurpose.
  • 16.
    Diagram (shown twice): IaaS/PaaS API, orchestration, AuthN/AuthZ; Resource Allocation, Distributed State; Application Controller, Access Point Controller; Compute Controller, Cluster Controller, Pool Controller.
    Also shown: Compute Mgt., DNS Mgt., LB Mgt., Monitoring, Network Prov, Image/Pkg Repo, Software Dist.
    Open Source Solution (OpenStack / CloudStack)
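
The cost comparison in slides 7 and 8 can be illustrated with a toy model. This is only a sketch under assumptions that the deck does not state: internal capacity is sized for peak and paid for around the clock, cloud capacity is paid for only while in use, and the function names and all numbers below are hypothetical. Under those assumptions the cloud stays cheaper as long as the price ratio is below 1 / utilization, so a 4x ratio would still save money for pools running under 25% utilization.

```python
def internal_cost(peak_servers: int, unit_price: float, hours: float) -> float:
    """Owned capacity: sized for peak, paid for every hour (assumption)."""
    return peak_servers * unit_price * hours


def cloud_cost(avg_servers: float, unit_price: float, ratio: float, hours: float) -> float:
    """Rented capacity: paid only for servers in use, at `ratio` times
    the internal unit price (assumption)."""
    return avg_servers * unit_price * ratio * hours


def break_even_ratio(utilization: float) -> float:
    """Cloud/internal price ratio below which renting is cheaper in this model."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return 1.0 / utilization


if __name__ == "__main__":
    # Hypothetical pool: 100 servers provisioned for peak, 20% average utilization.
    peak, utilization, hours = 100, 0.2, 24.0
    owned = internal_cost(peak, unit_price=1.0, hours=hours)
    rented = cloud_cost(peak * utilization, unit_price=1.0, ratio=4.0, hours=hours)
    print(f"internal: {owned:.0f}, cloud at 4x: {rented:.0f}")
    print(f"break-even ratio: {break_even_ratio(utilization):.1f}")
```

The model ignores data transfer, migration, and staffing costs, so it only shows why low utilization can make a higher unit price acceptable.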

Editor's Notes

  • #11 Virtual Data Center: External cloud looks like an extension of the eBay DC. All traffic goes through VPN (or private peering). Internal IP space is shared between the two domains. eBay’s DNS zones are delegated to cloud providers. DDoS/IDS is on eBay’s side. Most transparent model, but creates a lot of technical issues (good when application complexity requires it).
    Public Shared Cloud: External cloud looks like a 3rd party. All traffic goes through the Internet (or private peering). eBay’s internal IP space is not accessible. Two DNS/IP management points. DDoS/IDS on the public cloud side (?). Most “cloud like” model, but has more limitations (good for isolated use cases).
  • #14 Before: Mesh of application dependencies (build time and run time). One build per deployment environment. Big monolithic deliverables. Highly latency sensitive because of DB dependencies.
    Ongoing: Decomposition of applications into services. Modularization of code base (OSGi). No more train releases. Build once, deploy everywhere. Refactoring of DB dependencies behind services. Formal declaration of dependencies. ‘Cloud friendly’.
    Future: Migration of some data into “cloud friendly” DBs (MongoDB, Cassandra, HBase, …). Redesign of platform services (e.g. logging) to be less infrastructure dependent. ‘Cloud ready’.
  • #15 Implications: IP space does not identify members of an application. Cannot use application name as a label on cables. Change in asset management (e.g. fulfillment, chargeback). Less flexibility in h/w choice or customization (however, changing from small to large VM is faster). Stricter isolation requirement to support multi-tenancy (virtual environments).
  • #17 Today: Missing features (monitoring, DNS Mgt., LB Mgt., software deployment, PaaS-level features…). Not managing full lifecycle (focused on customer-facing functions). Impedance mismatch (ISP profile vs. eBay). Infrastructure dependencies have scalability implications (mostly around network isolation). Gaps, but catching up fast: opportunity to contribute.
    Future: Adopt as much as possible and contribute. Increase open source footprint as maturity/feature set improves. Keep abstraction layer to provide eBay’s specific flavor and PDLC integration. Adopt gradually but keep eBay’s abstraction.
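
Note #17's "keep abstraction layer" idea can be sketched as a thin in-house interface with swappable backends. Everything here is hypothetical and for illustration only: `ComputeBackend`, `InMemoryBackend`, and `Provisioner` are invented names, not eBay's code and not the OpenStack or CloudStack API. A real adapter would translate these calls into whichever IaaS API sits underneath.

```python
from dataclasses import dataclass, field
from typing import Protocol


class ComputeBackend(Protocol):
    """Hypothetical minimal interface that each IaaS adapter would implement."""

    def create_server(self, flavor: str) -> str: ...
    def delete_server(self, server_id: str) -> None: ...


@dataclass
class InMemoryBackend:
    """Stand-in backend; a real adapter would call an IaaS API instead."""

    name: str
    servers: dict[str, str] = field(default_factory=dict)  # server_id -> flavor
    _next_id: int = 0

    def create_server(self, flavor: str) -> str:
        self._next_id += 1
        server_id = f"{self.name}-{self._next_id}"
        self.servers[server_id] = flavor
        return server_id

    def delete_server(self, server_id: str) -> None:
        del self.servers[server_id]


@dataclass
class Provisioner:
    """Stable in-house API; callers never see which backend is in use."""

    backend: ComputeBackend
    assignments: dict[str, str] = field(default_factory=dict)  # server_id -> application

    def allocate(self, application: str, flavor: str) -> str:
        # The application label lives here, not in the backend (compare note #15).
        server_id = self.backend.create_server(flavor)
        self.assignments[server_id] = application
        return server_id

    def release(self, server_id: str) -> None:
        self.backend.delete_server(server_id)
        del self.assignments[server_id]


if __name__ == "__main__":
    provisioner = Provisioner(InMemoryBackend("pool-a"))
    server = provisioner.allocate(application="search", flavor="small")
    print(server, provisioner.assignments)
    provisioner.release(server)
```

Because callers depend only on `Provisioner`, the backend could be replaced gradually as an open source stack matures, which is the adoption path the note describes.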