Metric                        | My “Day 1”  | + 4 Years           | Reference
Infrastructure                | ~70 Servers | Tens of Thousands   | bit.ly/15wBfMp
DAU (daily active users)      | ~100,000    | 72,000,000          | www.consulgamer.com/tag/zynga/
Servers/Game                  | 1 to 20     | 50 to > 1,000       | bit.ly/ahVaYI
Employees                     | ~40         | 2,916               | investor.zynga.com/faq.cfm
Biz Analytics                 | 0           | 24.5T rows @ 1.4 PB | bit.ly/L58opy
Splunk>                       | N/A         | 10TB+ / 50B events  | bit.ly/17f4kj2
Traditional Infrastructure
• Variety of Server Types in Retail DC
• Order/Rack/Stack
• Puppet Config Management

Public Cloud
• Amazon EC2 + RightScale
• Scaling Everything
• SRE/NOC

Private: “zCloud”
• CloudStack + RightScale = AutoScale
• CMDB
• 3 Server SKUs
• Centralized Services
Diagram: three Indexing Clusters, each drawn as 12 indexers (Idx); a Customers> row labeled FV, CV, ZP, MW, Ops, $, HC, EA, WWF, Cust SVC, SWF, DS, DZ, CWF, SWF, HWF; and a Search Heads> row of 12 search heads (S H).
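The speaker notes say the indexers were split into separate Indexing Clusters because a larger cluster can diminish performance when one indexing node runs abnormally slowly. A minimal Python sketch of why, under two simplifying assumptions: each search waits for every indexer it queries (so its latency is that of the slowest one), and each indexer is independently slow with probability p. The chance of hitting at least one slow node is then 1 - (1 - p)^n, which grows with cluster size n. The function names, the 1% slow probability, and the 100 ms / 2,000 ms latencies are illustrative assumptions, not measurements from the talk.

```python
import random

# Straggler effect in a scatter-gather search (illustrative assumptions:
# a search waits for every indexer it queries, and each indexer is
# independently "slow" with probability p_slow). Names are hypothetical.


def p_any_slow(num_indexers: int, p_slow: float) -> float:
    """Probability that at least one queried indexer is slow."""
    return 1.0 - (1.0 - p_slow) ** num_indexers


def search_latency(indexer_latencies_ms: list[float]) -> float:
    """The merged result is ready only when the slowest indexer answers."""
    return max(indexer_latencies_ms)


def simulate_mean_latency(num_indexers: int, p_slow: float,
                          normal_ms: float = 100.0, slow_ms: float = 2000.0,
                          trials: int = 10_000, seed: int = 42) -> float:
    """Monte Carlo estimate of mean search latency for one cluster size."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Each indexer independently answers fast or slow.
        latencies = [slow_ms if rng.random() < p_slow else normal_ms
                     for _ in range(num_indexers)]
        total += search_latency(latencies)
    return total / trials


if __name__ == "__main__":
    for n in (12, 36):  # one 12-node cluster vs. a single 36-node cluster
        print(n, round(p_any_slow(n, 0.01), 3),
              round(simulate_mean_latency(n, 0.01)))
```

With these assumed numbers, a search across 12 indexers waits on a slow node about 11% of the time, versus about 30% across a single 36-indexer cluster, which is consistent with the notes' reason for keeping clusters smaller.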
Cloud Services
Cloud Workshop
• Identify Workloads
• Design/Architect Compute, Storage, Network
• Tailor Service/Support Needs
• SOW (Statement of Work)
Installation @ Redapt
• Network
• Storage
• Compute
• Hypervisors
• Orchestration Validation
• Customer Remote Tour/Test
• Shipment to Customer Premises
• Onsite Training
Application Migration
• Rearchitect Legacy Applications into “aaS” Architectures
• Migrate Applications from Public to Private or Hybrid Clouds
Project Management
Integration Services
• Procurement
• Integrated Racks, ready to go
Contact Information
Redapt @ Splunk .conf 2013: Splunk in the Hyperscale Private Cloud

Editor's Notes

  • #6 Traditional
      + Custom Procurement
      + VLANs
      + Apps evolve, eliminating monolithic DBs
      + Tools evolve (Puppet)
      + Rapid growth from the top 2 tenants: FULL!
    Public Cloud
      + Infrastructure == Code
      + Estimated FarmVille @ 100k-200k
      + Unprecedented scale with 1M in Week 1…
  • #7 P1… 1000s of machines: Puppet is absolutely necessary. More cores don’t really accelerate things; the workload is I/O bound. Local storage >> EBS/SAN.
  • #8 This is not the Day 1 architecture; it evolved rapidly. The Indexing Clusters were driven by lessons learned: larger clusters can diminish performance when a single indexing node performs abnormally slowly. As infrastructure pivoted into zCloud, forwarders were instrumented in the VPC. Dedicated, isolated infrastructure drives compliance and governance around Payments within a PCI cluster.
  • #9 Zynga was very metrics oriented. Start quickly: don’t boil the ocean. Stay out of the way of the business, and endeavor to do more than that. P1… 1000s of machines: Puppet is absolutely necessary. It doesn’t matter which one you use, just use one. Correlations: CS ticket volume, error rates, release events. Nagios tells you something is wrong, but not where. Splunk instead of Nagios: “Artificial Intelligence”.
  • #11 How to relate this to your environments… These kinds of impacts are available at any scale. Average out 72M players who play 20 minutes/day and you have 1,000,000 concurrents. Releases generally cannot be reverted (buy a purple cow, you can’t take it back).
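The concurrency figure in note #11 is straightforward arithmetic, an application of Little's law (average users online = arrival rate × time spent): 72,000,000 players × 20 minutes ÷ 1,440 minutes per day = 1,000,000. A minimal Python sketch of that calculation, with a hypothetical function name, assuming play time is spread evenly across the day; real peak concurrency would sit above this average.

```python
# Average concurrency via Little's law: L = lambda * W.
# Assumption: play time is spread evenly over the day (no peak hours),
# so the result is an average, not a peak. Function name is hypothetical.

MINUTES_PER_DAY = 24 * 60  # 1,440


def average_concurrent_users(daily_active_users: int,
                             minutes_per_user_per_day: float) -> float:
    """Return the average number of users online at any instant."""
    # Total user-minutes played per day, divided by minutes in a day.
    total_user_minutes = daily_active_users * minutes_per_user_per_day
    return total_user_minutes / MINUTES_PER_DAY


if __name__ == "__main__":
    # 72M players at 20 minutes/day, per note #11.
    print(f"{average_concurrent_users(72_000_000, 20):,.0f}")  # 1,000,000
```

The same function applied to the “Day 1” figure of ~100,000 DAU gives roughly 1,389 concurrents, which shows how the same per-player habit scales with the player base.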