Tapjoy Delivers Billions of Requests Daily with OpenStack Hybrid Cloud


This presentation covers Tapjoy's use of OpenStack alongside AWS. Tapjoy is a global app-tech startup that powers monetization, analytics, user acquisition, and user retention for mobile developers. An early AWS adopter, the company was running over 1,100 AWS VMs daily by October 2014 and built its own OpenStack deployment, Tapjoy-1, for additional compute capacity and flexibility. Key points include the partnerships with Metacloud and Equinix to deploy and manage Tapjoy-1, challenges around hardware delays and negotiations, and plans to use both AWS and Tapjoy-1 flexibly based on application needs.


Tapjoy & OpenStack
Delivering Billions of
Requests Daily
Wes Jossey
Head of Operations @Tapjoy

Tapjoy
● Global App-Tech Startup
● For Mobile Developers, We Power:
○ Monetization
○ Analytics
○ User Acquisition
○ User Retention
● 450M+ Monthly Users Across 270k+ Apps
● Worldwide Presence

Technical Details
● Early AWS Adopter.
● Grew Predominantly on AWS.
● Over 1,100 AWS VMs Daily (10/2014)
● Active Regions in Asia, Europe, N.A.
● Over One Trillion Requests Handled
Annually
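
As a rough sanity check on the annual figure above, it can be converted into daily and per-second averages. This is only a sketch: it assumes exactly one trillion requests and a 365-day year, and it averages traffic uniformly, which the slides do not claim. Real peaks would sit above these averages.

```python
# Back-of-the-envelope conversion of "over one trillion requests annually".
# Assumptions (not from the slides): exactly 1e12 requests, a 365-day year,
# and perfectly uniform traffic.
REQUESTS_PER_YEAR = 1_000_000_000_000
DAYS_PER_YEAR = 365
SECONDS_PER_DAY = 24 * 60 * 60


def average_rates(requests_per_year: int) -> tuple[float, float]:
    """Return (requests per day, requests per second) averaged over a year."""
    per_day = requests_per_year / DAYS_PER_YEAR
    per_second = per_day / SECONDS_PER_DAY
    return per_day, per_second


per_day, per_second = average_rates(REQUESTS_PER_YEAR)
print(f"{per_day / 1e9:.2f} billion requests/day")
print(f"{per_second:,.0f} requests/second on average")
```

Under these assumptions the average works out to roughly 2.74 billion requests per day, or about 31,700 per second, which is consistent with the "billions of requests daily" in the title.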

Tech Philosophy
● Compute (EC2 & Nova) Driven Company
○ Operate Your Own Infrastructure
■ But Not Necessarily Built-From-Scratch
○ Zero Heart-Attack Nodes
■ All Nodes Are Ephemeral
■ Data is Always Distributed
■ Failure is Always Tolerated
■ Misbehaving Instances Are Terminated Quickly
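
The "terminate misbehaving instances quickly" rule could be sketched as a small health-check policy. Everything here is hypothetical: the `Instance` record, the failure threshold of three, and the function names are illustrative assumptions, not Tapjoy's tooling. The function only selects instances; actually terminating them would go through the EC2 or Nova API.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    """Hypothetical record of one ephemeral node and its recent health."""
    instance_id: str
    consecutive_failures: int = 0


def record_health_check(instance: Instance, healthy: bool) -> None:
    """Reset the failure streak on success, extend it on failure."""
    if healthy:
        instance.consecutive_failures = 0
    else:
        instance.consecutive_failures += 1


def instances_to_terminate(fleet: list[Instance], max_failures: int = 3) -> list[str]:
    """Return IDs of instances whose failure streak reached the threshold.

    Because nodes are ephemeral and data is distributed, the policy does
    not try to repair a node; it simply marks it for replacement.
    """
    return [i.instance_id for i in fleet if i.consecutive_failures >= max_failures]


fleet = [Instance("node-a"), Instance("node-b")]
for _ in range(3):
    record_health_check(fleet[0], healthy=False)
record_health_check(fleet[1], healthy=True)
print(instances_to_terminate(fleet))
```

The design choice worth noting is that the policy never attempts recovery in place, which only makes sense when no single node holds irreplaceable state.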

Services We Use
● SQS
○ Simple, Inexpensive, Durable.
○ Currently Building New Internal System Influenced
by SQS, but with Different Guarantees
○ No Lock-In (See https://github.com/Tapjoy/chore)
● RDS
○ No Lock-In. Simple. Easy.
● CloudWatch (but also statsd)

Services We Use Cont.
● ELB
○ SSL Termination Only. Routing Handled Elsewhere.
● Auto-Scaling
○ Traffic can fluctuate 30% peak to valley
● S3
○ Where we store ALL the things
○ Still price competitive for what it provides. No plans
to leave as of today.
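
To illustrate why auto-scaling matters when traffic swings between peak and valley, here is a minimal sizing sketch. The per-instance capacity, the 20% headroom, the minimum fleet size, and the example request rates are all assumptions for illustration. The slides give only the 30% figure, which is read here as the valley sitting 30% below the peak.

```python
import math


def desired_instances(
    requests_per_second: float,
    capacity_per_instance: float,
    headroom: float = 0.2,
    minimum: int = 2,
) -> int:
    """Size a fleet for the current load plus a safety margin.

    All parameters are hypothetical tuning knobs, not values from the talk.
    """
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    needed = requests_per_second * (1 + headroom) / capacity_per_instance
    return max(minimum, math.ceil(needed))


# Illustrative numbers only: a 30,000 req/s peak and a valley 30% lower.
peak_rps = 30_000
valley_rps = peak_rps * 0.7
for label, rps in (("peak", peak_rps), ("valley", valley_rps)):
    print(label, desired_instances(rps, capacity_per_instance=500))
```

A fleet sized statically for the peak would sit partly idle in the valley, and that gap is what scaling in recovers.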

Use Compute Everywhere
● Every Dev Has Access to Either AWS or
Tapjoy-1 (Tapjoy’s OpenStack Deployment)
● Simulate Changes Against Useful Data
● Test Algorithms on Large Hadoop Clusters
● Practice for Failure With Access to Real
Services (not mock endpoints)

Going Hybrid
● We Spend in the Millions on AWS
● Picked Data-Science Infrastructure because
of Portability, and Ability to Leverage More
Nodes
● Lower Risk than Tier-1 Production Services
● Wanted a Partner to Maintain OpenStack
like Amazon ‘Maintains’ AWS
● We Want to Operate Apps

Vendors (It Matters)
● Metacloud
○ Verified our Design
○ Deployed OpenStack
○ Provisioned Network
○ Allowed Us to Focus on Business Applications
● Equinix
○ Cooling & Power Design
○ Remote Hands
○ Went Above and Beyond on Numerous Occasions

Vendors: Full List
● Metacloud
● Equinix
● Quanta
● Cumulus
● Level3
● Newegg

Challenges
● Hardware Delays Killed Our Timelines
○ Blew through our contingency windows.
○ Hurt our budgets.
○ Delayed subsequent purchases
● Setting Up IP Transit Can Be Slow
● No Physical Presence in DC
○ Also a Pro
● No Internal Previous Success Story… So
Lots of Skepticism

The Not So Glamorous Job
● Negotiations Can Be Exhausting
● If You’re an Engineer, the Turnaround Time
Can Be Frustrating
● You Probably Need a Gantt Chart
● There’s Nothing Agile About Writing a Big
Check

Tapjoy-1: Data Nodes
348 ‘Data’ All Purpose Nodes
● Quanta S910-X31E: 12 Node Configuration
● Per Node
○ Intel 1265Lv3 @ 2.5GHz
○ 4x1TB 7200RPM
○ 32GB RAM
○ Dual 1Gig NIC
● ‘Recyclable’ for Other Tasks if we Evolve

Tapjoy-1: Management Nodes
12 ‘Management’ Nodes
● Quanta S180: 4 Node Configuration
● Per Node
○ Intel 2650v2 x2 @2.60GHz
○ 128GB RAM
○ 6x480GB SSD
○ Dual 10Gig NIC
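
The per-node specifications for the two Tapjoy-1 node types can be rolled up into cluster totals. This sketch uses only the figures from the slides, with two labeled assumptions: "12 Node Configuration" and "4 Node Configuration" are read as nodes per chassis, and 1 TB is counted as 1,000 GB. The `NodeTier` class is illustrative.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeTier:
    """One class of Tapjoy-1 hardware, as listed on the slides."""
    name: str
    node_count: int
    nodes_per_chassis: int  # assumption: "N Node Configuration" means N per chassis
    ram_gb: int
    disks_per_node: int
    disk_gb: int  # assumption: 1 TB counted as 1,000 GB

    @property
    def chassis_count(self) -> int:
        return math.ceil(self.node_count / self.nodes_per_chassis)

    @property
    def total_ram_gb(self) -> int:
        return self.node_count * self.ram_gb

    @property
    def total_disk_gb(self) -> int:
        return self.node_count * self.disks_per_node * self.disk_gb


DATA = NodeTier("data", 348, 12, ram_gb=32, disks_per_node=4, disk_gb=1000)
MGMT = NodeTier("management", 12, 4, ram_gb=128, disks_per_node=6, disk_gb=480)

for tier in (DATA, MGMT):
    print(
        f"{tier.name}: {tier.chassis_count} chassis, "
        f"{tier.total_ram_gb:,} GB RAM, {tier.total_disk_gb:,} GB raw disk"
    )
```

On these assumptions the data nodes total 29 chassis, 11,136 GB of RAM, and 1,392,000 GB of raw disk, while the management nodes total 3 chassis, 1,536 GB of RAM, and 34,560 GB of SSD. These are raw figures before any replication or filesystem overhead.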

Plan For Failure
● Hardware
○ I’m Not Saying You Shouldn’t Use Ceph…
■ But You’ll Notice it’s Absent Here
● Service Boundaries
○ Have Hardware & Software Contingencies
■ Backup Links
■ Temporary Cache(s)
○ Actually Test Failure in Production
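
The service-boundary contingencies above (backup links, temporary caches) might look roughly like the following in application code. This is a sketch under stated assumptions, not Tapjoy's implementation: `TemporaryCache`, `fetch_with_contingencies`, the use of `ConnectionError` as the failure signal, and the 60-second TTL are all hypothetical.

```python
import time
from typing import Callable, Optional


class TemporaryCache:
    """Hypothetical short-lived cache holding the last good value per key."""

    def __init__(self, ttl_seconds: float) -> None:
        self._ttl = ttl_seconds
        self._store: dict[str, tuple[float, str]] = {}

    def put(self, key: str, value: str) -> None:
        self._store[key] = (time.monotonic(), value)

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if time.monotonic() - stored_at > self._ttl:
            del self._store[key]  # expired: treat as a miss
            return None
        return value


def fetch_with_contingencies(
    key: str,
    primary: Callable[[str], str],
    backup: Callable[[str], str],
    cache: TemporaryCache,
) -> str:
    """Try the primary link, then the backup link, then the temporary cache."""
    for source in (primary, backup):
        try:
            value = source(key)
        except ConnectionError:
            continue  # this link is down; fall through to the next option
        cache.put(key, value)
        return value
    cached = cache.get(key)
    if cached is None:
        raise ConnectionError(f"all sources failed and no cached value for {key!r}")
    return cached


def link_down(key: str) -> str:
    raise ConnectionError("simulated link failure")


cache = TemporaryCache(ttl_seconds=60)
print(fetch_with_contingencies("config", link_down, lambda k: "from-backup", cache))
print(fetch_with_contingencies("config", link_down, link_down, cache))
```

The `link_down` stub stands in for the kind of injected failure the last bullet argues for. The same fallback path only earns trust once it has been exercised against real services.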

Info
● Twitter! @dustywes
● Email: wes@tapjoy.com

Slides Without Transcribed Text
● 9. OpenStack Timeline
● 16. Glamor Shot
● 17. Same Price, Different Outcome
● 18. Diagrams!
● 19. High-Level Request Flow Architecture
● 20. Detailed Flow
● 21. Data Pipeline Tapjoy-1
