Rebuilding Web Tracking
Infrastructure for Scale
Stephen Oakley
Principal Engineer
Marketo
What is Marketo?
Page 3
Marketo Proprietary and Confidential | © Marketo, Inc.
10/31/2016
What is Web Tracking at Marketo?
• Ingest web page visits and clicks on customer’s website
• Trigger campaigns in response to web activity
• Trigger real-time personalization of web experience
• Provide lead level analytics for known leads
• Provide aggregate analytics for all lead activity
• Typically known leads < 10 % of all traffic
Legacy Web Tracking Infrastructure
Legacy Problems
• Throughput limitations – 2 million activities per day
• Processing delays can be on the order of hours
• Large customers cause web server brownouts
• Web reporting does not scale
• Fixed-sized clusters prohibit horizontal scaling
• Brittle infrastructure prevents feature development
The Vision
Orion Initiative
• Increase scale to support IoT for Marketers
• Support billions of marketing activities each day
• Trigger on activities in near real time (< 2 minutes @ 99th percentile)
• Reduce operational costs
• Improve multitenancy and QoS
Requirements
Business Requirements
• 200 MM activities per customer per day
• Near real-time web activity processing (SLA of < 1 minute lag)
• Improve cost efficiency
• Improve flexibility for feature enhancements
Technical Requirements
• Multitenancy support with brownout protections
• Infrastructure must scale horizontally
• Decouple web processing from downstream processing
• Anonymous leads should cost next to nothing to track
Architecture & Design
Why HBase + Phoenix?
• Horizontally scalable
• Leverages the Hadoop cluster for storage and scaling
• Provides secondary indices for query patterns through Phoenix
• Natural integration with JDBC and Spark JDBC RDDs
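The speaker notes describe the lead store as a salted HBase table keyed by subscription-cookie-leadId. A minimal sketch of composing such a composite row key; the bucket count, NUL separator, and salting scheme here are illustrative assumptions, not the actual Marketo encoding:

```scala
// Sketch of a composite, salted HBase row key (subscription-cookie-leadId).
// The salt prefix spreads writes across region servers; all encoding details
// here are hypothetical.
object RowKey {
  val SaltBuckets = 16

  // Derive a stable salt bucket from the logical key so readers can recompute it.
  private def salt(logicalKey: String): Byte =
    ((logicalKey.hashCode & Int.MaxValue) % SaltBuckets).toByte

  // subscription-cookie-leadId, NUL-separated, prefixed with the salt byte.
  def forCookie(subscriptionId: String, cookie: String, leadId: Long): Array[Byte] = {
    val logical = s"$subscriptionId\u0000$cookie\u0000$leadId"
    Array(salt(logical)) ++ logical.getBytes("UTF-8")
  }
}
```

Salting avoids hot-spotting on monotonically increasing keys; Phoenix can manage this transparently with `SALT_BUCKETS` on the table, and a secondary index (e.g. subscription-lead-createdAt) covers the other query pattern.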
Marketo Lambda Architecture
[Diagram: activity sources (Web Activity, RTP Activity, Mobile Activity, CRM Sync, Partner APIs, Other Marketing Activities) flow through the Ingestion Processor (Scala/Tomcat) into HBase and Kafka; Spark Streaming consumers (Campaign Triggers, Solr Indexing via a Spark Streaming Indexer into Solr, Email Report Loader, Web Activity Processor) read from Kafka; results serve the Marketo UI (Campaign Detail, Lead Detail) and other clients (CRM Sync, Revenue Cycle Analytics, APIs).]
Why Spark Streaming?
• Micro-batching provides sink-side efficiencies
• This is especially important with MySQL touchpoints
• Great integration with Kafka
• No strict real-time processing requirements
• Great community and industry adoption
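The sink-side efficiency of micro-batching can be sketched in plain Scala: instead of one MySQL round trip per event, each micro-batch becomes a single bulk write. The `Activity`/`BulkSink` names are hypothetical stand-ins for a real JDBC multi-row insert:

```scala
// Illustrative only: micro-batching amortizes sink round trips.
final case class Activity(customerId: String, payload: String)

// Hypothetical sink; a real implementation would issue one multi-row
// INSERT over JDBC per batch instead of one statement per event.
trait BulkSink {
  var roundTrips = 0
  def writeBatch(batch: Seq[Activity]): Unit = { roundTrips += 1 }
}

// Group the event stream into micro-batches and write each batch once.
def processMicroBatches(events: Seq[Activity], batchSize: Int, sink: BulkSink): Unit =
  events.grouped(batchSize).foreach(sink.writeBatch)
```

With per-event writes, 10 events cost 10 round trips; batched by 4 they cost 3, and the savings grow with batch size, which is why micro-batching pays off against MySQL touchpoints.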
Multitenancy
• One topic per customer (sized by volume)
• Traffic storms are isolated to a single customer
• Fairness/throttling is easy to control
• Spark Streaming job consumes from many topics
• Allows us to turn a customer off under error conditions
• See “Elastic Streaming” by Neelesh Shastry – Spark Summit
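The topic-per-customer scheme above can be sketched as follows; the topic naming, partition thresholds, and `Customer` type are illustrative assumptions:

```scala
// Sketch of topic-per-customer multitenancy. Each customer gets a dedicated
// topic sized by volume, and the streaming job subscribes to the union of
// enabled topics, so one customer can be paused without affecting the rest.
final case class Customer(id: String, dailyVolume: Long)

// Hypothetical naming convention for per-customer topics.
def topicFor(c: Customer): String = s"web-activity-${c.id}"

// Bigger customers get more partitions; thresholds here are illustrative.
def partitionsFor(c: Customer): Int =
  if (c.dailyVolume > 100000000L) 32
  else if (c.dailyVolume > 1000000L) 8
  else 2

// Under error conditions, dropping a customer from the subscription set
// "turns them off" without touching anyone else's traffic.
def subscriptions(customers: Seq[Customer], disabled: Set[String]): Seq[String] =
  customers.filterNot(c => disabled(c.id)).map(topicFor)
</imports-placeholder>
```

Because topics are the unit of isolation, a traffic storm fills only that customer's topic, and fairness is a matter of how many records the job polls from each topic per batch.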
Making Spark Streaming Performant
• Coalesce small partitions for the same customer
• Aggressive caching of metadata (mostly from MySQL)
• Heavily leverage Scala future composition for parallelism
• Persist RDDs that are used for multiple outputs
• e.g. write to Kafka and Activity Service
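The "Scala future composition" point can be sketched with stdlib Futures: enrichment needs several independent outbound RPCs per event, so starting them all before composing pipelines the calls instead of serializing them. The enrichment names below are hypothetical stand-ins for real async clients:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical enrichment calls standing in for real async RPC clients
// (lead lookup, reverse-IP, page metadata).
def lookupLead(cookie: String): Future[String] = Future(s"lead-for-$cookie")
def reverseIp(ip: String): Future[String]      = Future(s"org-for-$ip")
def pageMetadata(url: String): Future[String]  = Future(s"meta-for-$url")

// The three lookups are independent, so starting them before the for
// comprehension lets them run concurrently; a for comprehension over
// already-started futures merely joins the results.
def enrich(cookie: String, ip: String, url: String): Future[(String, String, String)] = {
  val leadF = lookupLead(cookie)
  val orgF  = reverseIp(ip)
  val metaF = pageMetadata(url)
  for { lead <- leadF; org <- orgF; meta <- metaF } yield (lead, org, meta)
}
```

Had the calls been written inline inside the for comprehension, each RPC would only start after the previous one completed, roughly tripling per-event latency.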
Making Anonymous Traffic Cheap
• High costs of web traffic in legacy system
• MySQL storage for all traffic
• Downstream processing of all events (even anonymous)
• V2 only processes and stores known traffic in MySQL
• Defer triggering for anonymous data until promotion
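The known/anonymous split can be sketched as a simple router: known events flow downstream immediately, anonymous events are held cheaply and replayed only on promotion. The types and buffer below are illustrative assumptions, not the actual V2 components:

```scala
// Sketch of the V2 known/anonymous split (hypothetical names): only known
// traffic reaches downstream (MySQL) processing; anonymous activity is held
// cheaply and replayed when the lead is promoted to known.
final case class WebEvent(cookie: String, leadId: Option[Long], payload: String)

final class AnonymousBuffer {
  private var held = Map.empty[String, Vector[WebEvent]]

  // Known events go downstream immediately; anonymous events are held.
  def route(e: WebEvent, downstream: WebEvent => Unit): Unit =
    e.leadId match {
      case Some(_) => downstream(e)
      case None =>
        held = held.updated(e.cookie, held.getOrElse(e.cookie, Vector.empty) :+ e)
    }

  // On promotion (form fill-out, API call, ...), replay the cookie's history
  // downstream with the newly assigned lead id, then drop the buffer.
  def promote(cookie: String, leadId: Long, downstream: WebEvent => Unit): Unit = {
    held.getOrElse(cookie, Vector.empty)
      .foreach(e => downstream(e.copy(leadId = Some(leadId))))
    held -= cookie
  }
}
```

Since anonymous traffic is typically 10x or more of known traffic, keeping it out of the expensive downstream path is what makes anonymous leads cost "next to nothing" to track.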
Impact and Results
• Rolled out to our highest volume customers
• Processing latencies < 30s (at 99.9th percentile)
• Allowed key customers to scale from ~2 MM/day to > 20 MM/day
Future Work
• Mitigations of straggler effects on processing delays
• Adding sessionization for web reporting
• Scaling Kafka topics as customers increase volume
• Globally distributed ingestion for a single customer
We’re Hiring!
Http://Marketo.Jobs
Q & A


Editor's Notes

  • #3 The next phase was when we were ready to validate our newly built event ingestion system. Marketo is a powerful Engagement Marketing Platform. Several applications make up the platform, such as ABM, marketing analytics, predictive content, Digital Ads, and Marketing Automation; Marketing Automation is what we are focusing on today. It enables the marketer to create, automate, and measure marketing campaigns across channels. A simple example of an automated campaign or workflow: a user visits your website and fills out a form; web tracking sees that they spent most of their time looking at pages about Spark Streaming; we automatically send the user an email inviting them to a webinar on Spark Streaming; if they attend the webinar, we register their interest in the CRM and request that a salesperson contact them. Campaigns can be complex and can reach out and track customers across channels like web, email, mobile, and social.
  • #4 Explain what a known vs. anonymous lead is: known leads are targetable on other channels; anonymous leads have only web activity. Speak to how the traffic patterns are heavily skewed toward anonymous, given our customer base. Talk about how anonymous converts to known. Aggregate analytics include the company web report, landing page reports, etc.
  • #5 Speak to the pod. Mention how there are many, many pods.
  • #6 An additional complication is that the same two web servers also serve the MLM app, the SOAP APIs, and the landing pages.
  • #8 Although the talk isn't about the project, we have a few slides up front to set the context around what we are working on. If you have been near technology at all in the last couple of years, you know that the world has become very connected. The number of connected devices blows my mind. It's not just phones anymore: Amazon Dash buttons, coffee makers, propane tanks, garage doors. These devices are sending tens of billions of activities and user interactions every day. Orion is our platform. Our marketing platform ingests the user interactions and processes them into relevant marketing touchpoints. It enables marketers to create marketing campaigns around these activities to build relationships with their customers, and to become the fabric for marketers. It's been a great experience building this.
  • #9 Here are a few of the requirements: near real-time processing; at least 1 billion activities per customer per day. Customer demands from the increasing number of devices caused us to evaluate next-gen queueing and streaming. Reduce infrastructure COGS, primarily from expensive enterprise-class filers. Reduce people COGS through efficiency gained by shrinking a tech stack that used too many similar technologies. Multitenant, of course. Secure. Customer isolation and improved resource management.
  • #12 Architecture requirements are driven from the business requirements: improve utilization over the existing system; support lots of customers in the same infrastructure without starving any of them; encryption from day 1 for safe data storage; aim for horizontal scalability, coming from a standard 3-tier app; radically reduce processing latency; eliminate backlogs; brownout protection.
  • #13 A few words about the architecture. The main goal is to ingest, process, and store marketing events.
  • #14 Detailed overview of the Munchkin FE component. Spray.io for the MFE. The frontend has the simple job of verifying subscription status, collecting metrics, and persisting to Kafka. We use Avro to allow for schema evolution, strong typing, and a compact representation in the topic. We use the Schema Registry so the schema can be upgraded by the producer and then automatically picked up by the Spark Streaming component. We use the asynchronous Kafka API to allow high throughput.
  • #15 Detailed overview of the LeadService component. Spray.io for LeadService. HBase for cookie and anonymous lead storage: a salted table with key structure subscription-cookie-leadId and a secondary index for subscription-lead-createdAt. MySQL for known lead storage. Master data for reverse-IP information enrichments.
  • #17 Overall view of the system. Describe how there is a Kafka topic per subscription. Spark Streaming transforms the raw events into activities by enriching them with web page metadata from MySQL and with lead and reverse-IP enrichment from LeadService. Persist activities to the Activity Service for storage and secondary processing (e.g. triggering and Solr indexing). Push enriched web events to Kafka for the downstream Druid OLAP infrastructure.
  • #18 High-level diagram of our event processor, an enhanced Lambda Architecture. Inbound activities are written by the Ingestion Processor to HBase and then Kafka. High-volume (e.g. web) activities are first written to Kafka, then enriched. Spark Streaming applications consume events from Kafka for Solr indexing, email reports, and campaign processing. HBase is used for simple historical queries and is the system of record.
  • #19 While it is not “true” streaming, we need exactly this as an optimization.
  • #21 Our multitenant Kafka framework coalesces small Kafka partitions into large Spark RDD partitions to improve batch utilization. Several components of the event enrichment require outbound RPC calls; using async clients, performing the calls in parallel, and then composing the futures pipelines the computation and significantly improves throughput. Cache web assets and cookies for temporal locality; the cache is > 60% of the executor memory. Enriched events are written out to multiple sinks, and being selective about persisting RDDs prevents recomputing expensive transformations (multiple RPC calls or MySQL queries).
  • #22 Traditionally, both anonymous and known data were treated equally in MLM. This is problematic because anonymous volumes are usually 10-20x higher than known. Additionally, there is very little intrinsic value in performing downstream processing on anonymous data, since you cannot target anonymous leads with campaigns. To improve this, in Munchkin V2 we only allow known traffic to flow to downstream processing. Anonymous data is passed for downstream processing when the lead converts to a known lead via form fill-out, API calls, etc.
  • #26 Reiterates my points from the last slide; included in case you want to look at the slides later.
  • #27 Give a quick overview of the activities architecture. Introduce Kafka in the presentation
  • #28 Spend more time on this: purple is our code, teal is standard Spark. SubscriptionRegistry uses ZooKeeper. OffsetManager is a library that uses the low-level Kafka consumer API. Provisioning framework: Sirius; a new subscription is provisioned to the registry via Oozie.