Managing Performance Globally with MySQL

This is my presentation at MySQL Connect for 2013. I describe a large-scale Big Data system and how it was built.

  • The businesses that depend on us, depend on us to be fast!
  • It would not work to have a performance management system that is slow!
  • We need to scope this engagement to a manageable size. We already have dozens or more people monitoring and managing our systems internally, but few looking at what the users actually experience. This comes from Cary Millsap, formerly Vice President of Performance at Oracle.
  • UC 10 is a very important use case for us – we have many relationships with merchants, and performance is an important part of why they choose us.
  • Every Business Intelligence system has these three parts
  • This is one of the many ways we test our products to provide a better user experience
  • I was asked not to provide detailed figures on our data systems, so forgive me if everything is order-of-magnitude here.
  • Heresy! Normalized Dates and Times!
  • Architecture should also be unobtrusive, an enabler. Architecture makes the hard things buildable. This is a good metaphor for PayPal altogether: our role is to unobtrusively enable users to exchange money while getting out of the way. We are support, not front and center. Our job is to make merchants look good and work well.

Managing Performance Globally with MySQL: Presentation Transcript

  • Managing Performance Globally with MySQL Daniel Austin, PayPal, Inc. MySQL Connect 2013 Sept 22nd, 2013
  • Why Are We Here? We needed a comprehensive system for performance management at PayPal Vision->Goals->Plan->Execution->Delighted User “Anytime Anywhere” implies a significant commitment to the user experience, especially performance and service reliability. So we designed a fast real-time analytics system for performance data using MySQL 5.1. And then we built it.
  • Overture: Architecture Principles 1. Design and build for scale 2. Only build to differentiate 3. Everything we use or create must have a managed lifecycle 4. Design with systemic qualities in mind 5. Adopt industry standards
  • What Do You Mean "Web Performance"? • Performance is response time – In this case, we are scoping the discussion to include only end-user response time for PayPal activities • Only outside the PayPal system boundary – Inside, it's monitoring, complementary but different – We are concerned with real people, not machines • For our purposes, we treat PayPal's systems as a black box
  • The Vision: 3 Big Ideas • Bake It In Up Front: Performance engineering is a design-time activity. • End2End Performance for End Users: We are focused on the experiences of end users of PayPal, anywhere, anyway, anytime. • One Consistent View: Establish one shared, consistent performance toolkit and testing methodology.
  • Who Needs Performance Data?
  • Architecture: Features • Model Driven Architecture – no code! • Data Driven – Real data products – Fast, efficient data model for HTTP • Up-to-date global dataset provides low MTTR • Flexible fast reporting for performance analytics
  • The Big Picture Data Collection Data Storage Data Reporting
  • The Big Picture
  • Part I – Data Collection
  • The Data Collection Footprint
  • End to End Testing
  • The API Testing Challenge
  • Data Collection Summary • Multiple sources for synthetic and RUM performance testing data • Large-scale dataset with very long (10 yrs+) retention time – Need to build for the ages • Requires some effort to design a flexible methodology when devices and networks are changing quickly
  • Part II – Data Storage
  • Advanced ETL With Talend • MODEL-DRIVEN = FAST DEVELOPMENT • LETS US DEVELOP COMPONENTS FAST • METADATA DRIVEN • MODEL IN, JAVA OUT
  • GLeaM Data Products • Level 0: Raw data at measurement-level resolution; field-level syntactic & semantic validation • Level 1: 3NF 5D data model; concrete aggregates while retaining record-level resolution • Level 2: User-defined and derived measures; time- and space-based aggregates; longitudinal and bulk reporting. A data product is a well-defined data set that has data types, a data dictionary, and validation criteria. It should be possible to rebuild the system from a functional viewpoint based entirely on the data product catalog.
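
As a rough illustration of a Level 2 time-based aggregate, the sketch below rolls record-level response times up into hourly medians. The record shape (an ISO-style timestamp string plus a response time in milliseconds) and the function name `hourly_median` are assumptions made for this example; the slides do not describe how GLeaM computes its aggregates.

```python
from collections import defaultdict
from statistics import median


def hourly_median(records):
    """Group (timestamp, response_ms) pairs by hour and return the median per hour.

    Timestamps are assumed to be strings such as "2013-09-22T10:05:00".
    """
    buckets = defaultdict(list)
    for timestamp, response_ms in records:
        hour = timestamp[:13]  # keep "YYYY-MM-DDTHH" as the bucket key
        buckets[hour].append(response_ms)
    return {hour: median(values) for hour, values in sorted(buckets.items())}
```

The record-level rows stay untouched, so the aggregate can be rebuilt at any time from the lower-level data product.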
  • Semantic v. Syntactic Validation 1. Syntactic Validation Step 2. Semantic Validation Step
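
The two validation steps can be sketched as follows. This is a minimal example under stated assumptions: the field names (`url`, `status`, `response_ms`, `measured_at`) and the specific rules are hypothetical, since the slides do not list the real schema or validation criteria. The syntactic step asks whether each field is present and well formed; the semantic step asks whether a well-formed value makes sense.

```python
from datetime import datetime

# Hypothetical field names and types; the slides do not give the real schema.
REQUIRED_FIELDS = {"url": str, "status": int, "response_ms": int, "measured_at": str}


def syntactic_errors(record):
    """Step 1: check that each field is present, has the right type, and parses."""
    errors = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in record:
            errors.append(f"missing field: {name}")
        elif not isinstance(record[name], expected_type):
            errors.append(f"wrong type for {name}")
    if isinstance(record.get("measured_at"), str):
        try:
            datetime.strptime(record["measured_at"], "%Y-%m-%dT%H:%M:%S")
        except ValueError:
            errors.append("measured_at is not a valid timestamp")
    return errors


def semantic_errors(record):
    """Step 2: check that well-formed values are also plausible."""
    errors = []
    if not 100 <= record["status"] <= 599:
        errors.append("status is outside the HTTP status code range")
    if record["response_ms"] < 0:
        errors.append("response_ms is negative")
    return errors


def validate(record):
    """Run the syntactic step first; run the semantic step only if it passes."""
    errors = syntactic_errors(record)
    if errors:
        return errors
    return semantic_errors(record)
```

Running the steps in this order means the semantic rules never have to guard against missing or mistyped fields.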
  • GLeaM Data Storage • Modeling HTTP in SQL • MySQL 5.1, Master & multi-slave config • 3rd Normal Form, Codd compliance • Fast, efficient analytical data model for HTTP Sessions
  • 3NF Level 1 Data Model for HTTP • NO xrefs • 5D User Narrative Model • High levels of normalization are costly up front… • …but pay for themselves later when you are making queries!
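
To make the normalization trade-off concrete, here is a small sketch of a normalized layout for HTTP measurements. It is not the GLeaM schema: the table and column names are invented for illustration, and SQLite (via Python's standard `sqlite3` module) stands in for MySQL so the example is self-contained. Repeated strings such as host names are stored once and referenced by integer id, which costs extra work at load time but keeps the measurement rows narrow for later queries.

```python
import sqlite3

# Hypothetical tables; the real GLeaM model is not shown in the slides.
SCHEMA = """
CREATE TABLE host (host_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE path (path_id INTEGER PRIMARY KEY, path TEXT NOT NULL UNIQUE);
CREATE TABLE measurement (
    measurement_id INTEGER PRIMARY KEY,
    host_id INTEGER NOT NULL REFERENCES host(host_id),
    path_id INTEGER NOT NULL REFERENCES path(path_id),
    status INTEGER NOT NULL,
    response_ms INTEGER NOT NULL
);
"""


def get_or_create(conn, table, column, value):
    """Return the integer id for `value`, inserting it once if it is new.

    `table` and `column` are fixed identifiers from this module, never user input.
    """
    conn.execute(f"INSERT OR IGNORE INTO {table} ({column}) VALUES (?)", (value,))
    row = conn.execute(
        f"SELECT {table}_id FROM {table} WHERE {column} = ?", (value,)
    ).fetchone()
    return row[0]


def record_measurement(conn, host, path, status, response_ms):
    """Store one measurement, paying the normalization cost at load time."""
    host_id = get_or_create(conn, "host", "name", host)
    path_id = get_or_create(conn, "path", "path", path)
    conn.execute(
        "INSERT INTO measurement (host_id, path_id, status, response_ms) "
        "VALUES (?, ?, ?, ?)",
        (host_id, path_id, status, response_ms),
    )


def mean_response_by_host(conn):
    """Aggregate over narrow integer-keyed rows, joining names in at the end."""
    return conn.execute(
        "SELECT h.name, AVG(m.response_ms) FROM measurement m "
        "JOIN host h ON h.host_id = m.host_id "
        "GROUP BY h.name ORDER BY h.name"
    ).fetchall()
```

A typical session would create the tables with `conn.executescript(SCHEMA)` on an in-memory connection, call `record_measurement` for each row, and then query with `mean_response_by_host(conn)`.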
  • GLeaM Data Model
  • Level 1: The Boss Battle!
  • Managing URLs • VARCHAR(4096)? • Split at path segment • We used a simple SHA-1 key to index secondary URL tables • We need a defined URI data type in MySQL!
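
The URL handling described on this slide might look something like the sketch below, which splits a URL at its path segments and derives a fixed-length SHA-1 key that could index a secondary URL table. The function names and the shape of the returned dictionary are assumptions for illustration; the slides do not show the actual table layout or which part of the URL was hashed.

```python
import hashlib
from urllib.parse import urlsplit


def url_key(url):
    """Return the SHA-1 hex digest of the URL: a fixed 40-character key."""
    return hashlib.sha1(url.encode("utf-8")).hexdigest()


def split_url(url):
    """Split a URL into scheme, host, path segments, and query string."""
    parts = urlsplit(url)
    segments = [segment for segment in parts.path.split("/") if segment]
    return {
        "key": url_key(url),
        "scheme": parts.scheme,
        "host": parts.hostname,
        "segments": segments,
        "query": parts.query,
    }
```

A fixed-length key avoids indexing a very wide text column directly, while the stored segments still allow searches by individual path component.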
  • Some Best Practices – URIs: Handle with care • Encode text strings in lexical order • Use sequential bitfields for searching – Integer arithmetic only – Combined fields for per-row consistency checks in every table – Don't skip the supporting jobs – sharding, rollover, logging – Don't trade ETL time for integrity risk!
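
One way to read "sequential bitfields" with "integer arithmetic only" is sketched below: each yes/no attribute of a measurement takes the next bit of a single integer, and a search becomes a mask comparison. The flag names are hypothetical, and the slides do not say which attributes were actually encoded this way.

```python
# Hypothetical flags; each attribute takes the next bit in sequence.
FLAG_HTTPS = 1 << 0
FLAG_MOBILE = 1 << 1
FLAG_CACHED = 1 << 2
FLAG_ERROR = 1 << 3


def pack_flags(https, mobile, cached, error):
    """Combine four booleans into one integer bitfield."""
    flags = 0
    if https:
        flags |= FLAG_HTTPS
    if mobile:
        flags |= FLAG_MOBILE
    if cached:
        flags |= FLAG_CACHED
    if error:
        flags |= FLAG_ERROR
    return flags


def matches(flags, required):
    """Return True when every bit in `required` is also set in `flags`."""
    return (flags & required) == required
```

For example, `matches(pack_flags(True, False, True, False), FLAG_HTTPS | FLAG_CACHED)` is true. The same test can be expressed in SQL as a bitwise AND compared against the mask, so a multi-attribute filter needs no string comparisons.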
  • Part III – Data Reporting
  • GLeaM Data Reporting • GLeaM is intended to be agnostic and flexible w.r.t reporting tools • We chose Tableau for dynamic analytics • We also use several enterprise-level reporting tools to produce aggregate reports
  • Tableau Features INTERACTIVE & FLEXIBLE EXCEL-LIKE SIMPLICITY WEB AND DESKTOP CLIENTS FAST PERFORMANCE
  • GLeaM Reports We designed initial reports for 3 sets of stakeholders: • Executives: High-level overviews for busy decision-makers • Operations: Diagnostic reports for operations teams to identify • Analytics: Deep-dive analytical reports to identify opportunities for improvements
  • Global Performance Management
  • What We Learned • Paying attention to design patterns pays off • MySQL rewards detailed optimization • Trade-offs around normalization can lead to 10x or even 100x query time reduction • Sharding remains an issue • We believe we can easily achieve petabyte scales with additional slaves
  • CODA: THE LAST ARCHITECTURE PRINCIPLE SHIBUI SIMPLE ELEGANT BALANCED …A PLAYER IS SAID TO BE SHIBUI WHEN HE OR SHE MAKES NO SPECTACULAR PLAYS ON THE FIELD, BUT CONTRIBUTES TO THE TEAM IN AN UNOBTRUSIVE WAY.
  • Thank You! @daniel_b_austin Daniel Austin PayPal, Inc. MySQL Connect 2013 Sept 22nd, 2013