SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013
 


SmugMug.com is a popular hosting and commerce platform for photo enthusiasts with hundreds of thousands of subscribers and millions of viewers. Learn how SmugMug uses Amazon DynamoDB to provide customers with detailed information about millions of daily image and video views. SmugMug shares code and information about their stats stack, which includes an HTTP interface to Amazon DynamoDB and also interfaces with their internal PHP stack and other tools such as Memcached. Get a detailed picture of lessons learned and the methods SmugMug uses to create a system that is easy to use, reliable, and high performing.

    Presentation Transcript

    • DAT204 - SmugMug: From MySQL to Amazon DynamoDB (and some of the tools we used to get there) Brad Clawsie, SmugMug.com November 14, 2013 © 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
    • Welcome! • I'm Brad Clawsie, an engineer at SmugMug.com • SmugMug.com is a platform for hosting, sharing, and selling photos • Our target market is professional and enthusiast photographers
    • This Talk… • Isn't an exhaustive Amazon DynamoDB tutorial • Is about some observations made while migrating some features from MySQL to Amazon DynamoDB • Is an introduction to some of the tools that have helped us migrate to Amazon DynamoDB
    • Background • SmugMug.com started in 2003 • LAMP code base • A few machines/colocation → a lot of machines/colocations → Amazon Web Services • Hundreds of thousands of paying customers • Millions of viewers, billions of photos, petabytes of storage
    • Amazon DynamoDB in a Nutshell • Tables → [Keys → Items] • Items → [AttributeName → Attribute] • Attribute → {Type:Value} • Provisioned throughput • NoSQL-database-as-a-service • Create, Get, Put, Update, Delete, Query, Scan
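    A minimal sketch of that data model in Go, using hypothetical type and attribute names (these are not GoDynamo's types), just to make the Tables → Items → Attributes mapping concrete:

        package main

        import "fmt"

        // Attribute → {Type:Value}, e.g. {"N":"1"} or {"S":"hello"}
        type AttributeValue struct {
            Type  string
            Value string
        }

        // Items → [AttributeName → Attribute]
        type Item map[string]AttributeValue

        // Tables → [Keys → Items]
        type Table map[string]Item

        func main() {
            stats := Table{
                "user1|20131017": Item{
                    "UserID": {Type: "N", Value: "1"},
                    "Date":   {Type: "N", Value: "20131017"},
                    "Views":  {Type: "N", Value: "42"},
                },
            }
            fmt.Printf("%+v\n", stats)
        }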
    • MySQL at SmugMug
    • MySQL on Our Terms... • “SQL”, but not relational • We avoid joins, foreign keys, complex queries, views, etc. • Simplified our model so that caching was easier to implement • Used like a key (id) → values (row) system
    • MySQL on Our Terms... • Aggressive denormalization is common in many online applications • Upside – easy to migrate some of these tables and supporting code to “NoSQL” style database • Downside(?) – database does less, code does more
    • So Why Change? We're hitting roadblocks that can't be addressed by: • More/better hardware • More ops staff • Best practices
    • Notable Issue #1: “OFFLINE OPS” like ALTER TABLE • We used to have a fair amount of read-only/site-maintenance downtime to ALTER tables • As the number of users grows, this always inconveniences someone • Introduces risk into the code • Other RDBMSs are better about this
    • Temporary Relief... • Introduced the concept of treating a column as a JSON-like BLOB type for embedding varying/new data • Bought us some time and flexibility, and reduced the need for ALTER TABLE-related downtime • But MySQL wasn't intended to be an ID → BLOB system, and other issues remained
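    A minimal sketch of that pattern in Go, assuming a hypothetical user_meta table with id and meta BLOB columns, and an assumed go-sql-driver/mysql driver; varying/new fields live inside the JSON blob, so adding one requires no ALTER TABLE:

        package main

        import (
            "database/sql"
            "encoding/json"
            "fmt"
            "log"

            _ "github.com/go-sql-driver/mysql" // assumed driver choice
        )

        func main() {
            db, err := sql.Open("mysql", "user:pass@/smugmug")
            if err != nil {
                log.Fatal(err)
            }
            defer db.Close()

            // The blob holds varying/new fields as JSON.
            var raw []byte
            if err := db.QueryRow("SELECT meta FROM user_meta WHERE id = ?", 1).Scan(&raw); err != nil {
                log.Fatal(err)
            }

            var meta map[string]interface{}
            if err := json.Unmarshal(raw, &meta); err != nil {
                log.Fatal(err)
            }
            fmt.Println(meta) // new keys appear here without any schema change
        }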
    • Notable Issue #2: Concurrency • MySQL can manifest some non-graceful degradation under heavy write load • We're already isolating non-essential tables to their own databases and denormalizing where we can...the problem persists
    • Notable Issue #3: Replication • A necessary headache, but in fairness MySQL is pretty good at it • Performance issues (single threaded etc.) • Makes it harder to reason about consistency in code • Big ops headache
    • Notable Issue #4: Ops Keeping all of this going requires an ops team... • People • Colocation • “Space” concerns – storage, network capacity, and all the hardware to meet anticipated capacity needs
    • Intangibles • We have the resources to try out some new things • We were already AWS fan boys • Big users of Amazon S3 • Recently moved out of colocations and into Amazon EC2 • Our ops staff has become AWS experts • So we would give an AWS database consideration
    • Immediate Observations • Limited key structure • Limited data types • ACID-like on Amazon DynamoDB's terms • Query/Scan operations not that interesting • But, freedom from most space constraints • Leaving the developer with primarily time constraints
    • First Steps • Start with a solved problem – stats/analytics • SmugMug's stats stack is a relatively simple data model: {"u":"1","i":"123","a":"321"...} • We measure hits on the frontend and create lines of JSON with user, image, album, time, etc.
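    A minimal sketch of emitting one such hit as a line of JSON; the u/i/a field names come from the slide, while the "t" timestamp field is an assumption:

        package main

        import (
            "encoding/json"
            "fmt"
            "time"
        )

        // Hit mirrors the {"u":...,"i":...,"a":...} shape above.
        type Hit struct {
            User  string `json:"u"`
            Image string `json:"i"`
            Album string `json:"a"`
            Time  int64  `json:"t"` // hypothetical timestamp field
        }

        func main() {
            h := Hit{User: "1", Image: "123", Album: "321", Time: time.Now().Unix()}
            line, _ := json.Marshal(h)
            fmt.Println(string(line)) // one JSON line per frontend hit
        }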
    • First Steps • Analytics needs reliable throughput – new data is always being generated • Space concerns (hardware, storage, replication) • It was obvious that Amazon DynamoDB would free us from some space constraints. However, we were naive about Amazon DynamoDB's special time constraints.
    • Very Simple Tables • A site key (user, image, album id) as HashKey • A date as RangeKey • The rest of the data • Just a few tables • We'll have to manage removing data from them over time • Obvious: fewer tables → lower bill
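    A minimal sketch of what a key with that shape looks like in DynamoDB's wire format; the table and attribute names (album_stats, AlbumID, Date) are illustrative:

        package main

        import (
            "encoding/json"
            "fmt"
        )

        func main() {
            // HashKey: a site key (here an album id); RangeKey: a date.
            key := map[string]map[string]string{
                "AlbumID": {"N": "321"},
                "Date":    {"N": "20131017"},
            }
            req := map[string]interface{}{
                "TableName": "album_stats",
                "Key":       key,
            }
            b, _ := json.Marshal(req)
            fmt.Println(string(b)) // a GetItem-style request body
        }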
    • Need for Tools • Even with our simple initial test bed, we saw the need for more tooling • We are huge users of memcache multi* functions • So we wanted to be able to have arbitrary-sized “batch” input requests • PHP doesn't do concurrency
    • So...a Proxy • A long-running proxy to take requests and manage concurrency for PHP • A proxy to allow us to cheat a little with sizing our requests* • Needed a tool that was geared toward building tools like proxies • Go fit the bill
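    A minimal sketch of the batching idea, not BBPD's actual implementation: split an arbitrary-sized batch into DynamoDB-legal chunks of 25 (the BatchWriteItem limit) and issue them concurrently from Go:

        package main

        import (
            "fmt"
            "sync"
        )

        // chunks splits items into slices of at most n elements.
        func chunks(items []string, n int) [][]string {
            var out [][]string
            for len(items) > 0 {
                m := n
                if len(items) < n {
                    m = len(items)
                }
                out = append(out, items[:m])
                items = items[m:]
            }
            return out
        }

        func main() {
            batch := make([]string, 103) // pretend these are write requests
            var wg sync.WaitGroup
            for i, c := range chunks(batch, 25) {
                wg.Add(1)
                go func(i int, c []string) { // one goroutine per legal sub-batch
                    defer wg.Done()
                    fmt.Printf("sub-batch %d: %d items\n", i, len(c))
                }(i, c)
            }
            wg.Wait()
        }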
    • A Little Risk • Writing tools for a young database in a young programming language • Resulted in two codebases we will share: • GoDynamo: equivalent of the AWS SDKs • BBPD: an HTTP proxy based on GoDynamo
    • Observation #1: On Amazon DynamoDB's Terms • Sensible key ↔ application mapping • Denormalization • No reliance on complex queries or other relational features • Many at-scale MySQL users are already using it in this way anyway
    • Observation #1: On Amazon DynamoDB's Terms • Avoid esoteric features • Don't force it • Amazon DynamoDB is not the only AWS database • Nice to have a “control” to use as a yardstick of success
    • Observation #2: Respect Throttling • Coming from MySQL, graceful degradation is an expected artifact of system analysis • But Amazon DynamoDB is a shared service using a simple WAN protocol • You either get a 2xx (success), 4xx, or 5xx (some kind of failure) • A binary distinction
    • Observation #2: Respect Throttling • Throttling is the failure state of a properly formatted request • Throttling happens when the rate of growth of requests changes quickly (my observation) • Correlate your throttling to your provision
    • Observation #2: Respect Throttling • Typically, throttling happens well below the provisioning level • Don't reflexively increase your provisioning • Amazon DynamoDB behaves best when you optimize requests for space and time
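    A minimal sketch of honoring that binary distinction through the proxy described later: treat anything but a 2xx as failure and retry with exponential backoff. The localhost URL and request body echo the BBPD examples further on:

        package main

        import (
            "bytes"
            "fmt"
            "net/http"
            "time"
        )

        func main() {
            body := []byte(`{"TableName":"user","Key":{"UserID":{"N":"1"},"Date":{"N":"20131017"}}}`)
            for attempt, wait := 0, 50*time.Millisecond; attempt < 5; attempt, wait = attempt+1, wait*2 {
                resp, err := http.Post("http://localhost:12333/GetItem", "application/json", bytes.NewReader(body))
                if err == nil && resp.StatusCode == http.StatusOK {
                    resp.Body.Close()
                    fmt.Println("ok")
                    return
                }
                if resp != nil {
                    resp.Body.Close()
                }
                time.Sleep(wait) // back off before retrying a throttled/failed request
            }
            fmt.Println("giving up")
        }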
    • Space Optimizations • Compress data (reduce requests) • Cache data (read locally when possible) • Avoid clustering requests to tables/keys • Use key/table structures if possible (often the application dictates this)
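    A minimal sketch of the first point: gzip a payload before storing it as a single binary attribute, so each request carries fewer bytes (compression only pays off once payloads are large enough):

        package main

        import (
            "bytes"
            "compress/gzip"
            "fmt"
            "strings"
        )

        func main() {
            payload := []byte(strings.Repeat(`{"u":"1","i":"123","a":"321"}`, 100))
            var buf bytes.Buffer
            zw := gzip.NewWriter(&buf)
            zw.Write(payload)
            zw.Close()
            // buf.Bytes() could now be stored as a "B" (binary) attribute.
            fmt.Printf("raw=%d bytes, gzipped=%d bytes\n", len(payload), buf.Len())
        }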
    • Time Optimizations • Reduce herding/spikes if possible • Queue requests to be processed at a controlled rate of flow elsewhere • Experiment with concurrency to achieve optimum reqs/sec
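    A minimal sketch of that controlled flow: drain a queue of requests at a fixed pace instead of letting them spike (the 10 reqs/sec rate is arbitrary, for illustration):

        package main

        import (
            "fmt"
            "time"
        )

        func main() {
            queue := make(chan int, 100)
            for i := 0; i < 20; i++ {
                queue <- i
            }
            close(queue)

            tick := time.Tick(100 * time.Millisecond) // ~10 reqs/sec
            for req := range queue {
                <-tick // wait for our slot before issuing the request
                fmt.Println("issuing request", req)
            }
        }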
    • Don't Obsess Over Throttling • Some throttling is unavoidable • “Hot keys” are unavoidable • The service will get better about adapting to typical use • Experiment: flow, distribution, mix of requests, types of requests, etc. • Throttling is a strong warning
    • Observation #3: Develop with Real(ish) Data • “Test” data and “test” volume will fail you when you launch • Again, no graceful degradation • Your real data has its own flow and distribution • You must optimize for that • Once again, set up a control to validate observations
    • Observation #4: Live with the Limits • Don't try to recreate relational features in Amazon DynamoDB • Query/Scan are limited, be realistic • You can't really see behind the curtain • Feedback from the console is limited • Expect to iterate
    • Success? Recall our original MySQL gripes. (1) ALTER TABLE: kind of solved. Amazon DynamoDB doesn't have full table schemas, so to speak; while we are able to add Attributes to an Item at will, we can only change a table's provisioning once it is created.
    • Success? (2) Replication: solved, and opaque to us as Amazon DynamoDB users. (3) Concurrency: kind of solved. Throttling introduces a new kind of concurrency issue, but at least it is limited to a single table.
    • Success? (4) Ops: mostly solved Ops doesn't have to babysit servers anymore, but they need to learn the peculiarities of Amazon DynamoDB and accept the limited value of the console and available body of knowledge.
    • Recap: What We Wrote • GoDynamo: like the AWS SDK, but in Go • BBPD: a proxy written on GoDynamo • See github.com/smugmug
    • Recap: Why a Proxy? • Allows us to integrate Amazon DynamoDB with PHP so concurrency can be put to use • Moves operations to an efficient runtime • Provides for simple debugging via curl and can check for well-formedness of requests locally • Hides details like renewing IAM credentials
    • Trivial Examples

        # Convenience endpoints available directly:
        $ curl -X POST -d '{"TableName":"user","Key":{"UserID":{"N":"1"},"Date":{"N":"20131017"}}}' http://localhost:12333/GetItem

        # Or specify the endpoint in a header:
        $ curl -H 'X-Amz-Target: DynamoDB_20120810.GetItem' -X POST -d '{"TableName":"user","Key":{"UserID":{"N":"1"},"Date":{"N":"20131017"}}}' http://localhost:12333/
    • BBPD is Just a Layer • GoDynamo is where the heavy lifting is done • Libraries for all endpoints • AWS Signature Version 4 support • IAM support (transparent and thread-safe)* • Other nonstandard goodies • Pro-concurrency, high performance • Enables some cool hacks
    • GoDynamo: Why Go? • Strong types, concurrency, Unicode, separate compilation, fast startup, low(ish) memory use, static binary as output of compiler (deploy → scp my_program) • Types ↔ JSON is easy, flexible, and idiomatic • Easy to learn and sell to your boss
    • Trivial Example

        // assumes imports: fmt, net/http, and GoDynamo's conf_iam,
        // endpoint, and get_item packages from github.com/smugmug

        // control our concurrent access to IAM creds in the background
        iam_ready_chan := make(chan bool)
        go conf_iam.GoIAM(iam_ready_chan)

        // try to get an Item from a table
        var get1 get_item.Request
        get1.TableName = "my-table"
        get1.Key = make(endpoint.Item)
        get1.Key["myhashkey"] = endpoint.AttributeValue{S: "thishashkey"}
        get1.Key["myrangekey"] = endpoint.AttributeValue{N: "1"}

        body, code, err := get1.EndpointReq()
        if err != nil || code != http.StatusOK {
            panic("uh oh")
        }
        fmt.Printf("%v\n", body)
    • AWS Identity and Access Management (IAM) • Included as a dependency is another package worth mentioning: goawsroles • An interface that describes how to handle IAM credentials • An implementation for text files • Suspends threads as credentials are being updated
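    A minimal sketch of that suspend-during-update behavior, using hypothetical types rather than the goawsroles interface: readers block on an RWMutex while a background refresher holds the write lock:

        package main

        import (
            "fmt"
            "sync"
            "time"
        )

        type Creds struct {
            mu        sync.RWMutex
            AccessKey string
            Secret    string
        }

        func (c *Creds) Get() (string, string) {
            c.mu.RLock() // blocks while an update holds the write lock
            defer c.mu.RUnlock()
            return c.AccessKey, c.Secret
        }

        func (c *Creds) Set(ak, s string) {
            c.mu.Lock() // suspend readers during the swap
            defer c.mu.Unlock()
            c.AccessKey, c.Secret = ak, s
        }

        func main() {
            c := &Creds{AccessKey: "AKID1", Secret: "s1"}
            go func() { // background refresher, e.g. re-reading a creds text file
                time.Sleep(10 * time.Millisecond)
                c.Set("AKID2", "s2")
            }()
            time.Sleep(20 * time.Millisecond)
            ak, _ := c.Get()
            fmt.Println("using access key", ak)
        }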
    • Just the Beginning • Available at github.com/smugmug • Standard disclaimer – works for us, but YMMV! • Would love for you to use it and help create a community of contributors Thanks! :)
    • Please give us your feedback on this presentation DAT204 As a thank you, we will select prize winners daily for completed surveys!