AWS Start-Up Tour 2009 / ShareThis


ShareThis, AWS Start-Up Tour 2009, Sunnyvale

  1. ShareThis on AWS
     Paco Nathan, Data Insights
     AWS Start-Up Tour 2009-06-16
  2. What Does ShareThis Do?
     • “Make it simple to share any online content”
     • Social content sharing platform
     • Publishers include ESPN, FOX, CS Monitor, HuffPost, CBS Marketwatch, Wired, TechCrunch, ThinkGeek, etc.
     • When a news story goes viral on a major publisher, our sharing services must scale out to keep pace
  3. (no text on this slide)
  4. Why Our Company Uses AWS
     • >10^6 publishers, >10^9 users, >10^10 URLs
     • Early-stage start-up: fewer than 25 people, “wearing lots of hats”, ultra-fast-paced R&D
     • Spikes in popular stories impose demands throughout the architecture: API services, loggers, DW, BI, etc.
     • How can this level of service be built 100% in the cloud?
  5. (no text on this slide)
  6. System Architecture
     • Each service designed for cost-effective, horizontal scale-out
     • API served by a cluster of LAMP stacks plus a cluster of nginx servers
     • AsterData: nCluster infrastructure in a “hub-and-spoke” pattern
     • Cascading: abstraction layer for tying components together
     • Batch jobs on Elastic MapReduce and AsterData SQL/MR
     • SQS, EBS, SimpleDB, MTurk, plus other AWS services
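Horizontal scale-out means a traffic spike is absorbed by adding identical nodes to a tier rather than by redesigning it, so capacity planning for each tier reduces to simple arithmetic. A minimal sketch, assuming a measured per-node throughput and a fixed headroom fraction (the function name `nodes_needed`, the 30% default headroom, and the request rates in the example are hypothetical, not ShareThis figures):

```python
import math


def nodes_needed(peak_rps: float, per_node_rps: float, headroom: float = 0.3) -> int:
    """Estimate how many identical nodes a horizontally scaled tier needs.

    peak_rps:     expected peak requests per second (hypothetical input)
    per_node_rps: sustained requests per second one node can serve (assumed,
                  e.g. measured by load testing a single node)
    headroom:     spare-capacity fraction to keep, e.g. 0.3 means 30% extra
    """
    if per_node_rps <= 0:
        raise ValueError("per_node_rps must be positive")
    required = peak_rps * (1 + headroom) / per_node_rps
    # Round up to whole nodes, and never run a tier with zero nodes.
    return max(1, math.ceil(required))


# Hypothetical tier: each node handles 500 req/s, peak load is 2000 req/s.
print(nodes_needed(2000, 500))  # 6
```

Because the node count grows linearly with load, a story that suddenly draws ten times the usual traffic calls for roughly ten times the nodes in the affected tier, a change that adding EC2 instances can make without touching application code.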
  7. (no text on this slide)
  8. Key Learnings
     • Capability to scale out horizontally without having to recode, rebuild, etc.: add new EC2 nodes to clusters
     • Authoritative data + backups in S3, a great approach for disaster recovery (DR)
     • Wide range of use cases implemented: widget API, log clean-up, vertical search, business intelligence, etc.
     • Developers launch their own sandbox instances, which makes dev/test/debug cycles more efficient
     • Staff enabled to “wear even more hats” with less risk
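Adding EC2 nodes to a cluster "without having to recode" works best when keys are routed so that a new node takes over only a small share of them. The slides don't say how ShareThis routed work; one common technique for this is consistent hashing, sketched below under that assumption (the `HashRing` class, the node names, and the 100 virtual nodes per server are all hypothetical):

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative sketch)."""

    def __init__(self, nodes=(), vnodes: int = 100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (hash, node) points on the ring
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key: str) -> int:
        # MD5 is used here only to spread keys evenly, not for security.
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node: str) -> None:
        # Each physical node owns many ring points, which smooths the load.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def node_for(self, key: str) -> str:
        if not self._ring:
            raise LookupError("ring is empty")
        h = self._hash(key)
        # First ring point at or after the key's hash, wrapping to the start.
        idx = bisect.bisect(self._ring, (h, ""))
        if idx == len(self._ring):
            idx = 0
        return self._ring[idx][1]


# Hypothetical usage: grow a 3-node cluster to 4 nodes and measure key movement.
ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"url-{i}" for i in range(10_000)]
before = {k: ring.node_for(k) for k in keys}
ring.add_node("node-d")
moved = [k for k in keys if ring.node_for(k) != before[k]]
print(len(moved) / len(keys))
```

Every key that moves lands on the new node, and the moved fraction should come out close to 1/4; with naive `hash(key) % n` routing, growing from 3 to 4 nodes would instead reassign most keys.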
  9. Cascading + Elastic MapReduce
  10. Cascading + Elastic MapReduce
      • “Syntax is for humans, APIs are for software”
      • Defines apps as set operations applied to data flows
      • Engineers & data scientists don’t think in terms of MapReduce primitives, key/value pairs, etc.
      • Integrates the Hadoop API with other APIs (S3, SQS, JDBC)
      • Expresses end-points as Java design patterns in compiled code, not just a scramble of scripts
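To make "set operations applied to data flows" concrete, here is a toy, in-memory pipeline in plain Python. It is not Cascading's API: Cascading is a Java library whose pipe assemblies are built from classes such as `Each`, `GroupBy`, and `Every`, and the hypothetical `Flow` class below only mimics that style. The caller chains per-record, grouping, and aggregation steps; `run()` then maps them onto map, shuffle, and reduce phases, which is the translation a real planner performs across a cluster.

```python
from collections import defaultdict
from typing import Callable, Iterable, List, Optional


class Flow:
    """Toy in-memory data flow (hypothetical; not Cascading's API)."""

    def __init__(self, source: Iterable):
        self._source = source
        self._each_ops: List[Callable] = []          # per-record steps (map side)
        self._key_fn: Optional[Callable] = None      # grouping key (shuffle)
        self._aggregator: Optional[Callable] = None  # per-group step (reduce side)

    def each(self, fn: Callable) -> "Flow":
        """Add a per-record step; fn returns an iterable of 0..n output records."""
        self._each_ops.append(fn)
        return self

    def group_by(self, key_fn: Callable) -> "Flow":
        self._key_fn = key_fn
        return self

    def every(self, aggregator: Callable) -> "Flow":
        """Set the function applied to each group's list of records."""
        self._aggregator = aggregator
        return self

    def run(self) -> dict:
        if self._key_fn is None or self._aggregator is None:
            raise ValueError("a flow needs group_by() and every() before run()")
        # "Map" phase: push every record through the per-record steps in order.
        records = list(self._source)
        for fn in self._each_ops:
            records = [out for rec in records for out in fn(rec)]
        # "Shuffle" phase: bucket records by their grouping key.
        groups = defaultdict(list)
        for rec in records:
            groups[self._key_fn(rec)].append(rec)
        # "Reduce" phase: aggregate each bucket independently.
        return {key: self._aggregator(vals) for key, vals in groups.items()}


# Hypothetical share-event log lines: date, event type, URL.
log = [
    "2009-06-16 share espn.com/story1",
    "2009-06-16 share wired.com/a",
    "2009-06-16 view espn.com/story1",
    "2009-06-16 share espn.com/story2",
]
counts = (
    Flow(log)
    .each(lambda line: [line.split()])                # parse into fields
    .each(lambda f: [f] if f[1] == "share" else [])   # keep only share events
    .each(lambda f: [f[2].split("/")[0]])             # project the domain
    .group_by(lambda domain: domain)
    .every(len)                                       # count per domain
    .run()
)
print(counts)  # {'espn.com': 2, 'wired.com': 1}
```

The caller never writes a mapper, a reducer, or a key/value pair; that separation is what the slide means by engineers not having to think in MapReduce primitives.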
  11. Cascading + Elastic MapReduce
      • Highly scalable, fault-tolerant framework for batch jobs
      • Dramatically reduced Ops overhead
      • Excellent command-line tools make the dev/test/debug cycle very efficient with “Big Data”
      • Highly expert staff, very responsive and helpful in forums
      • Cascading example code in the developer resources: “LogAnalyzer for CloudFront” and “Multitool”
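The "LogAnalyzer for CloudFront" sample is a Cascading job and its code is not reproduced here. As an illustration of the kind of log aggregation the slide refers to (not a port of that sample), the sketch below reads lines in the W3C extended log format that CloudFront access logs use, where a `#Fields:` directive names the tab-separated columns, and counts records by one field. The function name `count_by_field` and the shortened field list, edge locations, IPs, and paths in the sample are hypothetical.

```python
from collections import Counter
from typing import Iterable


def count_by_field(lines: Iterable[str], field: str = "cs-uri-stem") -> Counter:
    """Count records of a W3C extended-format log (e.g. CloudFront) by one field.

    Column names are read from the '#Fields:' directive rather than hardcoded;
    data lines are tab-separated.
    """
    names = None
    counts = Counter()
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("#Fields:"):
            names = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#"):
            continue  # skip '#Version' and other directives, plus blank lines
        if names is None:
            raise ValueError("data line seen before the #Fields directive")
        record = dict(zip(names, line.split("\t")))
        if field in record:
            counts[record[field]] += 1
    return counts


# Hypothetical sample with a shortened field list.
sample = [
    "#Version: 1.0",
    "#Fields: date time x-edge-location sc-bytes c-ip cs-method cs(Host) cs-uri-stem sc-status",
    "2009-06-16\t10:00:01\tSFO4\t1024\t192.0.2.1\tGET\texample.cloudfront.net\t/widget.js\t200",
    "2009-06-16\t10:00:02\tSFO4\t2048\t192.0.2.2\tGET\texample.cloudfront.net\t/button.png\t200",
    "2009-06-16\t10:00:03\tIAD2\t1024\t192.0.2.3\tGET\texample.cloudfront.net\t/widget.js\t304",
]
print(count_by_field(sample).most_common())  # [('/widget.js', 2), ('/button.png', 1)]
```

Reading the column names from the header keeps the parser working if the log adds fields; on Elastic MapReduce, the same per-line parse and per-key count would run as the map and reduce sides of a batch job over many log files.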
  12. Hadoop Book / Case Study
      • ShareThis case study, "Cascading" by Chris K Wensel, in…
  13. Contacts
      • @pacoid on Twitter