AWS Meetup - Nordstrom Data Lab and the AWS Cloud



The Nordstrom Data Lab is building out an API that powers product recommendations for our customers online and beyond. Recommendo, our flagship product, was built from the ground up using Node.js and AWS in a little over three months. Since launch in November 2013 we've served up over three billion recommendations and survived Black Friday and Cyber Monday without breaking a sweat. We'll be sharing what we learned building and operating a high-traffic API on the AWS platform as a service, focusing on Node.js, Elastic Beanstalk, and DynamoDB. Additionally, we'll discuss some of the cultural challenges and opportunities presented when adopting the public cloud at a large corporate IT organization. In short, we believe there are tremendous advantages to be had for enterprises willing to make the leap to the cloud.

Published in: Technology

  • DAVID: The ease of deployment is addictive. Just a git push and you’re done.
  • DAVID: Many think that a similar revolution is about to happen for the enterprise.
  • DAVID:
       • On-Premise: large crew with many specialists, lots of infrastructure to support, lots of regulations, redundancy, energy required to prep and depart from the port, no cloud power.
       • IaaS: less power and fewer resources required, smaller crew. Still complicated to sail, though; requires some specialization.
       • PaaS: smallest crew yet, highly nimble; 2-3 people can just hoist the sails and get out to sea. Everyone can operate all aspects of the boat. It also means they are alone out there: the team needs to be self-sufficient if something goes wrong.
       • The point is not that we need fewer people overall; rather, we need a whole fleet of nimble PaaS vessels. Shift resources towards the point of customer impact. Offload non-differentiating commodity infrastructure.
  • DAVID: Thanks to cloud and code libraries (especially open-source), the additional incremental effort required for industrial strength is shrinking.
       • Logging: lots of great open-source libs
       • Monitoring: CloudWatch, New Relic
       • Redundancy: Elastic Load Balancer
       • Scalability: Auto-Scaling groups
       • High Availability: DynamoDB replication
       • Deployment Automation: git push
       • Security: IAM, VPC, firewalls

    1. AWS and the Nordstrom Data Lab, presented by Jason Wilson & David Von Lehman
    2. Recommendo Overview
       • RESTful product recommendations API
       • Live on in November
       • Service emails live in January
       • Lives in the AWS cloud: Elastic Beanstalk, DynamoDB, Node.js
       • 3rd party rec vendors don’t tap into what is unique about Nordstrom or fashion
    3. By the Numbers
       • Over 4 billion recommendations served
       • >3 million API hits per day
       • 105 days between first commit and go-live (Aug 6 and Nov 19 respectively)
       • 5 servers with auto-scaling to 20 (turns out we don’t need them)
       • 90ms average request latency
    4. 50/50 test against incumbent vendor
    5. How We Built It
       • Continuous integration and deployment from the first week
       • 90+ percent code coverage
       • Fewer moving parts == less to monitor, fewer ways for things to go wrong
       • Fully PaaS based to minimize sys admin responsibilities
       • How can we support this ourselves without carrying pagers?
    6. DynamoDB
       • Fully managed NoSQL database-as-a-service
       • Web API with SDK support for Python, Ruby, Node.js, .NET, and Java
       • High performance queries, backed by SSD
       • Maintains predictable performance for data at any size through horizontal scale-out
       • Auto replication across 3 availability zones
       • Need to understand data access patterns up front
       • Pay only for what you use/need, both storage and R/W throughput
    7. Node.js
       • JavaScript on the server atop the Google V8 engine
       • Asynchronous event loop makes it ideal for real-time, data-intensive applications
       • Vibrant open-source community around the excellent npm package manager (50K+ packages)
       • Seeing increased adoption in enterprises including Wal-Mart, LinkedIn, PayPal, Dow Jones, Microsoft, New York Times
    8. JavaScript: Learn to Love It
       • No type checking, don’t find errors until runtime
       • Not classical OO
       • var keyword
       • Callback hell
       • Server debugging too hard
       • But wait...
       • Chrome and V8
       • Dynamic can be your friend
       • npm!
       • express, async, mocha
    9. AWS Components
       • EC2: provides web-scale computing as a service
       • ELB: Elastic Load Balancer. Routes incoming traffic to EC2 instances, scales up to meet demand
       • Auto-scaling group: a logical collection of EC2 instances behind an ELB
    10. AWS Components
    11. Elastic Beanstalk
       • AWS PaaS: lightweight abstraction layer atop EC2/ELB with no additional costs
       • More transparent than Azure or Heroku
       • Supports Java, .NET, Python, Node.js, PHP, and Ruby
       • git push deployment
       • Auto-scaling group with custom triggers and auto-applied config
       • Possible to configure the AMI including yum packages, environment variables, and more
       • Supports custom AMIs
       • Automated health checks
    12. Continuous Deployment
       • Development: git push to dev branch; Jenkins CI unit tests; git push to EB
       • Production: git pull dev; git checkout master; git merge dev; git push master; Jenkins CI unit tests; git push to EB (prod)
    13. Performance testing
       • Initial performance was poor
       • Disable DNS caching when load testing against ELB
       • Pre-warm ELB for higher upfront throughput
       • jmeter-ec2, Bees with Machine Guns
    14. Early Perf results: YIKES! (chart axes: transactions per second; response time in seconds)
    15. Performance tuning
       • New Relic, Nodetime: real-time performance monitoring of the Node runtime
       • node-memwatch: evented inspection of heap, GC events, leak events, and heap diffing
       • ssh into instances
    16. Real Performance
       • Pleasantly surprised
       • Average latency ~90ms
       • DynamoDB response times <10ms
       • Handful of auto-scaling up and back events
       • One outage due to bad exception handling
    17. 400% 64%
    18. DynamoDB
    19. Lessons Learned / Pitfalls
       • True zero downtime deployment is difficult to achieve
       • Thoroughly explore the Elastic Beanstalk configuration options
       • Catch those errors: a rogue unhandled exception can bring it all down
       • Health checks that actually do something
       • Out-of-the-box monitoring is pretty good
    20. Harness the Cloud (diagram labels: On-Premise, IaaS, PaaS; % time; infrastructure; experience)
    21. Agility vs. Industrial Strength
       • Industrial strength: Logging, Monitoring, Redundancy, Deployment Automation, High-Availability, Scalability, Security
       • Agility: Iterative Development, Build to Experiment, Evolutionary Architecture, Change Tolerant, Frequent Releases, Small Teams
    22. PaaS Venn Diagram (labels: Robust Systems, Rapid Delivery, Platypus as a Service)
    23. Recommendo 2.0
       • SKU-based recommendations: size!
       • Truly personalized recs based on individual browse and purchase history
       • Diagram labels: DynamoDB, Batch Recs, Real-Time, Refinery, Scorer, Ingester, Redis, Streams
    24. Additional AWS Services
       • ElastiCache and Redis
       • Elastic Beanstalk worker tiers
       • SQS
       • S3
    25. Wrap-Up
       • Recommendo: initial success, now building upon what we have learned
       • Node.js + DynamoDB + Elastic Beanstalk is a winning combination
       • Possible to out-perform an incumbent vendor solution in a competitive differentiating capability
       • Cloud and PaaS enable small teams to move quickly and deliver solid production-caliber systems
       • Incremental cost of “gold plating” steadily shrinking
       • Your company benefits when percent of resources devoted to core competency is maximized
    26. Thank you
       • Questions / comments?
       • @davidvlsea
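
Two of the pitfalls on the "Lessons Learned" slide (a rogue unhandled exception bringing everything down, and health checks that actually do something) can be illustrated with a small Node.js sketch. This is not the Recommendo code: the `safe` wrapper, `createHandler`, the `/health` and `/boom` paths, and the `checkDependencies` probe are all hypothetical names chosen for illustration, and only Node's built-in `http` module is assumed.

```javascript
const http = require('http');

// Hypothetical dependency probe. In a real service this might do a cheap
// read against the datastore; here it simply reports success.
async function checkDependencies() {
  return { datastore: 'ok' };
}

// Wrap a request handler so a thrown error or rejected promise becomes a
// 500 response instead of an unhandled exception that kills the process.
function safe(handler) {
  return async (req, res) => {
    try {
      await handler(req, res);
    } catch (err) {
      console.error('request failed:', err.message);
      if (!res.headersSent) {
        res.writeHead(500, { 'Content-Type': 'application/json' });
      }
      res.end(JSON.stringify({ error: 'internal error' }));
    }
  };
}

// Build the request handler. The probe is injectable so a failing
// dependency can be simulated.
function createHandler(probe = checkDependencies) {
  return safe(async (req, res) => {
    if (req.url === '/health') {
      // If the probe throws, the wrapper answers 500, so a load balancer
      // polling this path sees the instance as unhealthy.
      const status = await probe();
      res.writeHead(200, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify(status));
      return;
    }
    if (req.url === '/boom') {
      throw new Error('simulated bug'); // exercises the safety net
    }
    res.writeHead(404, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify({ error: 'not found' }));
  });
}

// Not started here; call createServer().listen(port) to run it.
function createServer() {
  return http.createServer(createHandler());
}
```

The design choice worth noting is that the health endpoint shares the same failure path as ordinary requests: a dependency outage produces a non-200 answer rather than a static "OK".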
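
The figures on the "By the Numbers" slide can be sanity-checked with a few lines of arithmetic. The sketch below assumes both dates fall in 2013 (the abstract gives November 2013 as the launch) and treats "3 million API hits per day" as an exact round figure, which the slide only states as a lower bound; the function names are illustrative.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;
const SECONDS_PER_DAY = 24 * 60 * 60;

// Whole days between two UTC timestamps.
function daysBetween(startMs, endMs) {
  return Math.round((endMs - startMs) / MS_PER_DAY);
}

// Average requests per second implied by a daily hit count.
function averagePerSecond(hitsPerDay) {
  return hitsPerDay / SECONDS_PER_DAY;
}

// Assumed year: 2013. JavaScript months are 0-based (7 = Aug, 10 = Nov).
const firstCommit = Date.UTC(2013, 7, 6);
const goLive = Date.UTC(2013, 10, 19);

console.log(daysBetween(firstCommit, goLive));        // 105
console.log(averagePerSecond(3000000).toFixed(1));    // 34.7
```

An average of roughly 35 requests per second spread across 5 servers is a light per-instance load, which is consistent with the slide's remark that the extra auto-scaling capacity turned out not to be needed. Peak traffic would be higher than this daily average.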