From Zero to Lots - ScaleCamp UK 2009

Published in: Technology

Slide notes
  • - WebKit + APIs + add-ons for Nokia
  • - Without giving away any real numbers…
  • - Services group is “organizationally immature”
    - Large shift in organizational thinking needed to get to a real Agile organization
  • - Core functionality built in two teams
    - Future functionality built in two other teams
    - All distributed components, no code sharing (except minor libraries), no integration between “now” and “future”
  • - Almost went live with MySQL Cluster, but it proved to be unstable and unwarranted; simplified to master-master
  • - Legacy deployment mechanism
    - No monitoring!
  • - Prime Place went live yesterday
    - 2 hr deployment process
    - Need more automation in QA and production
  • - Hyperic – moan, vendor lock-in, expensive, blah – but it gets the job done for us very quickly and offers all the features we need, including JMX cluster management; no time for anything else right now
  • - Decouple through caching and proxying to deal with emergency outages and traffic spikes without scaling out, or to buy time to get hardware
  • - Frequent production releases are hard with lots of devices to test – need some support from outside of the team, which slows things down a lot
    - Caching: prove it before you use it
    - Sticking with MySQL due to support contracts, existing expertise, etc. – until it becomes an issue
  • - Build teams, don’t throw them together
    - It takes a lot of care and attention to scale Scrum teams
    - Baby steps – tackle agile in the team, then promote good ideas and process to other teams, groups and out to the organization
    - Promote the goodness from within
    - …then automate some more
    - Still struggling with automation; the problem sometimes is that teams wait for solutions instead of creating/proposing them themselves
    - Patience: some people see all of the problems and flip out, want to give up, complain a lot – take it in stride and chip away where you can

    1. From Zero to LOTS
       • ScaleCamp UK
       • Josh Devins, Software Architect
    2. Who are we?
       • Nokia
       • Devices (duh)
       • Services
       • Nokia Maps (Berlin)
       • Device (native and WebKit-based clients)
       • Web
       • Map & Explore group
       • Place registration and management
       • Place discovery
    3.
    4.
    5. Overall growth
    6. The beginning
       • Small group
       • New services division of Nokia
       • Big ambition
       • Big company
       • Lots of stuff to do
       • Early problems
       • No existing traffic to study
       • No idea how popular services will be
       • Lots of pressure to assume huge traffic
    7. From 0 to N-1
       • 200% increase in number of teams and team size
       • Started transition from “chaos” to Scrum
       • Initial launch of place services summer 2009
       • Strict focus on basic feature set
       • Core dataset
       • Search
       • Ratings
       • Start simple but know where you need to get to
       • ~6.3M places
       • Web only
    8. Iteration N-1 choices
       • Two main teams
       • Core competencies leveraged
       • EJB 3.0 + JBoss, Spring + Tomcat
       • Support contracts in place
       • JBoss – JBoss AS, JBoss Messaging
       • MySQL – cluster, then InnoDB
       • Existing operations group
       • Existing deployment mechanism
       • Static, read-only PXE Linux image
       • Used to deploying every couple months only
    9. N-1 technology stack
       • Client
       • Firefox plugin
       • Server
       • Java, Maven (Nexus), CI (Hudson)
       • RESTful aggregated services
       • EJB 3.0 + JBoss, Spring + Tomcat
       • JPA, Hibernate
       • JBoss Messaging
       • MySQL (Master-Master)
       • Apache 2
       • Testing
       • JUnit, soapUI, JMeter
       • Operations
       • PXE Linux based server images (prod)
       • Debian
       • Nagios
    10. From N-1 to N
        • Today-ish
        • 50% increase in number of teams and team size
        • 120% increase in traffic
        • 120% increase in number of places
        • Focus on more community involvement and enhancing place metadata
        • Create a place
        • Prime Place (business owner content)
        • Additional place metadata
        • ~14M places
        • Web and N900 devices
    11. Iteration N choices
        • Rapid development and release
        • Spring + Tomcat everywhere
        • Common configuration mechanism
        • Common logging infrastructure/mechanism
        • Standardized file system layout on server
        • Automated static analysis with Sonar
        • Slack in resources not matching growth, requires automation
        • Built out replica QA environment with own team
        • Puppet + Webistrano
        • Hyperic monitoring ($)
    12. N technology stack
        • Client
        • Plugin not required (although it enhances the experience)
        • JS frameworks: MooTools
        • Server
        • Sonar
        • Spring + Tomcat (standardized)
        • Grails + Tomcat (administration)
        • RESTful APIs (external)
        • 2-legged OAuth
        • Nokia CDN
        • Testing
        • Grinder, Selenium (some FitNesse)
        • Replicated QA environment
        • Operations
        • Unchanged (prod)
        • Puppet, Debian packages, Webistrano (QA)
        • Hyperic (QA) and Nagios (prod)
    13. From N to N+1
        • Planned for summer 2010
        • 10% increase in team size (planned)
        • 200% increase in traffic (expected)
        • 100% increase in number of places
        • Scalability, reliability and robustness
        • Limited new feature set
        • It’s a secret…shhhhh…
        • Additional Navteq content
        • Additional premium content
        • ~30M places
        • Web and N900, S60 devices
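The growth percentages on slides 10 and 13 can be cross-checked against the approximate place counts quoted on slides 7, 10 and 13 (~6.3M, ~14M, ~30M). The sketch below only does that arithmetic; the class and method names are made up for illustration. With these rounded inputs the increases come out at roughly 122% and 114%, which is consistent with the round "120%" and "100%" figures given that all the counts are approximate.

```java
// Illustrative arithmetic only: PlaceGrowth and percentIncrease are hypothetical names.
final class PlaceGrowth {

    /** Percentage increase from before to after, e.g. 10 -> 15 gives 50.0. */
    static double percentIncrease(double before, double after) {
        if (before <= 0) {
            throw new IllegalArgumentException("before must be positive");
        }
        return (after - before) / before * 100.0;
    }

    public static void main(String[] args) {
        // Place counts in millions, as quoted (approximately) on the slides.
        double nMinus1 = 6.3;
        double n = 14.0;
        double nPlus1 = 30.0;

        System.out.printf("N-1 -> N:   %.0f%%%n", percentIncrease(nMinus1, n));
        System.out.printf("N   -> N+1: %.0f%%%n", percentIncrease(n, nPlus1));
    }
}
```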
    14. Iteration N+1 choices
        • Scale and scale fast
        • Caching (HTTP/app? TBD – pending load testing)
        • Async business processes
        • Decouple/isolate persistence layers for protection, performance
        • Reconciliation/cleanup jobs
        • Learning
        • Hadoop data warehouse
        • Trending and tracking
        • Continued slack in operations resources
        • Push automation developed in QA environment to production processes
        • Kickstart, Puppet, RPMs, yum
        • Hyperic monitoring (prod)
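One way to picture "async business processes" and "decouple/isolate persistence layers" is a bounded queue between the request path and the database. This is a minimal in-process sketch, not the team's design: the deck names ActiveMQ (or alternatives) as the likely broker, while here a `LinkedBlockingQueue` stands in for it and an in-memory list stands in for the persistence layer. `AsyncRatingQueue`, `Rating`, `submit` and `drain` are all hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: requests enqueue work and return immediately;
// a separate worker drains the queue into the (stand-in) persistence layer.
final class AsyncRatingQueue {

    static final class Rating {
        final String placeId;
        final int stars;

        Rating(String placeId, int stars) {
            this.placeId = placeId;
            this.stars = stars;
        }
    }

    private final BlockingQueue<Rating> pending;
    private final List<Rating> store = new ArrayList<>(); // stand-in for the database

    AsyncRatingQueue(int capacity) {
        this.pending = new LinkedBlockingQueue<>(capacity);
    }

    /** Called on the request path. Never blocks; returns false when the queue is full. */
    boolean submit(String placeId, int stars) {
        return pending.offer(new Rating(placeId, stars));
    }

    /** Called by a background worker. Moves everything queued so far into the store. */
    synchronized int drain() {
        List<Rating> batch = new ArrayList<>();
        pending.drainTo(batch);
        store.addAll(batch);
        return batch.size();
    }

    int pendingCount() {
        return pending.size();
    }

    synchronized int storedCount() {
        return store.size();
    }

    public static void main(String[] args) {
        AsyncRatingQueue queue = new AsyncRatingQueue(1000);
        queue.submit("place-1", 5);
        queue.submit("place-2", 3);
        System.out.println("pending before drain: " + queue.pendingCount());
        System.out.println("written by worker:    " + queue.drain());
    }
}
```

The bounded capacity is the design choice worth noting: when the persistence side falls behind, `submit` starts returning false instead of letting the backlog grow without limit, so the caller can decide whether to reject, retry or degrade.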
    15. N+1 technology stack
        • Client
        • JS frameworks: combining the “good parts” from MooTools, Dojo, jQuery
        • SDK for Maemo devices
        • Server
        • Varnish HTTP “accelerator” and/or app caching
        • ActiveMQ (RabbitMQ, Atom feeds, other?)
        • MySQL (Master-Master + N-Slaves)
        • Operations
        • Kickstart, Puppet, RPMs, yum mirrors
        • CentOS
        • Hyperic (QA, prod)
    16. The future
        • Move out of the database
        • Search already based on Lucene, still DB backed results (good NoSQL candidate)
        • Complex place matching and de-duplication algorithms will bottom out
        • Proxying and caching
        • Pragmatic approach: only where needed and where measured
        • Memcached, ehcache + Terracotta, JBoss TreeCache, ehcache L2 cache? Depends…
        • Protect ourselves against persistence layer failures and spikes in traffic
        • Multi-homed, co-location, worldwide application distribution
        • Continuity during outages, lower latency, legal (China)
        • Master/slave, master/master, Paxos?
        • Application robustness
        • Robustness patterns (Release It!)
        • Partial failure/outage modes
        • Failure auto-detection and recovery (in the application)
        • NoSQL
        • Pragmatic approach: likely to stick with MySQL until it falls over
        • Looking only at very special cases for NoSQL, k/v stores (like Search results)
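The slide points to the robustness patterns in Release It! without naming one. A circuit breaker is one such pattern that fits "protect ourselves against persistence layer failures", "partial failure/outage modes" and "failure auto-detection and recovery". The following is a simplified sketch under that assumption, not the team's implementation; `CircuitBreaker`, its constructor parameters and the fallback values are hypothetical. After a number of consecutive failures it stops calling the primary source for a cool-down period and serves a fallback (for example cached data) instead, then tries the primary again.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical, simplified circuit breaker: consecutive failures trip it,
// and it lets a trial call through once the cool-down has elapsed.
final class CircuitBreaker {

    private final int failureThreshold;
    private final long openMillis;
    private final LongSupplier clock; // injectable so the cool-down can be tested

    private int consecutiveFailures;
    private boolean tripped;
    private long trippedAt;

    CircuitBreaker(int failureThreshold, long openMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    /** True while calls to the primary are being short-circuited. */
    synchronized boolean isOpen() {
        return tripped && clock.getAsLong() - trippedAt < openMillis;
    }

    /** Runs primary unless the breaker is open; falls back on failure or while open. */
    <T> T call(Supplier<T> primary, Supplier<T> fallback) {
        if (isOpen()) {
            return fallback.get();
        }
        try {
            T result = primary.get();
            onSuccess();
            return result;
        } catch (RuntimeException e) {
            onFailure();
            return fallback.get();
        }
    }

    private synchronized void onSuccess() {
        consecutiveFailures = 0;
        tripped = false;
    }

    private synchronized void onFailure() {
        consecutiveFailures++;
        if (consecutiveFailures >= failureThreshold) {
            tripped = true;
            trippedAt = clock.getAsLong();
        }
    }

    public static void main(String[] args) {
        CircuitBreaker breaker = new CircuitBreaker(3, 30_000L, System::currentTimeMillis);
        Supplier<String> database = () -> {
            throw new IllegalStateException("database unavailable");
        };
        Supplier<String> cache = () -> "cached place data";
        System.out.println(breaker.call(database, cache));
    }
}
```

Production-grade breakers usually add a distinct half-open state, per-call timeouts and metrics; this sketch leaves those out to keep the core idea visible.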
    17. A few lessons learned (so far)
        • Consider possible sharding strategies and implications early
        • Semi-opaque IDs
        • End-to-end continuous integration from day one
        • No matter how many components are involved, how hard it may seem
        • Scaling Scrum is really hard!
        • Self organization works when you have great people
        • Ensure tools and support are in place to guide them from day one (static analysis, strong mentors, etc.)
        • Build truly cross-functional teams
        • Promote Agile everything from the inside out (your team, group, division, org)
        • Automate, automate, automate
        • Don’t be fooled by frameworks
        • Shipping quality production software requires in-depth knowledge of the frameworks you use
        • Be humble – know when you need help
        • Find world class support and use it
        • Building an application with all of the *ilities:
        • Takes time, patience, expertise and flexibility
        • Requires the entire team, group, division and organization
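The deck does not say how its "semi-opaque IDs" are built, so the following is only one possible reading: an ID that clients treat as an opaque string, but from which the server can recover a shard number for routing. The bit layout (15 bits of shard, 48 bits of per-shard sequence), the base-36 encoding and the `SemiOpaqueId` class are all assumptions made for this sketch.

```java
// Hypothetical semi-opaque ID: opaque to clients, decodable by the server.
final class SemiOpaqueId {

    private static final int LOCAL_BITS = 48;
    private static final long LOCAL_MASK = (1L << LOCAL_BITS) - 1;
    private static final int MAX_SHARD = (1 << 15) - 1; // 15 + 48 bits keeps the packed value non-negative

    /** Packs the shard and the per-shard id into one base-36 string. */
    static String encode(int shard, long localId) {
        if (shard < 0 || shard > MAX_SHARD) {
            throw new IllegalArgumentException("shard out of range: " + shard);
        }
        if (localId < 0 || localId > LOCAL_MASK) {
            throw new IllegalArgumentException("localId out of range: " + localId);
        }
        long packed = ((long) shard << LOCAL_BITS) | localId;
        return Long.toString(packed, 36);
    }

    /** Recovers the shard so a request can be routed without a lookup. */
    static int shardOf(String id) {
        return (int) (Long.parseLong(id, 36) >>> LOCAL_BITS);
    }

    /** Recovers the per-shard id. */
    static long localIdOf(String id) {
        return Long.parseLong(id, 36) & LOCAL_MASK;
    }

    public static void main(String[] args) {
        String id = encode(7, 123_456L);
        System.out.println("id=" + id + " shard=" + shardOf(id) + " local=" + localIdOf(id));
    }
}
```

Deciding on a layout like this early is what makes later sharding cheaper: the shard is carried in every ID already handed out, so no mapping table or ID migration is needed when the data is split.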
    18. Thanks!
        • Questions or comments?
        •  available)
        • @joshdevins
        • We’re hiring!