From Zero to Lots - ScaleCamp UK 2009
From Zero to Lots
ScaleCamp UK 2009
2009-12-04

Published in: Technology

  • WebKit + APIs + add-ons for Nokia
  • Without giving away any real numbers…
  • Services group is “organizationally immature”
    Large shift in organizational thinking needed to get to a real Agile organization
  • Core functionality built in two teams
    Future functionality built in two other teams
    All distributed components, no code sharing (except minor libraries), no integration between “now” and “future”
  • Almost went live with MySQL Cluster, but it proved to be unstable and unwarranted; simplified to master-master
  • Legacy deployment mechanism
    No monitoring!
  • Prime Place went live yesterday
    2 hr deployment process
    Need more automation in QA and production
  • Hyperic: moan, vendor lock-in, expensive, blah, but it gets the job done for us very quickly and offers all the features we need, including JMX cluster management; no time for anything else right now
  • Decouple through caching and proxying to deal with emergency outages and traffic spikes without scaling out, or to buy time to get hardware
  • Frequent production releases are hard with lots of devices to test; we need some support from outside of the team, which slows things down a lot
    Caching: prove it before you use it
    Sticking with MySQL due to support contracts, existing expertise, etc., until it becomes an issue
  • Build teams, don’t throw them together
    It takes a lot of care and attention to scale Scrum teams
    Baby steps: tackle Agile in the team, then promote good ideas and process to other teams, groups and out to the organization
    Promote the goodness from within
    …then automate some more
    Still struggling with automation; the problem sometimes is that teams wait for solutions instead of creating or proposing them themselves
    Patience: some people see all of the problems and flip out, want to give up, complain a lot; take it in stride and chip away where you can
  • Transcript

    • 1. From Zero to LOTS
      ScaleCamp UK
      Josh Devins, Software Architect
    • 2. Who are we?
      Nokia
      Devices (duh)
      Services ovi.com
      Nokia Maps (Berlin)
      Device (native and WebKit-based clients)
      Web maps.ovi.com
      Map & Explore group
      Place registration and management
      Place discovery
    • 3.
    • 4.
    • 5. Overall growth
    • 6. The beginning
      Small group
      New services division of Nokia
      Big ambition
      Big company
      Lots of stuff to do
      Early problems
      No existing traffic to study
      No idea how popular services will be
      Lots of pressure to assume huge traffic
    • 7. From 0 to N-1
      200% increase in number of teams and team size
      Started transition from “chaos” to Scrum
      Initial launch of place services summer 2009
      Strict focus on basic feature set
      Core dataset
      Search
      Ratings
      Start simple but know where you need to get to
      ~6.3M places
      Web only
    • 8. Iteration N-1 choices
      Two main teams
      core competencies leveraged
      EJB 3.0 + JBoss, Spring + Tomcat
      Support contracts in place
      JBoss – JBoss AS, JBoss Messaging
      MySQL – cluster, then InnoDB
      Existing operations group
      Existing deployment mechanism
      Static, read-only PXE Linux image
      Used to deploying every couple months only
    • 9. N-1 technology stack
      Client
      Firefox plugin
      Server
      Java, Maven (Nexus), CI (Hudson)
      RESTful aggregated services
      EJB 3.0 + JBoss, Spring + Tomcat
      JPA, Hibernate
      JBoss Messaging
      MySQL (Master-Master)
      Apache 2
      Testing
      JUnit, soapUI, JMeter
      Operations
      PXE Linux based server images (prod)
      Debian
      Nagios
    • 10. From N-1 to N
      Today-ish
      50% increase in number of teams and team size
      120% increase in traffic
      120% increase in number of places
      Focus on more community involvement and enhancing place metadata
      Create a place
      Prime Place (business owner content)
      Additional place metadata
      ~14M places
      Web and N900 devices
    • 11. Iteration N choices
      Rapid development and release
      Spring + Tomcat everywhere
      Common configuration mechanism
      Common logging infrastructure/mechanism
      Standardized file system layout on server
      Automated static analysis with Sonar
      Slack in resources not matching growth, requires automation
      Built out replica QA environment with own team
      Puppet + Webistrano
      Hyperic monitoring ($)
    • 12. N technology stack
      Client
      Plugin not required (although enhances experience)
      JS frameworks: Moo Tools
      Server
      Sonar
      Spring + Tomcat (standardized)
      Grails + Tomcat (administration)
      RESTful APIs (external)
      2-legged OAuth
      Nokia CDN
      Testing
      Grinder, Selenium (some FitNesse)
      Replicated QA environment
      Operations
      Unchanged (prod)
      Puppet, Debian packages, Webistrano (QA)
      Hyperic (QA) and Nagios (prod)
    • 13. From N to N+1
      Planned for summer 2010
      10% increase in team size (planned)
      200% increase in traffic (expected)
      100% increase in number of places
      Scalability, reliability and robustness
      Limited new feature set
      It’s a secret…shhhhh…
      Additional Navteq content
      Additional premium content
      ~30M places
      Web and N900, S60 devices
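The growth percentages on slides 7, 10 and 13 can be checked against the quoted place counts. In the short worked calculation below, the symbols P (number of places) and T (traffic, relative only, since the deck gives no absolute traffic numbers) are notation introduced here, not the speaker's.

```latex
\begin{align*}
P_{N}   &= P_{N-1}\,(1 + 1.20) = 6.3\,\text{M} \times 2.2 \approx 13.9\,\text{M}
          && \text{(slide 10: } {\sim}14\,\text{M)} \\
P_{N+1} &= P_{N}\,(1 + 1.00) \approx 14\,\text{M} \times 2 = 28\,\text{M}
          && \text{(slide 13: } {\sim}30\,\text{M)} \\
T_{N+1} &= T_{N}\,(1 + 2.00) = T_{N-1} \times 2.2 \times 3 = 6.6\,T_{N-1}
\end{align*}
```

Read this way, the expected traffic at iteration N+1 is roughly 6.6 times the launch traffic, while the place count grows a little under five-fold over the same period.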
    • 14. Iteration N+1 choices
      Scale and scale fast
      Caching (HTTP/app? TBD – pending load testing)
      Async business processes
      Decouple/isolate persistence layers for protection, performance
      Reconciliation/cleanup jobs
      Learning
      Hadoop data warehouse
      Trending and tracking
      Continued slack in operations resources
      Push automation developed in QA environment to production processes
      Kickstart, Puppet, RPMs, yum
      Hyperic monitoring (prod)
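The caching and "decouple/isolate persistence layers" items on slide 14 could look roughly like the sketch below: a read-through cache that keeps serving the last known value when the backing store fails. This is an illustration only, not the team's implementation; the class name `StaleOnErrorCache`, the `loader` function and the explicit `nowMillis` parameter are all assumptions made for the example, and the deck itself leaves the caching technology open ("TBD – pending load testing").

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical read-through cache that prefers stale data over an error. */
final class StaleOnErrorCache<K, V> {

    /** A cached value together with the time it was loaded. */
    private static final class Entry<V> {
        final V value;
        final long loadedAt;

        Entry(V value, long loadedAt) {
            this.value = value;
            this.loadedAt = loadedAt;
        }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // stands in for the persistence layer
    private final long ttlMillis;

    StaleOnErrorCache(Function<K, V> loader, long ttlMillis) {
        this.loader = loader;
        this.ttlMillis = ttlMillis;
    }

    /** The current time is passed in so the behaviour is easy to test. */
    V get(K key, long nowMillis) {
        Entry<V> cached = entries.get(key);
        if (cached != null && nowMillis - cached.loadedAt < ttlMillis) {
            return cached.value; // fresh hit: the backend is not touched
        }
        try {
            V fresh = loader.apply(key);
            entries.put(key, new Entry<>(fresh, nowMillis));
            return fresh;
        } catch (RuntimeException e) {
            if (cached != null) {
                return cached.value; // backend failed: serve the stale value
            }
            throw e; // nothing cached, so the failure has to surface
        }
    }
}
```

The sketch never evicts entries and does not coalesce concurrent loads for the same key, both of which a production cache would need; in line with the deck's own advice, any such layer should be proven under load before it is relied on.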
    • 15. N+1 technology stack
      Client
      JS frameworks: combining the “good parts” from Moo Tools, Dojo, jQuery
      SDK for Maemo devices
      Server
      Varnish HTTP “accelerator” and/or app caching
      ActiveMQ (RabbitMQ, Atom feeds, other?)
      MySQL (Master-Master + N-Slaves)
      Operations
      Kickstart, Puppet, RPMs, yum mirrors
      CentOS
      Hyperic (QA, prod)
    • 16. The future
      Move out of the database
      Search already based on Lucene, still DB backed results (good NoSQL candidate)
      Complex place matching and de-duplication algorithms will bottom out
      Proxying and caching
      Pragmatic approach: only where needed and where measured
      Memcached, ehcache + Terracotta, JBoss TreeCache, ehcache L2 cache? Depends…
      Protect ourselves against persistence layer failures and spikes in traffic
      Multi-homed, co-location, worldwide application distribution
      Continuity during outages, lower latency, legal (China)
      Master/slave, master/master, Paxos?
      Application robustness
      Robustness patterns (Release It!)
      Partial failure/outage modes
      Failure auto-detection and recovery (in the application)
      NoSQL
      Pragmatic approach: likely to stick with MySQL until it falls over
      Looking only at very special cases for NoSQL, k/v stores (like Search results)
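One common way to get the "failure auto-detection and recovery (in the application)" mentioned on slide 16 is the circuit breaker pattern described in Release It!. The sketch below is a minimal, single-lock version under stated assumptions: the class name `CircuitBreaker`, its thresholds and the fallback mechanism are hypothetical, and the slides do not say which robustness patterns the team actually adopted.

```java
import java.util.function.Supplier;

/** Hypothetical minimal circuit breaker; names and thresholds are illustrative. */
final class CircuitBreaker {
    private final int failureThreshold; // consecutive failures before opening
    private final long openMillis;      // how long to fail fast before a trial call
    private int consecutiveFailures = 0;
    private long openedAt = 0L;
    private boolean open = false;

    CircuitBreaker(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    /** Runs the primary call, or returns the fallback while the circuit is open. */
    synchronized <T> T call(Supplier<T> primary, Supplier<T> fallback) {
        if (open && System.currentTimeMillis() - openedAt < openMillis) {
            return fallback.get(); // fail fast: do not touch the struggling backend
        }
        try {
            T result = primary.get(); // circuit closed, or a half-open trial call
            consecutiveFailures = 0;
            open = false;             // success closes the circuit again
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (open || consecutiveFailures >= failureThreshold) {
                open = true;          // trip, or re-trip after a failed trial
                openedAt = System.currentTimeMillis();
            }
            return fallback.get();
        }
    }

    synchronized boolean isOpen() {
        return open;
    }
}
```

Holding one lock across the remote call keeps the example short but serializes all callers; a real implementation would track state without blocking concurrent requests. The fallback is where a partial failure mode would live, for example returning cached search results while the database recovers.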
    • 17. A few lessons learned (so far)
      Consider possible sharding strategies and implications early
      Semi-opaque IDs
      End-to-end continuous integration from day one
      No matter how many components are involved, how hard it may seem
      Scaling Scrum is really hard!
      Self organization works when you have great people
      Ensure tools and support are in place to guide them from day one (static analysis, strong mentors, etc.)
      Build truly cross-functional teams
      Promote Agile everything from the inside out (your team, group, division, org)
      Automate, automate, automate
      Don’t be fooled by frameworks
      Shipping quality production software requires in-depth knowledge of the frameworks you use
      Be humble – know when you need help
      Find world class support and use it
      Building an application with all of the *ilities:
      Takes time, patience, expertise and flexibility
      Requires the entire team, group, division and organization
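The slide does not spell out what "semi-opaque IDs" means. One plausible reading, sketched below, is an identifier that clients treat as an opaque string but that carries a small routing hint, so a place can be mapped to its shard without a lookup table. The class name `PlaceId`, the four-hex-digit shard prefix and the UUID suffix are all assumptions made for this illustration, not the format Nokia used.

```java
import java.util.UUID;

/** Hypothetical semi-opaque ID: a shard prefix plus an opaque random part. */
final class PlaceId {
    private static final int SHARD_DIGITS = 4; // hex digits reserved for the shard

    private PlaceId() {
    }

    /** Builds an ID whose first four hex digits encode the shard, e.g. "002a-..." for shard 42. */
    static String create(int shard) {
        if (shard < 0 || shard > 0xFFFF) {
            throw new IllegalArgumentException("shard out of range: " + shard);
        }
        return String.format("%04x-%s", shard, UUID.randomUUID());
    }

    /** Recovers the shard from an ID; only the server side should rely on this. */
    static int shardOf(String id) {
        if (id == null || id.length() <= SHARD_DIGITS || id.charAt(SHARD_DIGITS) != '-') {
            throw new IllegalArgumentException("not a semi-opaque place ID: " + id);
        }
        return Integer.parseInt(id.substring(0, SHARD_DIGITS), 16);
    }
}
```

Because the routing hint is fixed when the ID is issued, moving a place to another shard later needs either a new ID or a redirect table, which is one of the implications worth considering early.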
    • 18. Thanks!
      Questions or comments?
      josh.devins@nokia.com
      www.joshdevins.net (slides available)
      @joshdevins
      We’re hiring!
