1. From Zero to LOTS
ScaleCamp UK
Josh Devins, Software Architect
2. Who are we?
- Nokia Devices (duh) and Services (ovi.com)
- Nokia Maps (Berlin)
  - Device (native and WebKit-based clients)
  - Web: maps.ovi.com
- Map & Explore group
  - Place registration and management
  - Place discovery
6. The beginning
- Small group
  - New services division of Nokia
  - Big ambition
- Big company
  - Lots of stuff to do
- Early problems
  - No existing traffic to study
  - No idea how popular services will be
  - Lots of pressure to assume huge traffic
7. From 0 to N-1
- 200% increase in number of teams and team size
- Started transition from “chaos” to Scrum
- Initial launch of place services, summer 2009
  - Strict focus on basic feature set: core dataset, search, ratings
  - Start simple, but know where you need to get to
- ~6.3M places
- Web only
8. Iteration N-1 choices
- Two main teams, core competencies leveraged
  - EJB 3.0 + JBoss, Spring + Tomcat
- Support contracts in place
  - JBoss – JBoss AS, JBoss Messaging
  - MySQL – cluster, then InnoDB
- Existing operations group
  - Existing deployment mechanism
  - Static, read-only PXE Linux image
  - Used to deploying only every couple of months
9. N-1 technology stack
- Client
  - Firefox plugin
- Server
  - Java, Maven (Nexus), CI (Hudson)
  - RESTful aggregated services
  - EJB 3.0 + JBoss, Spring + Tomcat
  - JPA, Hibernate
  - JBoss Messaging
  - MySQL (master-master)
  - Apache 2
- Testing
  - JUnit, soapUI, JMeter
- Operations
  - PXE Linux-based server images (prod)
  - Debian
  - Nagios
10. From N-1 to N
- Today-ish
- 50% increase in number of teams and team size
- 120% increase in traffic
- 120% increase in number of places
- Focus on more community involvement and enhancing place metadata
  - Create a place
  - Prime Place (business-owner content)
  - Additional place metadata
- ~14M places
- Web and N900 devices
11. Iteration N choices
- Rapid development and release
  - Spring + Tomcat everywhere
  - Common configuration mechanism
  - Common logging infrastructure/mechanism
  - Standardized file system layout on servers
  - Automated static analysis with Sonar
- Slack in resources not matching growth requires automation
  - Built out replica QA environment with its own team
  - Puppet + Webistrano
  - Hyperic monitoring ($)
12. N technology stack
- Client
  - Plugin not required (although it enhances the experience)
  - JS frameworks: MooTools
- Server
  - Sonar
  - Spring + Tomcat (standardized)
  - Grails + Tomcat (administration)
  - RESTful APIs (external)
  - 2-legged OAuth
  - Nokia CDN
- Testing
  - Grinder, Selenium (some FitNesse)
  - Replicated QA environment
- Operations
  - Unchanged (prod)
  - Puppet, Debian packages, Webistrano (QA)
  - Hyperic (QA) and Nagios (prod)
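The external RESTful APIs are protected with 2-legged OAuth. As a rough sketch of what that involves (illustrative only, not the Ovi implementation — the key names and the fixed nonce are placeholders), a minimal OAuth 1.0 HMAC-SHA1 request signer looks like this:

```python
import base64
import hashlib
import hmac
import time
import urllib.parse


def percent_encode(s: str) -> str:
    # OAuth 1.0 requires RFC 3986 encoding: only unreserved chars are safe.
    return urllib.parse.quote(s, safe="-._~")


def sign_request(method: str, url: str, params: dict,
                 consumer_key: str, consumer_secret: str) -> dict:
    """Return params extended with oauth_* fields and an HMAC-SHA1 signature.

    Two-legged OAuth: there is no user token, so the signing key is just
    "<encoded consumer secret>&".
    """
    oauth_params = {
        "oauth_consumer_key": consumer_key,
        "oauth_nonce": "abc123",  # placeholder; use a random nonce in real code
        "oauth_signature_method": "HMAC-SHA1",
        "oauth_timestamp": str(int(time.time())),
        "oauth_version": "1.0",
    }
    all_params = {**params, **oauth_params}
    # Signature base string: METHOD & encoded-URL & encoded-sorted-params
    param_str = "&".join(
        f"{percent_encode(k)}={percent_encode(v)}"
        for k, v in sorted(all_params.items())
    )
    base_string = "&".join(
        [method.upper(), percent_encode(url), percent_encode(param_str)]
    )
    signing_key = percent_encode(consumer_secret) + "&"  # no token secret
    digest = hmac.new(signing_key.encode(), base_string.encode(),
                      hashlib.sha1).digest()
    oauth_params["oauth_signature"] = base64.b64encode(digest).decode()
    return {**params, **oauth_params}
```

The signed parameters can then be sent as a query string or an `Authorization: OAuth …` header; the server repeats the same computation with its copy of the consumer secret and compares signatures.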
13. From N to N+1
- Planned for summer 2010
- 10% increase in team size (planned)
- 200% increase in traffic (expected)
- 100% increase in number of places
- Scalability, reliability and robustness
- Limited new feature set
  - It’s a secret… shhhhh…
  - Additional Navteq content
  - Additional premium content
- ~30M places
- Web and N900, S60 devices
14. Iteration N+1 choices
- Scale, and scale fast
  - Caching (HTTP/app? TBD, pending load testing)
  - Async business processes
  - Decouple/isolate persistence layers for protection and performance
  - Reconciliation/cleanup jobs
- Learning
  - Hadoop data warehouse
  - Trending and tracking
- Continued slack in operations resources
  - Push automation developed in the QA environment to production processes
  - Kickstart, Puppet, RPMs, yum
  - Hyperic monitoring (prod)
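The "async business processes" point is about taking slow work off the request path: acknowledge the client immediately and let a background consumer do the persistence. A toy sketch of the pattern with an in-process queue (the rating example is hypothetical; in the deck's stack this role is played by JBoss Messaging/ActiveMQ, not `queue.Queue`):

```python
import queue
import threading

# Pending jobs and completed writes; the list stands in for the real database.
jobs: "queue.Queue" = queue.Queue()
saved = []


def submit_rating(place_id: int, stars: int) -> str:
    """Web-tier handler: enqueue and return before the write happens."""
    jobs.put({"place_id": place_id, "stars": stars})
    return "accepted"


def worker() -> None:
    """Background consumer: drains the queue, doing the slow persistence."""
    while True:
        job = jobs.get()
        if job is None:      # shutdown sentinel
            break
        saved.append(job)    # stand-in for the real DB write
        jobs.task_done()
```

A slow or briefly unavailable persistence layer then backs up the queue instead of blocking user requests, which is exactly the decoupling the slide is after.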
15. N+1 technology stack
- Client
  - JS frameworks: combining the “good parts” from MooTools, Dojo, jQuery
  - SDK for Maemo devices
- Server
  - Varnish HTTP “accelerator” and/or app caching
  - ActiveMQ (RabbitMQ, Atom feeds, other?)
  - MySQL (master-master + N slaves)
- Operations
  - Kickstart, Puppet, RPMs, yum mirrors
  - CentOS
  - Hyperic (QA, prod)
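Moving from plain master-master to master-master + N slaves means the application has to route statements: writes to a master, reads spread across the slaves. A minimal illustration of such a router (names and the statement-sniffing heuristic are illustrative; real setups often use a driver- or proxy-level splitter instead):

```python
import itertools


class ConnectionRouter:
    """Toy read/write splitter: writes round-robin across masters,
    reads round-robin across slaves."""

    WRITE_VERBS = ("INSERT", "UPDATE", "DELETE", "REPLACE")

    def __init__(self, masters, slaves):
        self.masters = itertools.cycle(masters)
        self.slaves = itertools.cycle(slaves)

    def host_for(self, sql: str) -> str:
        verb = sql.lstrip().split(None, 1)[0].upper()
        if verb in self.WRITE_VERBS:
            return next(self.masters)
        return next(self.slaves)
```

The catch this sketch ignores is replication lag: a read-your-own-writes flow must be pinned to a master for some window after a write, or the user may not see their own change.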
16. The future
- Move out of the database
  - Search already based on Lucene, but still DB-backed results (good NoSQL candidate)
  - Complex place matching and de-duplication algorithms will bottom out
- Proxying and caching
  - Pragmatic approach: only where needed and where measured
  - Memcached, Ehcache + Terracotta, JBoss TreeCache, Ehcache L2 cache? Depends…
  - Protect ourselves against persistence-layer failures and spikes in traffic
- Multi-homed, co-location, worldwide application distribution
  - Continuity during outages, lower latency, legal (China)
  - Master/slave, master/master, Paxos?
- Application robustness
  - Robustness patterns (“Release It!”)
  - Partial failure/outage modes
  - Failure auto-detection and recovery (in the application)
- NoSQL
  - Pragmatic approach: likely to stick with MySQL until it falls over
  - Looking only at very special cases for NoSQL and k/v stores (like search results)
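One of the best-known robustness patterns from "Release It!" is the circuit breaker: after repeated failures against a dependency, stop calling it and fail fast, then probe again after a cooldown. A minimal sketch (thresholds and the injectable clock are illustrative choices, not from the deck):

```python
import time


class CircuitBreaker:
    """Opens after `max_failures` consecutive errors; while open, calls
    fail fast instead of hammering a sick dependency. After `reset_after`
    seconds, one trial call is allowed through (half-open)."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success closes the circuit
        return result
```

Failing fast is what turns a persistence-layer outage into a degraded page instead of a thread-pool pile-up behind a hung backend — the "partial failure/outage modes" bullet above.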
17. A few lessons learned (so far)
- Consider possible sharding strategies and implications early
  - Semi-opaque IDs
- End-to-end continuous integration from day one
  - No matter how many components are involved, or how hard it may seem
- Scaling Scrum is really hard!
  - Self-organization works when you have great people
  - Ensure tools and support are in place to guide them from day one (static analysis, strong mentors, etc.)
  - Build truly cross-functional teams
  - Promote Agile everything from the inside out (your team, group, division, org)
- Automate, automate, automate
- Don’t be fooled by frameworks
  - Shipping quality production software requires in-depth knowledge of the frameworks you use
- Be humble – know when you need help
  - Find world-class support and use it
- Building an application with all of the *ilities:
  - Takes time, patience, expertise and flexibility
  - Requires the entire team, group, division and organization
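The "semi-opaque IDs" lesson is about leaving room for sharding in the public identifiers: the ID looks opaque to clients, but the service can extract routing information from it without a directory lookup. One common scheme (illustrative only; the bit layout is not from the deck) packs the owning shard into the low bits:

```python
SHARD_BITS = 8  # room for 256 shards; an illustrative choice


def make_place_id(sequence: int, shard: int) -> int:
    """Pack the owning shard into the low bits of the public ID."""
    assert 0 <= shard < (1 << SHARD_BITS), "shard out of range"
    return (sequence << SHARD_BITS) | shard


def shard_of(place_id: int) -> int:
    """Recover the shard from any ID, with no directory query."""
    return place_id & ((1 << SHARD_BITS) - 1)
```

Deciding this before launch matters because retrofitting routing information into IDs that are already in clients' bookmarks and caches means a painful migration.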
- Services group is “organizationally immature”
- Large shift in organizational thinking needed to get to a real Agile organization
- Core functionality built in two teams; future functionality built in two other teams
- All distributed components, no code sharing (except minor libraries), no integration between “now” and “future”
- Almost went live with MySQL cluster, but it proved to be unstable and unwarranted; simplified to master-master
- Legacy deployment mechanism
- No monitoring!
- Prime Place went live yesterday
- 2 hr deployment process
- Need more automation in QA and production
- Hyperic – moan, vendor lock-in, expensive, blah – but it gets the job done for us, and quickly; offers all the features we need, including JMX cluster management; no time for anything else right now
- Decouple through caching and proxying to ride out emergency outages and traffic spikes without scaling out, or to buy time to get hardware
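One concrete form of that decoupling is a cache that serves stale data when the persistence layer is down, instead of propagating the outage to clients. A sketch (class and parameter names are hypothetical):

```python
import time


class StaleOnErrorCache:
    """Entries expire after `ttl` seconds, but if refreshing from the
    backend fails, the stale copy is served rather than raising."""

    def __init__(self, loader, ttl=60.0, clock=time.monotonic):
        self.loader = loader      # callable: key -> value, may raise
        self.ttl = ttl
        self.clock = clock
        self.store = {}           # key -> (value, fetched_at)

    def get(self, key):
        entry = self.store.get(key)
        if entry is not None and self.clock() - entry[1] < self.ttl:
            return entry[0]       # fresh hit
        try:
            value = self.loader(key)
        except Exception:
            if entry is not None:
                return entry[0]   # backend down: stale beats an error page
            raise                 # nothing cached, nothing to fall back on
        self.store[key] = (value, self.clock())
        return value
```

During a spike or outage this degrades gracefully to slightly-old place data, which for read-heavy traffic like place pages is usually the right trade.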
- Frequent production releases are hard with lots of devices to test – need some support from outside the team, which slows things down a lot
- Caching: prove it before you use it
- Sticking with MySQL due to support contracts, existing expertise, etc. – until it becomes an issue
- Build teams, don’t throw them together
  - It takes a lot of care and attention to scale Scrum teams
  - Baby steps – tackle Agile in the team, then promote good ideas and process to other teams, groups and out to the organization
  - Promote the goodness from within
- …then automate some more
  - Still struggling with automation; part of the problem is teams waiting for solutions instead of creating and proposing them themselves
- Patience: some people see all of the problems and flip out, want to give up, complain a lot – take it in stride and chip away where you can