kaChing is an online platform that connects investors with outstanding investment managers. We are a technology-driven company that adopted lean methodologies from the start and has achieved a 5-minute commit-to-production cycle. Continuous deployment is a way of life and is integral to our engineering culture.
In this talk, we will present our system's architecture and discuss our service-oriented platform, dubbed kawala (in the process of being open sourced at http://code.google.com/p/kawala). We will describe the mechanics of an automated release from check-in to production: a clean build with full regression testing in less than 3 minutes; packaging and deployment, with traffic redirected automatically using ZooKeeper for coordination; and health checks and an immune system to monitor the release. Finally, we will talk about planned evolutions and the next challenges we face in our infrastructure.
2. Twenty Deployments a Day Keeps the Nasty Bugs Away Pascal-Louis Perez, kaChing Group Inc.
3. “Connect investors with outstanding investment managers”
4. Release is a Marketing Concern. Reducing Code Inventory is what Engineering Focuses On.
5. Lean Startup at kaChing: “Human institution designed to create something new under conditions of extreme uncertainty” (Eric Ries). Methodical approach to eliminating waste. No Quality Assurance engineers, no Operations team. Empower everybody to drive their own projects.
6. What is Continuous Deployment? A continuous, successful and repeatable methodology for deploying code. It automates every step of taking checked-in code and making it run on production servers, in front of customers.
7. True Story: An investment manager calls and comments about an unintuitive trading flow. Improvements are made and deployed within the next 20 minutes. We call him back: “What do you think of the improvement?”
8. Benefits of Continuous Deployment: It allows quick iterations. Obsoletes processes, e.g. “cutting a release”. Reduces risk. Everyone is aware of production. No one throws code over the wall. Exposes 24x7 operational requirements. Trunk stable.
9. Making Continuous Deployment a Reality: Continuous Deployment, Immune System, Culture, Test Driven Development.
11. Our architecture: Service-oriented system. Vertical sharding. Everything uses the same platform, kawala. Coordination using ZooKeeper. Data interchange using JSON and Protobufs.
12. Typical Stack: Clustered services, multiple instances. Replicated databases (e.g. MySQL or NoSQL). Caching (e.g. memcached). Denormalized data (e.g. Voldemort).
13. kawala: Command pattern. A command is a query. Dynamic Inversion of Control. Queries produce a value.
14. Sample Query
    class GetUser(Id<User> userId) {
      @Inject UserRepository repository;
      public User process() {
        return repository.load(userId);
      }
    }
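The slide's `GetUser` uses a compact constructor shorthand that is not valid Java as written. The following is a minimal, self-contained sketch of the same idea in plain Java, not kawala's actual API: the `Query` interface, the `Id`, `User` and `UserRepository` stand-ins, the in-memory repository, and the manual field assignment that replaces `@Inject` are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for types the slide references but does not define.
final class Id<T> {
    final long value;
    Id(long value) { this.value = value; }
}

final class User {
    final String name;
    User(String name) { this.name = name; }
}

interface UserRepository {
    User load(Id<User> id);
}

// Assumed shape of a query: a command that produces a value when processed.
interface Query<T> {
    T process();
}

// Plain-Java spelling of the slide's GetUser: the constructor argument
// becomes a field, and the repository is supplied before process() runs.
final class GetUser implements Query<User> {
    private final Id<User> userId;
    UserRepository repository; // assigned by hand here; @Inject on the slide

    GetUser(Id<User> userId) { this.userId = userId; }

    @Override
    public User process() { return repository.load(userId); }
}

// Illustrative in-memory repository so the sketch runs without a database.
final class InMemoryUserRepository implements UserRepository {
    private final Map<Long, User> users = new HashMap<>();

    void save(Id<User> id, User user) { users.put(id.value, user); }

    @Override
    public User load(Id<User> id) { return users.get(id.value); }
}

final class QueryDemo {
    public static void main(String[] args) {
        InMemoryUserRepository repository = new InMemoryUserRepository();
        repository.save(new Id<User>(1), new User("alice"));

        GetUser query = new GetUser(new Id<User>(1));
        query.repository = repository; // manual injection for the sketch
        System.out.println(query.process().name);
    }
}
```

Because the query object carries only its arguments and receives its collaborators from outside, the same object could in principle be handed to different executors (an RPC layer, a command line tool, a test), which is the property the deck relies on.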
15. What We Do with Queries: Persist Queries, RPC, Command Line Tools, Serve BLOBS, CPS, Render Pages, Push into MQ.
19. Testing Philosophy: Only automated testing matters. If it isn’t tested, it isn’t finished or correct. Write testable code. Embrace abstractions. Testing is cross-functional; we all own quality.
20. Benefits of TDD: It allows quick iterations. It empowers engineers to change anything, and as such helps in scaling the team. It is more cost effective than debugging. It obsoletes the need for functional QA. It facilitates continuous refactoring, allowing the code to get better with age. It attracts the right kind of engineers.
21. Types of Testing: Unit Testing - does the code work? Integration Testing - does the code work together? Regression Testing - learn from your mistakes. Frontend Testing - a whole different ballgame.
22. Defensive Testing: Nightmare scenarios. A common conversation at lunch: Alice: “What would happen if X blew up?” Bob: “Uh... the site would go down.” Fix it, test for it.
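One way to read “Fix it, test for it” is sketched below. This is an illustrative assumption, not code from the talk: `GuardedCall` is a hypothetical wrapper that falls back to the last good value when a dependency (the “X” in the conversation) throws, and the demo injects a dependency that always fails to prove the fallback works.

```java
import java.util.function.Supplier;

// Hypothetical wrapper: a failing dependency degrades to the last good
// value instead of propagating the failure to the caller.
final class GuardedCall<T> {
    private final Supplier<T> dependency;
    private T lastGood;

    GuardedCall(Supplier<T> dependency, T initial) {
        this.dependency = dependency;
        this.lastGood = initial;
    }

    T get() {
        try {
            lastGood = dependency.get();
        } catch (RuntimeException e) {
            // X blew up: keep serving the last good value.
        }
        return lastGood;
    }
}

final class DefensiveTestDemo {
    public static void main(String[] args) {
        // The "test for it" half: inject a dependency that always fails.
        Supplier<String> broken = () -> {
            throw new IllegalStateException("X blew up");
        };
        GuardedCall<String> call = new GuardedCall<>(broken, "cached page");
        if (!"cached page".equals(call.get())) {
            throw new AssertionError("expected fallback value");
        }
        System.out.println("fallback served");
    }
}
```

Whether a stale value is an acceptable fallback depends on the dependency; the point of the sketch is only that the nightmare scenario becomes an automated test rather than a lunch conversation.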
25. Process: Everyone writes code and tests, releases to production, adds monitoring. Specialists help out when needed. New hires push code the first day. Problems, issues, errors, bugs, oversights...
26. Stability: Cluster everything. As little global state as possible. Maintain global state through ZooKeeper. Monitor everything.
37. Monitoring Philosophy: Prefer business metrics. Monitor statistical deviations, not absolute values. Automatically annotate graphs.
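As a rough illustration of monitoring deviations rather than absolute values, the sketch below flags a metric when it lies more than a chosen number of standard deviations from its recent history. The talk does not say which statistic kaChing uses; the z-score, the class name `DeviationMonitor`, the sample numbers, and the threshold of 3 are all assumptions.

```java
import java.util.Arrays;
import java.util.List;

final class DeviationMonitor {
    // How many standard deviations `latest` lies from the mean of `history`.
    static double zScore(List<Double> history, double latest) {
        if (history.isEmpty()) {
            throw new IllegalArgumentException("history must not be empty");
        }
        double mean = 0;
        for (double v : history) mean += v;
        mean /= history.size();

        double variance = 0;
        for (double v : history) variance += (v - mean) * (v - mean);
        double std = Math.sqrt(variance / history.size());

        if (std == 0) {
            // Flat history: any change at all is an infinite deviation.
            return latest == mean ? 0 : Math.signum(latest - mean) * Double.POSITIVE_INFINITY;
        }
        return (latest - mean) / std;
    }

    // Alert on the size of the deviation, not on the raw value.
    static boolean isAnomalous(List<Double> history, double latest, double threshold) {
        return Math.abs(zScore(history, latest)) > threshold;
    }
}

final class DeviationDemo {
    public static void main(String[] args) {
        // Hypothetical business metric, e.g. trades per hour.
        List<Double> history = Arrays.asList(100.0, 102.0, 98.0, 101.0, 99.0);
        System.out.println(DeviationMonitor.isAnomalous(history, 101.0, 3.0));
        System.out.println(DeviationMonitor.isAnomalous(history, 60.0, 3.0));
    }
}
```

With this history the mean is 100 and the standard deviation is about 1.41, so a reading of 101 passes while a reading of 60 is flagged, even though no fixed absolute limit was configured.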
38. Monitoring Errors: False negatives - errors of omission. False positives - errors of implementation.
39. End to End Monitoring: Run Selenium on production. Accessibility, Speed. Ad-hoc Keynote - customized for our flows. Must control data creation, analytics impact.
41. Quarantining: Isolate new releases. Flexible partitioning of requests. Gradually shift load to fresh services.
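Quarantining is listed as a future development, so the deck gives no implementation. The sketch below shows one possible shape under stated assumptions: requests are partitioned by a numeric id, the share of traffic sent to the fresh release grows by a fixed step after each passing health check, and a failing check drains the fresh release entirely. `QuarantineRouter` and its methods are hypothetical names.

```java
final class QuarantineRouter {
    private int percentToFresh = 0; // share of traffic on the new release, 0..100

    // Partition by request id so a given id lands on the same side
    // for as long as the percentage is unchanged.
    boolean routeToFresh(long requestId) {
        return Math.floorMod(requestId, 100L) < percentToFresh;
    }

    // Called after each health check: step up when healthy, drain when not.
    void onHealthCheck(boolean freshIsHealthy, int step) {
        if (!freshIsHealthy) {
            percentToFresh = 0;
            return;
        }
        percentToFresh = Math.min(100, percentToFresh + step);
    }

    int percentToFresh() {
        return percentToFresh;
    }
}

final class QuarantineDemo {
    public static void main(String[] args) {
        QuarantineRouter router = new QuarantineRouter();
        router.onHealthCheck(true, 10);  // fresh release now takes 10% of ids
        System.out.println(router.routeToFresh(5));
        System.out.println(router.routeToFresh(50));
        router.onHealthCheck(false, 10); // failed check: back to 0%
        System.out.println(router.percentToFresh());
    }
}
```

A real deployment would keep the percentage in shared coordination state rather than in one process's memory; the in-memory field here only keeps the sketch self-contained.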
42. Describing Infrastructure: Many moving parts: nagios, collectd, backups, services, databases, … Consistency is key. Adding new tools should be easy and thorough. Standardize best practices.
43. What we Covered Today: Lean Startup and Continuous Deployment. Anatomy of a Release: Commit, Testing, Deployment, Monitoring. Future Developments: Quarantining, DSL for Infrastructure.
44. References
We’re Recruiting: jobs@kaching.com
kaChing’s blog: http://eng.kaching.com
kawala: http://bit.ly/kawala
Deployment Infrastructure: http://eng.kaching.com/2010/05/deployment-infrastructure-for.html
Extreme Testing: http://bit.ly/9bOFaA
Writing Testable Code: http://googletesting.blogspot.com/2008/08/by-miko-hevery-so-you-decided-to.html
ZooKeeper: http://bit.ly/kc-zookeeper
Eric Ries: http://startuplessonslearned.com
Editor's Notes
Thanks to SDForum, and thanks to LinkedIn for hosting. Today, we talk about applied lean startup, and specifically about continuous deployment at kaChing. David and I are presenting the efforts of a whole team. Tweet, share, shout, ask questions. Before we start, let me ask you a few questions: How long from writing code to having it in production? How much time do you spend on “releasing” every month? How long does deprecated code stay around? How often do you refactor code? How long until a new hire has code in production? How long from product request to code in production, in front of alpha/beta testers?
Let me give some context about what kaChing is. Our goal is to connect investors with outstanding investment managers. Schwab created Schwab OneSource, which is THE marketplace for mutual funds. We want to create the equivalent for individually managed accounts, essentially bringing wealth management to retail. On kaChing, transparency is the norm. You can know everything about the investment manager you are trusting: his portfolio holdings, his past transaction history, his rationale, his philosophy. And we use all this data to objectively vet managers. In 6 months, we’ve attracted $11M and are growing very fast. We’re an SEC-regulated company, en route to being FINRA-regulated as a broker/dealer.
QA and operations engineers are typically paid less than software engineers or architects. How can we be saving by having more expensive people do the work? Also explain our view of QA, which has two functions: checking that the spec was translated correctly into a working product, and checking that the product works. The first part is inherently human; the latter must be automated to eliminate waste.
CD’s benefits are twofold: business/lean startup related and technology related. Continuous deployment is essential to driving engineering culture and enabling quick iteration cycles. Many processes typically seen in engineering organizations, cutting a release weekly for instance, are not needed anymore. This removes a lot of waste, that is, work not directly helping with the learning experience of creating something new. We are a financial company. We deal with people’s money. Not being able to communicate with our prime broker is a no-no. We work in small batches, which reduces the possibility of integration problems and lets us roll back, as well as narrow down problems quickly, if they ever occur. From a technology standpoint, engineers are acutely aware of how things run. You are on the hook for bad code; there is no throwing it over the wall to a QA organization. Everybody needs to know how a 24x7 site is run. Trunk stable: we’ll talk about that in a minute.
This is a very high-level survey; the goal is to convey what is necessary to understand how our system works in the context of CD, not how kaChing is built. JSON and Protobufs.
- Scale the team: add features without adding engineers to support and maintain these features. This requires “production tests”, a.k.a. monitoring.
- Debugging: debugging live servers in a production environment!
- At kaChing, we’ve never spent $1 on QA.
- Not all engineers are excited by a test-driven environment. Some are happy with “tests” taking “only” 5 hours to run.