Feedback loops between tooling and culture

  1. Feedback loops between tooling and culture Chris Winters @cwinters
  2. Caveats * My opinions, not Turnitin’s (or anyone else’s)
 * Series of anecdotes rather than stories :-)
 * Set of people and resources at end, will post link on Twitter later (@cwinters)
  3. Goals for today ❖ Culture is central to work ❖ You are part of the culture ❖ You can help change it ❖ Tooling can be one of your levers Central to our enjoyment or dislike of work
 Central to how productive we are individually or as a team Why focus on tooling? It’s something we tend to like iterating on, maybe because we’re the customer?
  4. Tooling What do you use
 to do your job? So what is tooling? It’s the answer to this question. Yes, it’s broad.
  5. Examples ❖ Editors/IDEs, keyboards, debuggers ❖ Test runners, build systems, traffic sniffers ❖ Team messaging, video conferencing, shared calendars ❖ Status reporting, ticketing systems, file sharing ❖ Whiteboards, post-it notes, screen recorders ❖ Headphones, video cameras, microphones I bet if I got five of you in a room you could come up with 20 more in five minutes. People in the sorts of work that we do use a lot of tools to do our jobs. It also emphasizes that our job is not writing code. It’s making products that help people in some aspect of their life or job.
  6. Feedback loop A system’s output
 amplifies or inhibits
 future system output Next definition. Amplifies: positive. Inhibits: negative. When we talk about systems we inevitably have to talk about these. It’s not complicated, but if you’re not thinking about these sorts of things you’re probably considering things statically.
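The definition above can be sketched in a few lines of Python. This is only an illustration, assuming a toy system whose next output is its current output plus a fixed fraction of it; `simulate` and `feedback` are made-up names, not anything from the talk.

```python
def simulate(initial, feedback, steps):
    """Return the outputs of a toy system whose output feeds back into itself.

    Hypothetical model: next = current + feedback * current.
    feedback > 0 amplifies each step; -1 < feedback < 0 inhibits it.
    """
    outputs = [initial]
    for _ in range(steps):
        current = outputs[-1]
        outputs.append(current + feedback * current)
    return outputs


amplifying = simulate(1.0, 0.5, 3)   # positive loop: output keeps growing
inhibiting = simulate(1.0, -0.5, 3)  # negative loop: output keeps shrinking
print(amplifying)
print(inhibiting)
```

Looking only at a single value in either list is the "static" view; the loop only shows up when you follow the output from one step to the next.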
  7. Examples ❖ Speaker (out) picked up by microphone (in) — classic! ❖ Higher temps melt more ice, which exposes darker ground, which absorbs more heat, which melts more ice ❖ Building a highway to alleviate congestion spurs more development along the highway, which leads to more cars, which leads to more congestion ❖ More predators means less prey, which means predators die from starvation, which means prey can bounce back More: * Drought kills plants, which limits water released by plants, which increases dust, which makes air even dryer * The last one shows both positive and negative, though which is which probably depends on whether you’re the eater or the eaten
  8. Examples - Software ❖ “Infinite” storage shaping an early web site of mine… ❖ Systems hard to test means that tests are hard to write… ❖ Releases are risky, we should do fewer releases… * Agile is built around feedback loops, preferably more frequent so the time between an activity and reflection on that activity is short * “ZARNETZKE!” (time when 1GB HD was ~$1k, and you got 20-50MB with your website) * …which means only the easiest tests get written, which means we rush to fix inevitable bugs, which are really hard to fix because the system is hard to test… The last one is a core loop that may have happened to some of you. You put software out in the world and something bad or embarrassing happens. (It happens to everyone.) And the natural human tendency is to find the thing that changed, and do less of that. But unless your company is okay with you shipping fewer features, the side-effect is that you’ll be putting more features into fewer releases. So every release that goes out is riskier because there are more things that can go wrong.
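The "fewer releases" loop can be put into rough numbers. The sketch below assumes, purely for illustration, that every change has the same independent chance of causing a problem; the 5% figure and the function name are invented, not data from the talk.

```python
def release_failure_probability(changes_per_release, p_change_breaks):
    """Chance that at least one change in a release causes a problem.

    Assumes (hypothetically) that each change breaks independently
    with the same probability.
    """
    return 1 - (1 - p_change_breaks) ** changes_per_release


P_BREAK = 0.05  # assumed per-change failure probability, not a measured value

small = release_failure_probability(1, P_BREAK)    # one change per release
chunky = release_failure_probability(20, P_BREAK)  # twenty changes per release
print(f"1 change: {small:.2f}, 20 changes: {chunky:.2f}")
```

Under these assumptions the chunky release goes wrong well over half the time, and when it does there are twenty candidate changes to investigate instead of one.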
  9. Risk is scary! ❖ Answering angry customers sucks ❖ Natural to want to control and limit variables ❖ But what if those variables are what you do? This reaction happens for good reason! Control is literally the job of some people, and you can’t just ignore that.
  10. Break the cycle ❖ How to do smaller releases? ❖ …how to give everyone confidence? ❖ …how to make change sustainable? * It’s all well and good to say “we need less risky releases”, but how do you do that? * Your process is tooling, and it’s probably adapted to chunky releases * …full regression tests: necessary? * …change review board? * …separate teams doing handoff reviews? * This tooling has created a culture * …and culture is really hard to change
  11. Culture How do we work together?
  12. Examples ❖ What do you celebrate? What do you punish? ❖ Communicate via speaking or via document? ❖ Implicit AND explicit ❖ Intentional AND accidental
 ❖ “I know it when I see it” * What are some phrases you use to describe a company’s culture?
 Open, friendly, collaborative, passive aggressive, sales driven
 * Implicitly: example from Jeff Koenig accidentally dropping the production database * “I know it when I see it” from Justice Potter Stewart in an obscenity case (Jacobellis v. Ohio, about the screening of the Louis Malle film “The Lovers”); also known as an Elephant Test (“hard to describe, but instantly recognizable when spotted”)
  13. Themes ❖ Agency: can I affect it? ❖ Independence: can I do it without asking? ❖ Re-use: can I make your thing part of my thing? * Back to caveats, a lot of the way I see this is driven by the way I see everything — people generally want to do a right thing, and part of our job when making tools and processes is to make doing a right thing as easy as possible. 
 * Going to go through two areas around this: Testing, and CI/CD
  14. Chrome plugin to show you a new piece of art every day Is this a tool? Y
 Does it help me do my job? N Take a drink
  15. 1. Testing ❖ One of the most powerful tools we have ❖ Culture deeply shapes how we test ❖ Culture deeply shapes the impact of testing ❖ Scrummerfall still reigns? * Powerful: Not only to ensure quality (which is our professional responsibility) but also so that we can focus on building useful features instead of scrambling to fix bugs with the pressure of a broken system limiting our vision. * Many people still do cycles of build-test-release, which means testing is “the only thing sitting between devs and customers” — that BETWEEN is a problem. It’s a hold-up if testing folks actually do the job we want them to do.
  16. Slightly controversial ❖ Why separate software engineers and quality engineers? ❖ Why don’t engineers test their own code? * Yay, not everyone separates like that, SDET (in MS, Amazon, elsewhere) seem to be at the same level money and status-wise * I think “you can’t see the problems with your own code” is bullshit. It’s absolutely true if you never do it, but like everything else the more you do it the better you get at it. This idea that quality engineers “think differently” about problems verges on the mystical: you can learn to do that just like you learned to program, or write, or play the guitar, or bake bread. * Your job is not to write code. Your job is to solve problems. And shit that doesn’t work doesn’t solve problems. * Can you imagine civil engineers leaving the testing of roads or bridges entirely to someone else?
 * I suspect this has happened because writing tests is generally seen as lower status. The assumption is that if you *can* write code you are. Therefore if you aren’t writing code you can’t. And writing code is high status. Could it be seen as asking a race car driver to take on a school bus route?
  17. People still matter ❖ Automation is not the end of testing ❖ Testing gives us confidence our systems will work ❖ …and that they’re right ❖ …we need people for that * “Right” systems because of UX; my conversion story
  18. Share the love ❖ How do we get more people to test?
  19. Basics ❖ How do we get more people to do anything?
  20. Make it easy
  21. Make it valued
  22. Testing microservices ❖ Microservice world: feature can span multiple front-end, edge, and back-end services ❖ Multiple teams, multiple features, all operating simultaneously ❖ Testing 😳
  23. Standard way ❖ Standard way: three production-like environments ❖ Dev: Test out your mostly-baked changes ❖ QA: Verify those changes ❖ Production: Let users test your changes
  24. Negative loops: one environment ❖ Dev and QA envs become bottlenecks ❖ Option: serial (coordinate who can deploy and when?) ❖ Option: hopes and prayers (do what you want) ❖ How do the services that many others depend on test themselves without breaking everyone else? ❖ Same feedback loop as “release less often” * same loop as “release less often” - larger changes, less certainty about what may be causing a problem, less frequent iterations
  25. Positive loops: color environments ❖ Ready to test a feature? You get an environment! ❖ Reset state (Postgres, Redis) every time ❖ “Color” refers to their name in URL: gold, ruby, sage ❖ Create via Slack commands, 5-10 minute turnaround ❖ Technical info (quickly) * Naming shapes this * Technical info: * All services (~30) running on one EC2 instance * Front-end served up from S3 buckets * Docker makes this possible * Services don’t require much memory (Python) * All containers built on-demand (weird)
  26. “Make it easy” ❖ Accessible: just need a URL ❖ Defaults: specify only changes from production ❖ Independent: no permission needed ❖ Everyone can create an environment, not just devs ❖ Product owners, designers, curriculum team * It wasn’t pie-in-the-sky optimism that led us to create tools so that everyone could create an environment.
 We didn’t want to be gatekeepers.
 Plus we have always had very active Product Owners
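The "Defaults: specify only changes from production" idea can be sketched as a small merge. Everything here is hypothetical: the service names, version strings, the `environment_spec` helper, and the placeholder domain are invented to show the shape of the idea, not the actual tool.

```python
# Hypothetical production versions; the service names are invented for illustration.
PRODUCTION = {"web": "2.7.0", "gateway": "1.4.2", "grader": "3.0.1"}


def environment_spec(color, changes=None):
    """Describe a color environment as production plus the requested changes."""
    changes = changes or {}
    unknown = sorted(set(changes) - set(PRODUCTION))
    if unknown:
        raise ValueError(f"unknown services: {unknown}")
    return {
        # Placeholder domain: the talk only says the color appears in the URL.
        "url": f"https://{color}.example.test",
        "services": {**PRODUCTION, **changes},
    }


spec = environment_spec("gold", {"grader": "my-feature-branch"})
print(spec["url"], spec["services"])
```

The requester names one service and gets a complete environment back, which is what lets people who are not developers create one without knowing the whole stack.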
  27. Primary effects ❖ Concurrent features can move forward independently ❖ Anyone can scratch an itch ❖ “Works on my machine” never heard again * Concurrent independent features: huge
  28. Secondary effects ❖ Trust: powerful tools used by everyone, good people recognize the power and use it wisely ❖ Focus: (mostly) abandon efforts for local full stack ❖ Public: accessible via URL = perfect for demos ❖ Test: vector for integration tests to run automatically ❖ Everybody tests, not “somebody else’s job” ❖ …everybody* more familiar with the product * Trust: giving people agency engenders trust * Test: re-use
  29. So how’s it going now? ❖ Rest of the company probably tired of hearing about it ❖ Replicating system to larger platform, Java services ❖ Hugely beneficial to build services with this in mind ❖ Current system still results in weekly conflicts ❖ Original: A ❖ Current: Incomplete
  30. Whoa how did that get on there?! Take a drink
  31. 2. CI/CD ❖ Huge level up for a team or org ❖ Shorter feedback loops dev => prod ❖ …also between thought and service * Continuous Integration: get your code running in as real an environment as possible, all the time * Continuous Deployment: deploy changes as they occur and are verified, don’t wait for arbitrary time boundaries * “thought and service” — making it easy to create a new thing and have it act like everything else, the special bits should be in the code and the domain of its change, not in how it’s built and run
  32. Starting state ❖ Multiple systems with ops ownership only ❖ New projects: copy-and-adapt ❖ Little-to-no agency, second-class citizen ❖ No artifact re-use despite Docker ❖ Runtime/deployment config in separate repo ❖ Few (if any) docs per repo ❖ Manual deployment ceremonies at sprint end * Multiple: 2-3x Jenkins, Circle CI 1.x and 2.x, Codeship * Copy Circle/Jenkins files from closest project at hand, no re-use, kubectl commands everywhere * Second-class: weird combination of recognizing the value while being mad when things don’t work. How to recognize? “Someone should take care of this” * No artifact re-use: tests run via Maven/Gradle commands on worker node, not in container. And repos generally had a Dockerfile generation script rather than an actual Dockerfile * Separate repo: secrets in one, deployment config in another, each controlled by separate team and each with its own processing logic and know-how. For example, rewriting commonly used keys for database config into what Spring expects, but not really documented anywhere (“why document when everyone just copies files from another project?”)
  33. Secondary effects ❖ Spread-out ownership of deployment adds FUD ❖ Fewer deployments, more change in each, more risk ❖ Deployment know-how not everywhere… ❖ Silo: barrier to cross-team coding collaboration ❖ …which IMO routes both product and code collaboration through the product management process, which is inappropriate * Fewer deployments: sensing a theme?
 * Deployment know-how: better than about a year ago, when a guy (A GUY) was doing deploys. In fact one of the first things that happened when I took over this team was his leaving. Out of fear I tried really hard to keep him, with everybody telling me he had all this knowledge that nobody else did. But it all turned out ok, adaptation wasn’t as hard as some people thought. * Silo: one of the benefits of smaller services is that people can spin up on them quickly without asking the author; but scattered docs and deployment confusion make this harder.
 We’re making it hard to do a right thing.
 * Barrier: how can someone get started with a service? make changes themselves and have confidence their changes won’t break anything?
  34. Change goals ❖ Overall goal: Anyone can check out a repo, run it, make pull requests against it, all without asking permission ❖ Prereqs: Docker, Docker Compose, jet, Quay account ❖ Documentation ❖ Follow through ❖ Ownership (?) * Goal: Flatten barrier to entry, allow people to focus on the actual hard stuff
 * Quay account: not wired to SSO :-( — only available in self-hosted version, ugh
 * Documentation: Started in GitHub Enterprise (yay markdown) but will probably move to Confluence * Tangent: This is a separate tooling issue that’s really hard to solve. Everyone has different needs and nobody will be happy. Keep dev docs in GHE? You’re artificially limiting their scope. Keep all docs in Confluence? Until recently required VPN access (now on SaaS) and all standard problems of wikis.
 * Follow through: my impressions are that previous build-and-release efforts got put out too early before they were fully baked, and because the org didn’t treat it as a first-class thing the authors went back to their “real” jobs before the task was complete, so users were stuck with tools that *mostly* worked.
 * Long-term ownership of tools is another really hard problem; my opinion is that things that are valued get resources. So if the org values it you’ll get folks dedicated to it at least part-time.
  35. Change mechanics: systems ❖ Create new (another!) Jenkins instance, this one running on Kubernetes ❖ New jobs are pods, pod autoscaling FTW ❖ Change deployment from running kubectl/helm commands to generating messages ❖ App generates deployment message to queue ❖ Service reads messages from queue and processes
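The shift from running kubectl/helm commands to generating messages can be sketched as a tiny producer and consumer. This is a sketch under stated assumptions: the message fields, the function names, and the use of an in-process `queue.Queue` in place of a real message queue are all hypothetical.

```python
import json
import queue


def build_deployment_message(service, version, environment):
    """Serialize a deployment request; these field names are assumptions."""
    return json.dumps(
        {"service": service, "version": version, "environment": environment}
    )


def process_messages(messages, deploy):
    """Drain the queue, handing each decoded message to `deploy`."""
    handled = 0
    while True:
        try:
            raw = messages.get_nowait()
        except queue.Empty:
            return handled
        deploy(json.loads(raw))
        handled += 1


deployments = queue.Queue()  # in-process stand-in for a real message queue
deployments.put(build_deployment_message("grader", "3.0.2", "gold"))

deployed = []
count = process_messages(deployments, deployed.append)
print(count, deployed)
```

The design point is that the app only has to say what it wants deployed; the knowledge of how to deploy lives in one service instead of being copied into every repo.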
  36. Change mechanics: repos ❖ Move Kubernetes charts into repo, with global config and per-environment overrides ❖ Move secrets into repo using git-crypt ❖ Create Docker-centric workflow using Codeship tools, all files in repo ❖ Add Jenkinsfile (declarative pipeline) to repo ❖ Set examples for documentation in-repo * Kubernetes charts: Helm charts describing resource use, environment variables, pull secrets, etc. * git-crypt: store files as encrypted in git, be able to decrypt if you have the key * Jenkins has the key as a credential * Security folks like it because you can limit the people who need the key * Codeship tools boiled down into jet, a Go binary that uses a docker-compose-like workflow to declare and run containers * Allows you to spin up dependencies (Postgres, Redis, Cassandra, mail servers, etc.) * Can run locally JUST LIKE Jenkins (yay Docker) * Jenkins instance spins up new pods for jobs, does not rely on installing the “correct” versions of supporting software (Java, Python, Node, Selenium, etc.) on worker nodes
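"Global config and per-environment overrides" is a layering idea that can be shown without Helm itself. The sketch below is not Helm's implementation; it is a rough Python analogue with invented keys and values, where nested settings are merged key by key and everything else is replaced.

```python
def merge_config(base, overrides):
    """Return base with overrides applied, merging nested dicts key by key."""
    merged = dict(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical values, invented for illustration.
GLOBAL = {"replicas": 1, "resources": {"memory": "256Mi", "cpu": "100m"}}
PER_ENVIRONMENT = {
    "production": {"replicas": 3, "resources": {"memory": "512Mi"}},
    "qa": {},
}

production = merge_config(GLOBAL, PER_ENVIRONMENT["production"])
qa = merge_config(GLOBAL, PER_ENVIRONMENT["qa"])
print(production)
print(qa)
```

Each environment file stays small because it lists only what differs from the global config, which makes the differences between environments easy to review in a pull request.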
  37. Building it out ❖ “Docker-centric” workflow is crucial ❖ Represent common tools as containers, SUPER RE-USE ❖ Small tools encourage sharing and contribution
 ❖ Automatic version bumping ❖ Deployment message generation ❖ Library publishing to private repo (Java, Python)
  38. Find your early adopters ❖ Familiar with what you’re doing ❖ Willing to go through some pain ❖ Talk and listen, talk and listen ❖ Source of ideas ❖ KEEP THEM HAPPY ❖ You need help to change culture * Working with Marco on all these tools, very willing to experiment * …but he’s also got to do his job: MAKE IT EASY TO DO A RIGHT THING * just because you’re super responsive to suggestions (AND YOU SHOULD BE) doesn’t mean you have to be a pushover
 * Talk and listen: the importance of the Out of Box Experience was emphasized during a visit to another office with some early adopters, who said “It would be great if you had a set of files you could get started with.” The next day we had a bootstrapping project that templated out the most common things for different languages. HUGE HIT.
  39. Progress ❖ Start slow and work out the kinks ❖ First impressions are important, that README needs to be awesome. ❖ …just like open source ❖ More and more teams coming onboard ❖ My heart swells when I get a question from someone who I didn’t even know was working on it
  40. Deployment feedback ❖ Deployment is No Big Deal ❖ Teams are definitely deploying more ❖ Exposing weirdness in testing (in-memory DBs 😡) ❖ Easier to create new services… less chunky services (?) ❖ More in control… more invested in process (?)
  41. What even is a “sprint”? ❖ If we can deploy anytime we want? ❖ …why do we need sprints anymore?
  42. So how’s it going now? ❖ B+, maybe an A- on good days ❖ Still have bits here and there we need to consider ❖ Some pieces feel too opaque (more logging?) ❖ It’s still early but the Docker mindset hasn’t sunk in yet ❖ Resistance diminished by the carrots of automation
 (deployment, versioning) ❖ Got others to tackle branch naming
  43. Learnings ❖ These are doable things ❖ Put yourself in their shoes ❖ Make it easy to do a right thing ❖ Nobody cares about your tool… until it fails ❖ Big systems are hard to maintain and hand-off ❖ You should always look to put yourself out of a job
  44. Other tools and levers ❖ DRY and effects of taking it to limit ❖ Default all docs/repos to viewable/writeable by everyone within the company ❖ SSO ❖ Rotational work alongside support ❖ Easy standard code formatting (gofmt #) ❖ 100% reliable renaming * These are all things you DO have day-to-day influence on (maybe with the exception of documentation defaults) * The effects of these might not be immediate, but they will ripple
  45. “Give me a lever long enough and a fulcrum on which to place it, and I shall move the world.” Be The Lever
  46. Thank you!
 Chris Winters - @cwinters
 More resources (tags: collaboration, culture, management)
  48. People to follow ❖ Camille Fournier (@skamille) ❖ Will Larson (@lethain) ❖ Charity Majors (@mipsytipsy) ❖ Randy Shoup (@randyshoup) ❖ Bridget Kromhout (@bridgetkromhout) ❖ Andrew Clay Shafer (@littleidea)
  49. More to read ❖ Soft Stuff (from another person speaking now) ❖ Demystifying Conway’s Law ❖ Release It! (2nd ed)