Managing by Missing

My software engineering management mechanisms and philosophy, learned over 8 years at AWS (10 years at Amazon).

Published in: Engineering

  1. Managing by Missing, October 3, 2018, Ian Nowland
  2. Who am I? • January 2000: Graduated; worked as a software engineer for 11 years, across 3 companies • January 2011: Switched to being a manager • January 2012: Started a new team from scratch in EC2; over the next 4.5 years, grew the team from 1 to 52 • November 2016: Left EC2, burned out. Joined Two Sigma. Developed this material. • March 2019: Joined Datadog as VP of Metrics and Alerts
  3. Summary ● Seven areas of management: People, Product, Execution, Partners, Operations, Engineering and the Company ● Getting in front of all of them so that there are no negative surprises is impossible ● These negative surprises are “misses”. They happen. ● Growing as a manager means owning misses and thinking broadly about mechanisms to avoid them or mitigate them earlier ● All while delegating more responsibility to your team through mechanisms, to help them through the recursive process
  4. What is a miss? A miss is any time the organization, or anyone in it, is negatively impacted as a result of your team’s action or inaction.
  5. What are the 7 areas of management? The seven areas that need your time and focus: ● Engineering: How are things being built? ● People: Are people happy and growing in what is being built? ● Execution: How are things getting built? ● Product: Are customers satisfied by what is being built? ● Operations: Is the built thing going to keep running? ● Partners: Do all my partners understand and agree with all the above? ● Company: Does the company align with all these answers?
  6. Area 1, Engineering: How are things being built?
  7. Engineering: Broad Strategic Questions ● Is your team following industry best practices on: ○ Code quality? ○ Unit and integration testing? ○ Specification and design? ○ Getting consensus on specification and design? ○ The amount of tech debt being accumulated or paid down? ● If not, how are you spending your time and focus to change course?
  8. Engineering: Anatomy of a Miss ● January 2012: ○ Started a new team (EC2 Nitro) ○ 7-person team reporting to me; the lead reported to my manager ○ The lead refused to unit test his code; he thought it was a bad practice ● November 2013: ○ Released V1 on time ○ Over 50,000 lines of C ○ The lead engineer wrote 80% of it (40,000 lines) ○ With no unit tests, and good (but shallow) integration tests
  9. Engineering: Anatomy of a Miss ● March 2014: ○ The lead engineer quit to found a startup ○ All my focus was on shipping V2 ○ I gave the lead’s old code to a strong junior engineer ● July 2014: ○ A two-character “/8” bug cost two months of development time to resolve
  10. Engineering: My Miss I didn’t take the time to ask: now that the lead is leaving, how do I handle the tech debt we have accumulated?
  11. Engineering: Broad Lesson The only things you control are your time and your focus. You always need to ask yourself: are you using them in the best possible way right now?
  12. Area 2, People: Are people happy and growing in what is being built?
  13. People: Broad Strategic Questions ● Do your people have purpose, understanding where their work fits in the company mission? ● Do you have the right people to achieve your part of that mission? ● Do you understand and accommodate what motivates them? ● Do you understand and accommodate their growth? ● Have you created an environment of safety where they can be honest, have their own misses, and grow?
  14. People: Anatomy of a Miss ● March 2014, two months after the lead engineer left: ○ My manager also left ○ I took over an existing team, which owned software with no clear future ○ I met with its manager every week; he thought the team was happy ○ I focused on executing V2 with my original team ● August 2014: Finally had skip-level 1:1s with the team I took over ○ And realized half were on the verge of quitting ○ They saw no future for their team, and so none for themselves
  15. People: My Misses My manager missed by being too tactical in his 1:1s. But: ○ I missed by not asking deeper questions in my 1:1s ○ I also missed by not having skip-level 1:1s sooner ○ We both missed by putting too much time and focus on Execution rather than People
  16. People: Broad Lessons When your head is down, you miss what’s up. That is, when you focus on only one area, you miss information about the other six.
  17. Area 3, Execution: How are things getting built?
  18. Execution: Broad Strategic Questions ● What deadlines does your team have? ○ How real are they? ○ Are you on top of executing to hit them in the face of all risks? ○ What buffer, or options to shuffle priorities, do you have? ○ Do your partners, management and customers understand all this? ● Do you know all your external dependencies? ○ Are they on track? ○ Do they believe your deadlines for them are real?
  19. Execution: Anatomy of a Miss ● November 2013: V1 miraculously shipped on time ● November 2014: Deadline for V2 ● August 2014, year-to-date recap: ○ Manager and lead engineer gone ○ Managing an additional team, which I had to convince had a future ○ V1 in production with a bug that cost dev-months
  20. Execution: Anatomy of a Miss ● November 1, 2014 (2 weeks out from the release date): ○ Reset to December 15 ● December 1, 2014 (2 weeks out from the new release date): ○ Reset to January 9 ● January 9, 2015: Launched V2 ○ With a nasty data corruption bug ○ Discovered quickly, but took two months to fully mitigate
  21. Execution: My Miss When the first slip happened, I did not seriously re-evaluate the ship date and risks. The result was a 6-month death march.
  22. Execution: Broad Lesson Because misses are actionable information, they are opportunities. To take advantage of the opportunity, you need to own the miss and reset your strategy in light of it.
  23. Area 4, Product: Are customers satisfied by what is being built?
  24. Product: Broad Strategic Questions ● Why does your team exist? What is your vision? ○ “A collection of somewhat related systems” is not a vision ○ Does your team own the right systems to execute that vision? ● What is your strategy to deliver your vision? ○ i.e., which systems are you investing in, and why? ● What is your execution plan (i.e., roadmap) for that strategy? ● Do all these people agree with the above: ○ Customers, partners, team members, your management? ○ Why are you sure?
  25. Product: Anatomy of a Miss ● January 2015: With the technology established, new 2015 initiatives came in needing major work from my team: ○ Hypervisor and bare metal functionality (i.e., C5, Nitro) ○ Network load balancing (i.e., ALB) ○ Multiple types of storage (i.e., EFS) ○ Low-latency NICs (i.e., ENA/EFA) ● All on top of my organization’s (VPC’s) full roadmap ● I met 1:1 with each of them to agree on a compromise of partial commitments ● But each new team set goals assuming a 100% commitment ● This caused a lot of political infighting, costing me a lot of time and focus
  26. Product: My Miss I didn’t proactively own the roadmap narrative for my team. That led partner teams to make mistakes in the timelines of their strategies.
  27. Product: Broadening What I Think of as a Miss A miss is any time the organization, or anyone in it, is negatively impacted as a result of your team’s action or inaction ● i.e., it includes misses of communication ● It includes when the other party should have communicated with you ● It includes when you did communicate, but not in a way that got the other party to commit to being accountable
  28. Product: Broad Lesson You need to own the narrative: ● You need to have a strategy ● You need to communicate that strategy ● You need to be seen to deliver on your strategies
  29. Area 5, Partners: Do all my partners understand and agree with all the above?
  30. Partners: Broad Strategic Questions ● Am I having sideways 1:1s often enough with all managers whose teams are impacted by, or impact, my team? ● Do I understand their goals and challenges? ● Am I always pushing back on behalf of my team’s happiness and goals, and never considering the other team’s happiness and goals? ● Is my team doing the same without my knowing?
  31. Partners: Anatomy of a Mitigated Miss ● December: ○ Took over a software engineering team that: ■ Owned its own networking switches ■ From a different vendor than the rest of the network ○ My team strongly pushed for the Networking team to take the switches ○ They had run quietly for 2 years and were due to be retired in 14 months ○ The Networking team refused to take ownership ● 11 months later: ○ The switches started having mass operational issues ○ I had to ask the Networking team for help, and they gave it ○ Their engineer was engaged for more than a month
  32. Partners: How I Avoided a Bigger Miss ● Throughout the year, I had 1:1s with Networking management ● I also gave three months of my developer’s time to a project that fit my developer’s interests and their needs
  33. Partners: Broad Lesson Sometimes the only way to avoid a miss is for a partner to make a sacrifice. It’s better when they do so because they want to help you, rather than because you had to escalate. That comes from building relationships and understanding ahead of time.
  34. Area 6, Operations: Is the built thing going to keep running?
  35. Operations: Broad Strategic Questions ● Is my team on top of: ○ Monitoring production? ○ Capacity planning? ○ Change management? ○ Operational customer communication? ● Why am I sure? ● What mechanisms do I need to stay sure?
  36. Operations: Anatomy of a Small Miss ● Beginning of year: ○ The datacenter team moved to a quarterly ordering model for servers ○ This meant ordering a server could take 5 months ○ I communicated this to my managers in a staff meeting ● August: ○ Two of my managers said they needed capacity in 3 months
  37. Operations: Avoiding a Bigger Miss False miss: I did not communicate in a way that drove ongoing focus. Real miss: I had no mechanism to ensure my managers stayed on top of this continuous need.
  38. Operations: Broad Lesson Leveraging your time and focus means building mechanisms around delegation.
  39. What is a Mechanism? ● 4 basic things: ○ Identification of the stakeholders affected ○ A goal against which success or failure can be judged on an ongoing basis ○ An owner ○ A periodic or edge-triggered check-in mechanism, communicated to all stakeholders ● Example: A TPM owns a project with a deadline, defines milestones they own, and hosts a status update meeting for all stakeholders after each milestone ● Example: An engineering manager owns an availability SLA for their service and, each month, owns reporting misses to stakeholders
  40. Area 7, Company: Does the company align with all these answers?
  41. Company: Broad Strategic Questions ● Is there something about the way the organization does people, product, process, partners, engineering, or operations that is not actually right for the company? ● Is this going to be a major problem? ● If so, how do I need to influence the company to change?
  42. Company: Peer Addresses a Miss ● AWS was losing Systems Engineers to competitors because of compensation ● A peer of mine decided to take ownership ● After taking it up with HR, he learned that Amazon lumped engineers doing automation at massive scale in with operational engineers ● He worked with HR to create a new job family, and with his leadership to get it through CEO approval ● Amazon created a new job family (Systems Development Engineer) with compensation aligned with competitors
  43. Company: Broad Lesson Organizations change because passionate people try to change them.
  44. Summary ● Seven areas of management: People, Product, Execution, Partners, Operations, Engineering and the Company ● Getting in front of all of them so that there are no negative surprises is impossible ● These negative surprises are “misses”. They happen. ● Growing as a manager means owning misses and thinking broadly about mechanisms to avoid them or mitigate them earlier ● All while delegating more responsibility to your team through mechanisms, to help them through the recursive process