Ever thought about taking your infrastructure or platform team from a cosy group to premier league status? Let’s have a relaxed chat about making it big while staying on point. I dive into tales and tactics for beefing up your infrastructure from supporting fewer than 100 folks to powering a crowd of a thousand or more, all while keeping your tech solid and your team atmosphere upbeat. This session is perfect for leaders on the growth path and any tech pro involved in building or running infrastructure who’s aiming higher. Expect a down-to-earth rundown of dos and don’ts plus a handful of “oh no” moments from my journey of upsizing infrastructure at Spotify and beyond.
7. - Thanks to Ignasi Fosch and Javi Arellano I got acquainted with the wonderful world of Linux in 1998
- Thanks to Albert Horta I started up an ISP in the middle of the .com bust. We survived. For 7 years
- I led infrastructure departments in several companies in BCN
- I wanted to learn the ropes of high up management and I became CTO at an online retail company
- I hated^H^H^H^H^H disliked it.
- I emigrated to Sweden
- The idea was to build the infra department at a small streaming company
- Emil Fredriksson promised it would be hands on
- It wasn’t. (exclusively)
- And I loved it. The gig lasted for 12 years, I was knackered
- I came back to the motherland.
- I started advising companies who wanted to scale. (You know… they thought money was free)
- I became a CTO for a VC firm, Secways
- Until a week ago!
- I am building something new to help engineers and teams to be productive without the usual corporate bullshit:
DevTuner.
ABOUT DAVID
10. Because naming your teams squads will not make your company the next Spotify.
Because using OKRs will not make your business the next Google.
Because having unlimited vacation will not make you the next Netflix.
In the same way, creating a DevOps Engineer job title doesn't make you live by DevOps.
Cargo culting must not prevent you from making mistakes.
THE MYTH OF THE GREENER GRASS
17. Most B2B ideas can run on one server. If they could in the early 2000s, they can today.
Every day you delay your MVP, you delay every subsequent iteration.
Have an inventory of every shortcut you’ve taken. Otherwise you’ll be complaining about technical debt 2 years from now.
Combine traits. Find the right balance between daydreaming and pragmatism. Such people don’t grow on trees.
FOCUS, FOCUS, FOCUS
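The inventory of shortcuts mentioned above can be as simple as a structured list kept next to the code. A minimal sketch in Python, assuming a hypothetical schema (`description`, `taken_on`, `reason`, `owner`) and a hypothetical review threshold of 180 days; none of these names or numbers come from the talk:

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Shortcut:
    """One deliberate shortcut taken to ship faster (hypothetical schema)."""
    description: str
    taken_on: date
    reason: str
    owner: str
    resolved: bool = False


def overdue(inventory, today, max_age_days=180):
    """Return unresolved shortcuts older than max_age_days, oldest first."""
    stale = [
        s for s in inventory
        if not s.resolved and (today - s.taken_on).days > max_age_days
    ]
    return sorted(stale, key=lambda s: s.taken_on)


# Illustrative entries only; the data is made up.
inventory = [
    Shortcut("Single server, no failover", date(2023, 1, 10), "ship MVP", "team-a"),
    Shortcut("Manual deploys over SSH", date(2024, 1, 15), "no CI yet", "team-b"),
    Shortcut("Hard-coded tax rates", date(2023, 2, 5), "one market only", "team-a", resolved=True),
]

for s in overdue(inventory, today=date(2024, 3, 1)):
    print(f"{s.taken_on} {s.owner}: {s.description}")
```

Reviewing the output of `overdue` on a regular cadence is one way to keep shortcuts visible before they turn into anonymous technical debt.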
19. - In 2011, we had an operations team and a backend infrastructure team
- We had fewer than 100 employees. Most of tech was in the same building (floor)
- The backend infra team was in charge of building everything connected with plumbing, service discovery, B2B comms, logging, messaging, building core systems, optimising everything
- They were 6 people.
- The operations team was in charge of rollouts, on-call, giving laptops to new employees, racking servers, installing switches, configuring BGP, signing contracts with new datacenter providers, being woken up every night, working 16 hours a day/night, policing (scarce) resources. And much more.
- They were 6 people.
- There were around 20 important systems. Each had an ops owner and a dev owner.
- Everyone was super busy.
- There were already a few million monthly active users.
- We were also building new features.
- Hiring like crazy!
- Systemic problems were not solved.
- Communication was broken: Backend infra felt they were interrupted. Operations felt they were unheard.
TIMES OF CHAOS AT A SWEDISH STREAMING COMPANY
21. Hire people with the right mindset, people who can show the value of constant communication. Start sharing some pain!
Find the right balance between gardening and landscaping.
OBSESSIVE SWAT TEAM TO THE RESCUE
23. - 20 to 60 systems
- 100 to 300 employees
- 3X active users
- From 7 to 20 teams (squads)
- In one year, however, we could only hire one systems engineer
- Teams had multiple bottlenecks
- Releasing something required a titanic effort
3X GROWTH IN ONE YEAR (MOST DIMENSIONS)
25. Ask for help; you are not the first one suffering from a given problem.
Make sure you distribute operational responsibilities across teams.
It's not only about making teams feel the pain. It's also about allowing them to fly solo!
OPERATIONS IN SQUADS
27. - Alright, by now teams feel the pain, but who does "operations"?
- Who owns "the service being up"?
- Who owns "the service being down"?
- Who owns cross-cutting workflows (provisioning, capacity planning, monitoring)?
- Who owns "Architecture"?
- Conventions? Best practices? Consistency?
- Onboarding?
- Procurement?
- Security?
- ...
- (Difficult to talk about this without looking like an old-fashioned gatekeeping sysadmin)
SPLITTING WORK AMONG 100 INFRA PEOPLE
29. Make a list of problems faced by the average team.
Factor out what's common.
Find a sensible split.
Treat each space as its own "product".
Each product gets its own team, PM, backlog, planning, customer interviews. Yeah, like a real product.
Make those teams autonomous!
INFRASTRUCTURE AS A PRODUCT ORG
31. - It turns out there is a very thin line between doing capacity planning for backend services and becoming a real estate planner... when you grow fast.
- Large parts of your attention and energy go to a set of problems far from your business...
4. CAPACITY PLANNING? NOPE... WAITING FOR CONSTRUCTION
33. If you make a move to remove distractions, the end game must not be more distracting than the original situation. Bet, or don't, but don't half-arse it.
When you do a major infrastructure shift, opportunity cost can kill you. Netflix, Dropbox, Twitter… all of them know about this.
MAKE BOLD DECISIONS, LIMIT HYBRID ENVIRONMENTS
35. - The traditional systems owned by each infrastructure team are not as cool as what's out there.
- It doesn't make sense to replicate functionality that is available in the cloud.
- Technical debt prevents a "real" cloud workload.
- "What's my job now here?"
- Teams building user-facing features are lagging behind the blessed stack.
WHAT DO WE DO WITH ALL THE INFRA PEOPLE? IDENTITY CRISIS
37. There is probably no one in your org who knows how the sausage is made as well as your infra people do.
Years of technology will necessarily require heavy alignment.
There are plenty of higher-level abstractions you have not paid attention to because you were too busy stocking up on SSDs.
It’s time to start encoding your conventions in your infrastructure layer.
INFRA PEOPLE ARE BEST IN CLASS TECHNOLOGY AMBASSADORS
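Encoding conventions in the infrastructure layer can start as a check that every service definition has to pass before it is provisioned. A minimal sketch in Python; the conventions themselves (naming pattern, required `owner` field, allowed `tier` values) and the `check_conventions` helper are hypothetical, chosen only for illustration:

```python
import re

# Hypothetical conventions; replace with whatever your org has agreed on.
NAME_PATTERN = re.compile(r"^[a-z][a-z0-9-]{2,30}$")
ALLOWED_TIERS = {"critical", "standard", "experimental"}


def check_conventions(service):
    """Return a list of human-readable violations for a service definition."""
    problems = []
    name = service.get("name", "")
    if not NAME_PATTERN.match(name):
        problems.append(f"name {name!r} must be lowercase letters, digits and dashes")
    if not service.get("owner"):
        problems.append("an owning team is required")
    if service.get("tier") not in ALLOWED_TIERS:
        problems.append(f"tier must be one of {sorted(ALLOWED_TIERS)}")
    return problems


# Made-up service definitions for illustration.
good = {"name": "playlist-api", "owner": "squad-playlists", "tier": "critical"}
bad = {"name": "Playlist_API", "tier": "gold"}

print(check_conventions(good))
print(check_conventions(bad))
```

Returning a list of violations rather than raising on the first one lets a team fix everything in a single pass, which keeps the check from feeling like a gate.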
39. - As alignment improves, bespoke solutions make less sense
- Higher order infrastructure problems become commodity (containers, orchestration, monitoring, distributed databases)
- Cloud providers integrate lots of those products "for free" (ha!)
- The cost of building some of those components in-house is difficult to calculate. In a cloud invoice, everything is much clearer (ha!)
- The higher order primitives become messy; it's difficult to understand how the pieces fit together.
- Failure domains are impossible to reason about.
WAIT, DO WE REALLY NEED TO REINVENT THE WHEEL?
41. Managed services must honour some parameters: no data lock-in, based on standard formats, etc.
Do not underestimate the future costs of price increases, or architectural revamps.
Have a well represented group that tracks architectural decisions. They must not be gatekeepers. They own ensuring the strategy is spread, understood, shared and evolved by everyone.
TWO WAY DOOR DECISIONS, ALWAYS
43. - We don't own infra.
- We don't run infra.
- We forgot we knew how to build infra.
- When people build infra, they don't dare to say they have built infra.
- Some senior people spend too much time on Hacker News.
- Wait, can we really run this cheaper? — says your newly hired VP from BigCorp, Inc.
- But that’s going to make our hands dirty, won’t it?
OH RIGHT, WE'VE LOST THE LEVERAGE
45. Decide carefully which battles you want to pick.
A cheap service used by a few teams in very different ways? A bad choice!
An expensive service used by many in a limited number of ways? You can save millions.
You can still build infra. Cloud shines at commodity services. But cloud providers fund those investments with higher order services.
YOU CAN STILL BUILD INFRA
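The rule of thumb above can be turned into a rough ranking: yearly spend and the number of consuming teams push a service up the list, while variety of usage pushes it down. A minimal sketch in Python; the scoring formula, the field names, and every figure are assumptions for illustration, not numbers from the talk:

```python
from dataclasses import dataclass


@dataclass
class ManagedService:
    name: str
    yearly_cost: float    # what you pay the provider today
    teams_using: int      # how many teams depend on it
    usage_patterns: int   # distinct ways it is used, at least 1


def in_house_score(svc):
    """Higher score = better candidate to build in-house (hypothetical heuristic)."""
    return svc.yearly_cost * svc.teams_using / svc.usage_patterns


# Made-up candidates for illustration.
candidates = [
    ManagedService("queue", yearly_cost=20_000, teams_using=3, usage_patterns=6),
    ManagedService("object-storage", yearly_cost=2_000_000, teams_using=40, usage_patterns=2),
    ManagedService("search", yearly_cost=300_000, teams_using=10, usage_patterns=5),
]

ranked = sorted(candidates, key=in_house_score, reverse=True)
for svc in ranked:
    print(f"{svc.name}: {in_house_score(svc):,.0f}")
```

The score is only a conversation starter: it says where to look first, not what to build.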
50. - We have spent many years optimising for the real-time use cases. We forgot about the batch compute use case.
- Big chunks of compute are becoming batch (again)
- This will create space for new "cloud providers"
- This will force us to develop new ways to do resource management
- Lots of software and infrastructure powering AI models needs to be rewritten
- There is a surprising amount of technical debt
- It's time we bring to the table a lot of our treasured knowledge about "reproducible" infrastructure into the new primitives.
- We will have a job, but we must escape the comfort zone.
- We've done it at least twice in 20 years, this will be the third time.
ARE WE GETTING REPLACED BY AI?
(YOU THOUGHT I AM TOO OLD TO RIDE ON THE BUZZWORD… NOPE!)
52. THANK YOU
DAVID POBLADOR I GARCIA. DEVOPS BCN. MARCH 2024
networks: @davidpoblador
email: david@poblador.com
53. ONE MORE THING
IF YOU WANT TO BE AMONG THE FIRST
TO TRY DEVTUNER…
WE HAVE A WAITING LIST