DAVID POBLADOR I GARCIA. DEVOPS BCN. MARCH 2024
A PAINFUL MÉMOIRE OF DOS AND DON’TS IN 8 CHAPTERS
INFRASTRUCTURE
GROWING PAINS
PROLOGUE
RIGHTSIZING AN INFRASTRUCTURE TEAM
IS A HARD PROBLEM
MAKING THE RIGHT INFRASTRUCTURE INVESTMENT
AT THE RIGHT MOMENT
IS AN IMPOSSIBLE PROBLEM
ABOUT DAVID
networks: @davidpoblador
email: david@poblador.com
ABOUT DAVID
(MANDATORY FOLLOW ME SLIDE)
@entredevyops
@2048enlla
@DevTunerHQ
- Thanks to Ignasi Fosch and Javi Arellano I got acquainted with the wonderful world of Linux in 1998
- Thanks to Albert Horta I started up an ISP in the middle of the .com bust. We survived. For 7 years
- I led infrastructure departments in several companies in BCN
- I wanted to learn the ropes of high up management and I became CTO at an online retail company
- I hated^H^H^H^H^H disliked it.
- I emigrated to Sweden
- The idea was to build the infra department at a small streaming company
- Emil Fredriksson promised it would be hands on
- It wasn’t. (exclusively)
- And I loved it. The gig lasted for 12 years, I was knackered
- I came back to the motherland.
- I started advising companies who wanted to scale. (You know… they thought money was free)
- I became a CTO for a VC firm, Secways
- Until a week ago!
- I am building something new to help engineers and teams to be productive without the usual corporate bullshit: DevTuner.
ABOUT DAVID
FOREWORD
(GO AND TELL YOUR BOSSES ABOUT THIS)
SOME OF MY
MISTAKES
LEARNINGS
(SO YOU CAN COME UP WITH YOUR VERY OWN)
Because naming your
teams squads will not
make your company the
next Spotify
Because using OKRs will
not make your business
the next Google.
Because having unlimited
vacation will not make
you become the next
Netflix.
In the same way as
creating a DevOps
Engineer job title doesn't
make you live by DevOps.
Cargoculting must not
prevent you making
mistakes.
THE MYTH OF THE GREENER GRASS
YAY/NAY
DECISIONS, DECISIONS…
MY ALERTING
DASHBOARD IS ALL RED
LET'S DOUBLE THE SIZE OF THE INFRASTRUCTURE TEAM
MY ALERTING
DASHBOARD IS ALL RED
LET'S GO TO THE BOTTOM OF ALERTS AND REMEDIATE SYSTEMIC PROBLEMS
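One way to picture "going to the bottom of alerts": instead of treating every red tile as its own problem, count which underlying component the firing alerts point to. This is only a minimal sketch; the `component` field and the sample alerts are hypothetical and assume your alerting tool can tag alerts that way.

```python
from collections import Counter


def top_alert_sources(alerts, n=3):
    """Count firing alerts per underlying component and return the n most common."""
    return Counter(a["component"] for a in alerts).most_common(n)


# Hypothetical sample of alerts currently firing.
alerts = [
    {"name": "api latency high", "component": "db-primary"},
    {"name": "checkout errors", "component": "db-primary"},
    {"name": "disk almost full", "component": "log-shipper"},
    {"name": "login timeouts", "component": "db-primary"},
]

print(top_alert_sources(alerts))
```

A dashboard with four red alerts may turn out to be one systemic problem and one small one, which is a very different staffing conversation.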
CHAPTER 1
OPPORTUNITY COST
- You have an idea
- You raised some money
- You have not proved the idea
- You spend too much time on Hacker News
OPPORTUNITY COST
LET’S BUILD A SUPER-SCALABLE SYSTEM
USING ALL THE MODERN PRIMITIVES
AUTO-SCALING
CONTINUOUS DEPLOYMENT
REQUEST TRACING
ALL THE COOL FRAMEWORKS
YOU NAME IT…
FOR YOUR… ZERO REAL USERS
BECAUSE THINGS NEED TO SCALE, RIGHT?
Most B2B ideas can run on
one server. If they could in
the early 2000s, they can do
it today.
Every day you delay your
MVP, you delay every
subsequent iteration.
Have an inventory of every
shortcut you’ve taken.
Otherwise you’ll be
complaining about technical
debt 2 years from now.
Combine traits. Right balance
between daydreaming and
pragmatism. They don’t grow
on trees.
FOCUS, FOCUS, FOCUS
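The "inventory of every shortcut" can be as small as a list you keep next to the code. A minimal sketch, assuming nothing beyond the standard library; the `Shortcut` fields and the sample entries are hypothetical, not something the talk prescribes.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Shortcut:
    """One deliberate shortcut taken to ship sooner (all fields are hypothetical)."""
    summary: str            # what was skipped or simplified
    taken_on: date          # when the shortcut was taken
    revisit_when: str       # the trigger that should make you pay it back
    resolved: bool = False


def open_shortcuts(inventory):
    """Return the unresolved shortcuts, oldest first."""
    return sorted(
        (s for s in inventory if not s.resolved),
        key=lambda s: s.taken_on,
    )


inventory = [
    Shortcut("Single server, no failover", date(2024, 1, 10), "first paying customer"),
    Shortcut("Manual deploys over SSH", date(2024, 2, 3), "second engineer joins"),
    Shortcut("No request tracing", date(2023, 12, 1), "more than one service", resolved=True),
]

for s in open_shortcuts(inventory):
    print(f"{s.taken_on}  {s.summary}  (revisit when: {s.revisit_when})")
```

The point of the `revisit_when` field is that each shortcut carries its own trigger, so the technical debt conversation two years from now starts from a list rather than from complaints.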
CHAPTER 2
TIMES OF CHAOS AT A SWEDISH STREAMING COMPANY
- In 2011, we had an operations team and a backend infrastructure team
- We had fewer than 100 employees. Most of tech was in the same building (floor)
- The backend infra team was in charge of building everything connected with plumbing, service discovery,
B2B comms, logging, messaging, building core systems, optimising everything
- They were 6 people.
- The operations team was in charge of rollouts, on-call, giving laptops to new employees, racking servers,
installing switches, configuring BGP, signing contracts with new datacenter providers, being woken up
every night, working 16 hours a day/night, policing (scarce) resources. And much more.
- They were 6 people.
- There were around 20 important systems. Each had an ops owner and a dev owner.
- Everyone was super busy.
- There were already a few million monthly active users.
- We were also building new features.
- Hiring like crazy!
- Systemic problems were not solved.
- Communication was broken: Backend infra felt they were interrupted. Operations felt they were unheard.
TIMES OF CHAOS AT A SWEDISH STREAMING COMPANY
OPS IN TROUBLE?
WRITE A PINK NOTE TO BACKEND
INFRA AND WAIT
YES, LIKE THIS ONE:
Hire people with the
right mindset, someone
who can show the value
of constant
communication. Start
sharing some pain!
Find the right balance
between gardening and
landscaping.
OBSESSIVE SWAT TEAM TO THE RESCUE
CHAPTER 3
3X GROWTH IN ONE YEAR (MOST DIMENSIONS)
- 20 to 60 systems
- 100 to 300 employees
- 3X active users
- From 7 to 20 teams (squads)
- In one year, however, we could only hire one systems engineer
- Teams had multiple bottlenecks
- Releasing something required a titanic effort
3X GROWTH IN ONE YEAR (MOST DIMENSIONS)
SYNCHRONISATION
PROBLEMS?
PUT ONE PERSON OF
EACH TEAM IN A 6-SEAT
MEETING ROOM
YES, LIKE THIS ONE
Ask for help, you are not
the first one suffering
from a given problem.
Make sure you distribute
operational
responsibilities into
teams.
It's not only about
making teams feel the
pain. It's also about
allowing them to fly solo!
OPERATIONS IN SQUADS
CHAPTER 4
SPLITTING WORK AMONG 100 INFRA PEOPLE
- Alright, by now teams feel the pain, but who does "operations"?
- Who owns "the service being up"?
- Who owns "the service being down"?
- Who owns cross-cutting workflows (provisioning, capacity planning, monitoring)?
- Who owns "Architecture"?
- Conventions? Best practices? Consistency?
- Onboarding?
- Procurement?
- Security?
- ...
- (Difficult to talk about this without looking like an old-fashioned gatekeeping sysadmin)
SPLITTING WORK AMONG 100 INFRA PEOPLE
YOU KNOW WHAT, IT DOESN'T MATTER...
EACH FEATURE TEAM WILL OWN THEIR
INFRA, AND WE DON'T CARE ABOUT
CONSISTENCY
WE ARE SMART ENGINEERS ANYWAY, AREN'T WE?
Make a list of problems faced
by the average team.
Factor out what's common.
Find a sensible split.
Treat each space as its own
"product".
Each product gets its own
team, PM, backlog, planning,
customer interviews. Yeah, like
a real product.
Make those teams
autonomous!
INFRASTRUCTURE AS A PRODUCT ORG
CHAPTER 5
CAPACITY PLANNING? NOPE... WAITING FOR CONSTRUCTION
- It turns out there is a very thin
line between doing capacity
planning for backend services
and becoming a real estate
planner... when you grow fast.
- Large parts of your attention
and energy go to a set of
problems far from your
business...
4. CAPACITY PLANNING?
NOPE... WAITING FOR CONSTRUCTION
LET'S RUN SOME COMPUTE IN THE ☁
AND SLOWLY PORT EVERY COMPONENT
AWAY FROM THE DATACENTER
BECAUSE WE LOVE HYBRID ARCHITECTURES, RIGHT?
If you make a move to
remove distractions, the
end game must not be
more distracting than the
original situation. Bet, or
don't, but don't half-arse it.
When you do a major
infrastructure shift, opportunity
cost can kill you.
Netflix, Dropbox, Twitter…
all of them know about
this.
MAKE BOLD DECISIONS, LIMIT HYBRID ENVIRONMENTS
CHAPTER 6
WHAT DO WE DO WITH ALL THE INFRA PEOPLE? IDENTITY CRISIS
- The traditional systems owned by each
infrastructure team are not as cool as what's out
there.
- It doesn't make sense to replicate functionality
that is available in the cloud.
- Technical debt prevents a "real" cloud workload.
- "What's my job here now?"
- Teams building user facing features are lagging
behind the blessed stack.
WHAT DO WE DO WITH ALL THE INFRA PEOPLE? IDENTITY CRISIS
WE DON'T NEED INFRA PEOPLE
THEY WILL NEED TO FIND
ANOTHER TEAM
BECAUSE SOMEONE IN FINANCE READ
CLOUD PROVIDERS ARE THE MODERN SOFTWARE OUTSOURCING COMPANIES, RIGHT?
There is probably no one in
your org who knows how the
sausage is made as well as
your infra people do.
Years of technology will
forcefully require heavy
alignment.
There are plenty of higher
level abstractions you have
not paid attention to because
you were too busy stocking
up SSD drives.
It’s time to start encoding
your conventions in your
infrastructure layer.
INFRA PEOPLE ARE BEST-IN-CLASS TECHNOLOGY AMBASSADORS
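"Encoding your conventions in your infrastructure layer" can start very small: a check that runs before anything is provisioned. A minimal sketch under stated assumptions: the required labels, the allowed tiers and the shape of the service definition are all hypothetical examples, not conventions taken from the talk.

```python
# Hypothetical conventions; replace with whatever your org actually agreed on.
REQUIRED_LABELS = {"owner", "tier", "oncall"}
ALLOWED_TIERS = {"critical", "standard", "experimental"}


def convention_violations(service):
    """Return human-readable convention violations for one service definition."""
    problems = []
    labels = service.get("labels", {})

    # Every service must carry the agreed set of labels.
    for name in sorted(REQUIRED_LABELS - set(labels)):
        problems.append(f"missing label: {name}")

    # If a tier is declared, it must be one of the known tiers.
    tier = labels.get("tier")
    if tier is not None and tier not in ALLOWED_TIERS:
        problems.append(f"unknown tier: {tier}")

    return problems


service = {"name": "playlist-api", "labels": {"owner": "squad-a", "tier": "gold"}}
print(convention_violations(service))
```

Once a convention lives in a check like this, it stops depending on someone remembering to mention it in a review.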
CHAPTER 7
WAIT, DO WE REALLY NEED TO REINVENT THE WHEEL?
- As alignment improves, bespoke solutions
make less sense
- Higher order infrastructure problems become
commodity (containers, orchestration,
monitoring, distributed databases)
- Cloud providers integrate lots of those
products "for free" (ha!)
- The cost of building some of those components
in-house is difficult to calculate. In a cloud
invoice, everything is much clearer (ha!)
- The higher order primitives become messy, it's
difficult to understand how pieces fit together.
- Failure domains are impossible to reason
about.
WAIT, DO WE REALLY NEED TO REINVENT THE WHEEL?
WE CAN'T MAKE SENSE OF THE
ECOSYSTEM ANYMORE
LET'S DOUBLE THE SIZE OF THE TEAM
BECAUSE MONEY IS^H^H WAS FREE. AND BECAUSE ONBOARDING IS CHEAP. RIGHT?
Managed services must
honour some
parameters: no data
lock-in, based on
standard formats, etc.
Do not underestimate
the future costs of price
increases, or
architectural revamps.
Have a well represented
group that tracks
architectural decisions.
They must not be
gatekeepers. They own
ensuring the strategy is
spread, understood,
shared and evolved by
everyone.
TWO-WAY DOOR DECISIONS, ALWAYS
CHAPTER 8
OH RIGHT, WE'VE LOST THE LEVERAGE
- We don't own infra.
- We don't run infra.
- We forgot we knew how to build infra.
- When people build infra, they don't dare to say they have built infra.
- Some senior people spend too much time on Hacker News.
- Wait, can we really run this cheaper? — says your newly hired VP from BigCorp, Inc.
- But that’s going to make our hands dirty, won’t it?
OH RIGHT, WE'VE LOST THE LEVERAGE
MULTICLOUD
WILL SOLVE
ALL MY
PROBLEMS
BECAUSE WE HAVE NOT LEARNED ANYTHING
ABOUT THE COSTS OF HYBRID SYSTEMS,
RIGHT?
Decide carefully which
battles you want to pick.
A cheap service used by
a few teams in very
different ways? A bad
choice!
An expensive service
used by many in a
limited number of ways?
You can save millions.
You can still build infra.
Cloud shines at
commodity services. But
cloud providers fund
those investments with
higher order services.
YOU CAN STILL BUILD INFRA
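The "pick your battles" rule above can be turned into a rough filter: expensive, widely used, and used in few distinct ways. This is one possible heuristic, not the author's method; the thresholds, the `ManagedService` fields and the sample numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ManagedService:
    name: str
    annual_cost: float     # what the provider charges per year
    teams_using: int       # how many teams depend on it
    usage_patterns: int    # how many distinct ways it is used


def in_house_candidates(services, min_cost, min_teams, max_patterns):
    """Keep expensive, widely used services with few usage patterns, most expensive first."""
    picked = [
        s for s in services
        if s.annual_cost >= min_cost
        and s.teams_using >= min_teams
        and s.usage_patterns <= max_patterns
    ]
    return sorted(picked, key=lambda s: s.annual_cost, reverse=True)


# Hypothetical numbers, for illustration only.
services = [
    ManagedService("object-storage", 2_000_000, 40, 2),
    ManagedService("niche-queue", 30_000, 3, 6),
    ManagedService("managed-db", 900_000, 25, 3),
]

for s in in_house_candidates(services, min_cost=500_000, min_teams=10, max_patterns=3):
    print(s.name, s.annual_cost)
```

The cheap service used by three teams in six different ways never makes the list, which is the "bad choice" case from the slide.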
EPILOGUE
THERE IS ONE RIGHT MOMENT
FOR (ALMOST) EVERY INFRASTRUCTURE
INVESTMENT
(AND MANY WRONG MOMENTS)
DON’T FEEL BAD IF YOU GET IT WRONG
MOST SUCCESSFUL PRODUCTS WE USE WOULDN’T
EXIST WITHOUT SUCH POORLY TIMED DECISIONS
CLOSING REMARKS
- We have spent many years optimising for the real-time use cases. We forgot about the
batch compute use case.
- Big chunks of compute are becoming batch (again)
- This will create space for new "cloud providers"
- This will force us to develop new ways to do resource management
- Lots of software and infrastructure powering AI models needs to be rewritten
- There is a surprising amount of technical debt
- It's time we bring a lot of our treasured knowledge about "reproducible"
infrastructure to the new primitives.
- We will have a job, but we must escape the comfort zone.
- We've done it at least twice in 20 years, this will be the third time.
ARE WE GETTING REPLACED BY AI?
(YOU THOUGHT I AM TOO OLD TO RIDE ON THE BUZZWORD… NOPE!)
DOES EVERYTHING WE DO NEED TO BE BIG?
THANK YOU
DAVID POBLADOR I GARCIA. DEVOPS BCN. MARCH 2024
networks: @davidpoblador
email: david@poblador.com
ONE MORE THING
IF YOU WANT TO BE AMONG THE FIRST
TO TRY DEVTUNER…
WE HAVE A WAITING LIST

Infrastructure Prowing Pains by David Poblador i Garcia - DevOpsBCN - March 2024

  • 1. DAVID POBLADOR I GARCIA. DEVOPS BCN. MARCH 2024 A PAINFUL MÉMOIRE OF DOS AND DON’TS IN 8 CHAPTERS INFRASTRUCTURE GROWINGPAINS
  • 7. - Thanks to Ignasi Fosch and Javi Arellano I got acquainted with the wonderful world of Linux in 1998 - Thanks to Albert Horta I started up an ISP in the middle of the .com bust. We survived. For 7 years - I led infrastructure departments in several companies in BCN - I wanted to learn the ropes of high up management and I became CTO at an online retail company - I hated^H^H^H^H^H disliked it. - I emigrated to Sweden - The idea was to build the infra department at a small streaming company - Emil Fredriksson promised it would be hands on - It wasn’t. (exclusively) - And I loved it. The gig lasted for 12 years, I was knackered - I came back to the motherland. - I started advising companies who wanted to scale. (You know… they thought money was free) - I became a CTO for a VC fi rm, Secways - Until a week ago! - I am building something new to help engineers and teams to be productive without the usual corporate bullshit: DevTuner. ABOUTDAVID
  • 9. SOMEOFMY MISTAKES LEARNINGS (SO YOU CAN COME UP WITH YOUR VERY OWN)
  • 10. Because naming your teams squads will not make your company the next Spotify Because using OKRs will not make your business the next Google. Because having unlimited vacation will not make you become the next Netflix. In the same way as creating a DevOps Engineer job title doesn't make you live by DevOps. Cargoculting must not prevent you making mistakes. THEMYTHOFTHEGREENERGRASS
  • 12. MYALERTING DASHBOARDISALLRED LET'S DOUBLE THE SIZE OF THE INFRASTRUCTURE TEAM
  • 13. MYALERTING DASHBOARDISALLRED LET'S GO TO THE BOTTOM OF ALERTS AND REMEDIATE SYSTEMIC PROBLEMS
  • 15. - You have an idea - You raised some money - You have not proved the idea - You spend too much time on Hacker News OPPORTUNITYCOST
  • 17. Most B2B ideas can run in one server. If they could in the early 2000s, they can do it today. Every day you delay your MVP, you delay every subsequent iteration. Have an inventory of every shortcut you’ve taken. Otherwise you’ll be complaining about technical debt 2 years from now. Combine traits. Right balance between daydreaming and pragmatism. They don’t grow on trees. FOCUS,FOCUS,FOCUS
  • 19. - In 2011, we had an operations team and a backend infrastructure team - We had less than 100 employees. Most of tech was in the same building ( fl oor) - The backend infra team was in charge of building everything connected with plumbing, service discovery, B2B comms, logging, messaging, building core systems, optimising everything - They were 6 people. - The operations team was in charge of rollouts, on-call, giving laptops to new employees, racking servers, installing switches, con fi guring BGP, signing contracts with new datacenter providers, being woken up every night, working 16 hours a day/night, policing (scarce) resources. And much more. - They were 6 people. - There were around 20 important systems. Each had an ops owner and a dev owner. - Everyone was super busy. - There were already a few million monthly active users. - We were also building new features. - Hiring as crazy! - Systemic problems were not solved. - Communication was broken: Backend infra felt they were interrupted. Operations felt they were unheard. TIMESOFCHAOSATASWEDISHSTREAMINGCOMPANY
  • 21. Hire people with the right mindset, someone who can show the value of constant communication. Start sharing some pain! Find the right balance between gardening and landscaping. OBSESSIVESWATTEAMTOTHERESCUE
  • 23. - 20 to 60 systems - 100 to 300 employees - 3X active users - From 7 to 20 teams (squads) - In one year, however, we could only hire one systems engineer - Teams had multiple bottlenecks - Releasing something required a titanic effort 3X GROWTH IN ONE YEAR (MOST DIMENSIONS)
  • 25. Ask for help, you are not the first one suffering from a given problem. Make sure you distribute operational responsibilities across teams. It's not only about making teams feel the pain. It's also about allowing them to fly solo! OPERATIONS IN SQUADS
  • 27. - Alright, by now teams feel the pain, but who does "operations"? - Who owns "the service being up"? - Who owns "the service being down"? - Who owns cross-cutting workflows (provisioning, capacity planning, monitoring)? - Who owns "Architecture"? - Conventions? Best practices? Consistency? - Onboarding? - Procurement? - Security? - ... - (Difficult to talk about this without looking like an old-fashioned gatekeeping sysadmin) SPLITTING WORK AMONG 100 INFRA PEOPLE
  • 29. Make a list of problems faced by the average team. Factor out what's common. Find a sensible split. Treat each space as its own "product". Each product gets its own team, PM, backlog, planning, customer interviews. Yeah, like a real product. Make those teams autonomous! INFRASTRUCTURE AS A PRODUCT ORG
  • 31. - It turns out there is a very thin line between doing capacity planning for backend services and becoming a real estate planner... when you grow fast. - Large parts of your attention and energy go to a set of problems far from your business... 4. CAPACITY PLANNING? NOPE... WAITING FOR CONSTRUCTION
  • 33. If you make a move to remove distractions, the end game must not be more distracting than the original situation. Bet, or don't, but don't half-arse it. When you do a major infrastructure shift, opportunity cost can kill you. Netflix, Dropbox, Twitter… all of them know about this. MAKE BOLD DECISIONS, LIMIT HYBRID ENVIRONMENTS
  • 35. - The traditional systems owned by each infrastructure team are not as cool as what's out there. - It doesn't make sense to replicate functionality that is available in the cloud. - Technical debt prevents a "real" cloud workload. - "What's my job now here?" - Teams building user facing features are lagging behind from a blessed stack. WHAT DO WE DO WITH ALL THE INFRA PEOPLE? IDENTITY CRISIS
  • 36. WE DON'T NEED INFRA PEOPLE. THEY WILL NEED TO FIND ANOTHER TEAM BECAUSE SOMEONE IN FINANCE READ CLOUD PROVIDERS ARE THE MODERN SOFTWARE OUTSOURCING COMPANIES, RIGHT?
  • 37. There is probably no one in your org who knows how the sausage is made as your infra people do. Years of technology will forcefully require heavy alignment. There are plenty of higher level abstractions you have not paid attention to because you were too busy stocking up SSD drives. It’s time to start encoding your conventions in your infrastructure layer. INFRA PEOPLE ARE BEST IN CLASS TECHNOLOGY AMBASSADORS
  • 39. - As alignment improves, bespoke solutions make less sense - Higher order infrastructure problems become commodity (containers, orchestration, monitoring, distributed databases) - Cloud providers integrate lots of those products "for free" (ha!) - The cost of building some of those components in-house is difficult to calculate. In a cloud invoice, everything is much clearer (ha!) - The higher order primitives become messy, it's difficult to understand how pieces fit together. - Failure domains are impossible to reason about. WAIT, DO WE REALLY NEED TO REINVENT THE WHEEL?
  • 41. Managed services must honour some parameters: no data lock-in, based on standard formats, etc. Do not underestimate the future costs of price increases, or architectural revamps. Have a well represented group that tracks architectural decisions. They must not be gatekeepers. They own ensuring the strategy is spread, understood, shared and evolved by everyone. TWO-WAY DOOR DECISIONS, ALWAYS
  • 43. - We don't own infra. - We don't run infra. - We forgot we knew how to build infra. - When people build infra, they don't dare to say they have built infra. - Some senior people spend too much time on Hacker News. - "Wait, can we really run this cheaper?" says your newly hired VP from BigCorp, Inc. - But that’s going to make our hands dirty, won’t it? OH RIGHT, WE'VE LOST THE LEVERAGE
  • 44. MULTICLOUD WILL SOLVE ALL MY PROBLEMS BECAUSE WE HAVE NOT LEARNED ANYTHING ABOUT THE COSTS OF HYBRID SYSTEMS, RIGHT?
  • 45. Decide carefully which battles you want to pick. A cheap service used by a few teams in very different ways? A bad choice! An expensive service used by many in a limited number of ways? You can save millions. You can still build infra. Cloud shines at commodity services. But cloud providers fund those investments with higher order services. YOU CAN STILL BUILD INFRA
  • 50. - We have spent many years optimising for the real-time use cases. We forgot about the batch compute use case. - Big chunks of compute are becoming batch (again) - This will create space for new "cloud providers" - This will force us to develop new ways to do resource management - Lots of software and infrastructure powering AI models needs to be rewritten - There is a surprising amount of technical debt - It's time we bring a lot of our treasured knowledge about "reproducible" infrastructure to the new primitives. - We will have a job, but we must escape the comfort zone. - We've done it at least twice in 20 years, this will be the third time. ARE WE GETTING REPLACED BY AI? (YOU THOUGHT I AM TOO OLD TO RIDE ON THE BUZZWORD… NOPE!)
  • 52. THANK YOU DAVID POBLADOR I GARCIA. DEVOPS BCN. MARCH 2024 networks: @davidpoblador email: david@poblador.com
  • 53. ONE MORE THING DAVID POBLADOR I GARCIA. DEVOPS BCN. MARCH 2024 networks: @davidpoblador email: david@poblador.com IF YOU WANT TO BE AMONG THE FIRST TO TRY DEVTUNER… WE HAVE A WAITING LIST