1
Andrew Vagin <avagin@parallels.com>
Developer, Linux Kernel team
OpenVZ and Linux Kernel Testing
2
Agenda
●
Linux containers and OpenVZ
●
Ideal test lab
●
Testing techniques
●
Performance testing
●
Anecdotes
3
Andrew Morton
I'm curious. For the past few months, people@openvz.org have
discovered (and fixed) an ongoing stream of obscure but serious and
quite long-standing bugs.
How are you discovering these bugs?
Andrew added later:
hm, OK, I was visualizing some mysterious Russian bugfinding
machine or something.
Don't stop ;)
David Miller
This issue has existed since the very creation of the netlink code :-)
4
Linux Containers (LXC)
Many isolated environments on top of a single kernel
●
Namespaces
●
Resource accounting
OpenVZ Containers
●
Better resource accounting
●
Checkpointing and live migration
●
Extra features: CPU limits, NFS inside CTs, etc.
5
What makes a good test lab?
●
Fully automated system with deployment service
●
A web interface for test scheduling
●
Standard test sets (“combo #3, make it large”)
●
A web interface for test results (comparisons, graphs,
logs)
●
Integration with a bug tracking system
●
Net or serial console to collect kernel oopses
●
KVM, power switch, other goodies
6
How do we find bugs in the mainstream kernel?
Containers help us find more bugs
●
Independent life cycles
●
Precise resource accounting
Containers allow us to
●
Test initialization/finalization of kernel subsystems
●
Test error paths
●
Catch more leaks than the regular testing does
●
Catch more race conditions by means of stress testing
7
Start/stop test
●
Massive parallel start/stop and suspend/resume
●
Random resource parameters
Helps to catch:
●
Race conditions
●
Bugs in error paths
●
Memory leaks
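A minimal sketch of the shape such a test can take: run many start/stop cycles in parallel, pick a random resource limit for each, and check afterwards that usage has dropped back to zero. Everything here is hypothetical: `FakeContainer` only simulates a container in memory, and a real harness would drive the actual container tooling instead.

```python
import random
from concurrent.futures import ThreadPoolExecutor


class FakeContainer:
    """Hypothetical in-memory stand-in for a real container."""

    def __init__(self, ctid):
        self.ctid = ctid
        self.limit = None
        self.held = 0  # simulated resource usage counter

    def start(self, limit):
        self.limit = limit
        self.held = min(limit, 100)  # pretend the workload charges up to 100 units

    def stop(self):
        self.held = 0  # a correct kernel releases everything on stop


def start_stop_cycle(ct, rng):
    """Start with a random limit, stop, and report whether usage returned to zero."""
    ct.start(limit=rng.randint(1, 1000))
    ct.stop()
    return ct.ctid, ct.held == 0


def run_parallel(containers, seed=0, workers=8):
    """Run one start/stop cycle per container concurrently; return {ctid: clean}."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # One seeded RNG per container keeps a failing run reproducible.
        futures = [
            pool.submit(start_stop_cycle, ct, random.Random(seed + ct.ctid))
            for ct in containers
        ]
        return dict(f.result() for f in futures)
```

Any container that maps to `False` in the result still holds resources after stop, which is the signal for a possible leak.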
8
What makes a good performance test?
●
Effective load:
●
Atomic (UnixBench)
●
Complex (LAMP, SPEC-JBB, vConsolidate)
●
Sane test environment (no random cron jobs etc.)
●
Automation (minimize human interaction)
●
Reproducible results, minimize variability
●
Understand test results, even good ones
12
Density testing
●
High density is an important feature of OpenVZ (vs. VMs)
●
The test measures response time on a number of CTs
●
The number of CTs is increased until the response time becomes unacceptable
●
It's not a stress test
●
It produces a big resource overcommit
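The density loop described on this slide can be sketched as follows. This is an illustration under assumptions, not the actual OpenVZ test: `measure_response_time` is a hypothetical callback that brings the host to `n` running containers and returns the observed response time in seconds, and `threshold_s` is whatever the lab considers "bad".

```python
from typing import Callable


def find_max_density(
    measure_response_time: Callable[[int], float],
    threshold_s: float,
    max_cts: int,
    step: int = 1,
) -> int:
    """Return the largest container count whose response time stays within
    threshold_s, or 0 if even the first step is too slow."""
    best = 0
    for n in range(step, max_cts + 1, step):
        if measure_response_time(n) > threshold_s:
            break  # response time is now "bad": stop adding containers
        best = n
    return best
```

For a dry run the callback can be a simple model, for example `find_max_density(lambda n: n / 100, threshold_s=0.505, max_cts=200)`.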
13
Other useful tests
●
Week load test replays real httpd logs in real containers
●
Feature tests: isolation, CPU scheduler, checkpointing,
network virtualization, second level quota, etc.
●
Third-party tests: LTP, Connectathon, vSpecJBB,
vConsolidate, UnixBench, sysbench, DVD-store, Netperf
14
Real life stories
15
(1) How a Russian bug finding machine works
●
QA found a leak of 78 bytes of kernel memory
●
The developer was unable to reproduce the bug
●
He found that this is a leak of a 'struct user' object
●
He audited kernel code which references this object
●
Found one suspicious place
●
Wrote demo code to trigger the bug, and a fix
●
...
●
PROFIT!
16
(2) How resource controls prevented a DoS attack
uid / resource       held   maxheld   barrier     limit  failcnt
numothersocks           9       360       360       360        1

uid / resource       held   maxheld   barrier     limit  failcnt
kmemsize          1237973  14372344  14372700  14790164       80
numothersocks           9       360       360       360        1
A simple kernel attack using socketpair()
a.k.a. CVE-2010-4249
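In the counters above, a non-zero failcnt means an allocation was refused because the container hit its limit. A small sketch of how such a snapshot could be scanned automatically is shown here. It assumes exactly the six-column layout printed on the slide (resource, held, maxheld, barrier, limit, failcnt); the names `Beancounter`, `parse_beancounters` and `hit_limits` are made up for this illustration, and a real counters file may carry extra columns that this parser does not handle.

```python
from typing import NamedTuple


class Beancounter(NamedTuple):
    resource: str
    held: int
    maxheld: int
    barrier: int
    limit: int
    failcnt: int


def parse_beancounters(text):
    """Parse rows shaped like: resource held maxheld barrier limit failcnt."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Header lines and anything else that is not name + five integers are skipped.
        if len(fields) == 6 and all(f.isdigit() for f in fields[1:]):
            rows.append(Beancounter(fields[0], *map(int, fields[1:])))
    return rows


def hit_limits(rows):
    """Names of resources for which at least one allocation was refused."""
    return [r.resource for r in rows if r.failcnt > 0]


SNAPSHOT = """\
uid / resource held maxheld barrier limit failcnt
kmemsize 1237973 14372344 14372700 14790164 80
numothersocks 9 360 360 360 1
"""
```

Calling `hit_limits(parse_beancounters(SNAPSHOT))` lists both resources from the slide, since each has a non-zero failcnt.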
18
(3) How a guy measured netns performance
●
It was a nice sunny day...
●
5 different configurations to test
●
Unpredictable, random results
●
CPU throttling caused by overheating;
adding a case fan helped!
20
Conclusion
● Containers are good for kernel testing
● Resource limits (cgroups) are also helpful
● [Most] performance tests are a hoax
21
Andrew Vagin <avagin@parallels.com>
Thank you.
Questions?

LinuxCon 2011: OpenVZ and Linux Kernel Testing

Editor's Notes

  1. My name is Andrey Vagin. I have been working on OpenVZ for the last 5 years. I started working as a QA engineer, developing and running Linux kernel tests. Then I moved to the Linux kernel team as a developer. This talk tries to summarize the experience of me and my colleagues at Parallels.
  2. I want to tell you how we test the OpenVZ Linux kernel. I will start by explaining what OpenVZ really is. Next, I will share some thoughts about an ideal test lab. Then we'll see which testing techniques are good for kernel testing, and in particular why OpenVZ helps us find more bugs. I'd also like to say a few words about performance testing. Finally, I will present a few anecdotal cases of bugs we found.
  3. We regularly find and fix bugs in different subsystems of the Linux kernel. Often these bugs are obscure, long-standing and hard to catch. Sometimes maintainers wonder how we find those bugs. Right now I want to reveal all of our deep secrets.
  4. But before I start, I want to say a few words about Linux Containers and OpenVZ Containers. A container is an isolated environment. Each container has its own user, network, filesystem and other namespaces that virtualize various kernel subsystems. Plus, there are cgroups for additional resource accounting. All containers run on top of one single kernel; this is what makes them different from virtual machines. Containers do have some restrictions (for example, a Linux machine can only run Linux containers), but the technology is more efficient, because it doesn't emulate hardware devices or run multiple kernels. Compared to LXC, OpenVZ Containers have better resource accounting and some extra features such as CPU limits, checkpointing and live migration, NFS and FUSE inside containers, and so on.
  5. Based on our experience, these are the requirements for a good test lab. First, the test system should be fully automated. It should include a deployment service, a results portal, many different server configurations and additional hardware such as KVM, power switches and so on. All these components should be tightly integrated and work together smoothly. They may be controlled via a web interface. The test system should offer an easy way to execute tests and to find or compare results.
  6. A lot of people are testing the Linux kernel, but for us containers play a special role in the process. A container initializes many kernel subsystems on start and destroys them on stop. On a usual system such operations are only done on boot and shutdown. It is hard to perform these operations many times, and usually the system shuts down after all the deinit operations anyway. Containers give us a way to perform multiple concurrent init/deinit sequences. This helps to find bugs such as a resource that is never freed. Plus, we have per-container resource accounting, which helps in detecting memory leaks. It also lets us test various rarely taken error paths by setting different limits on resources.
  7. Now I want to tell you about one of our most significant tests, called the start-stop test. It starts/stops and suspends/resumes many containers simultaneously and sets random resource limits, just for some more fun. Would you expect such a test to find many bugs? You may doubt it, but it does, and it finds bugs not only in the OpenVZ kernel but in the mainstream kernel too. It is also a stress test, since it generates a heavy load. In addition, it performs many initializations and finalizations of kernel subsystems. The test also forces the kernel to execute error paths, because the resource limits are randomized. On each iteration it does some sanity checks. For example, it checks that all resource usage counters are zero after a container is stopped. It catches leaks, race conditions, errors on subsystem finalization and even leaks on error paths caused by race conditions.
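The sanity check mentioned above (all usage counters are zero once a container is stopped) can be sketched roughly as follows. This is only an illustration, not the actual test code: it assumes the counters are available as text in the column layout of /proc/user_beancounters (uid, resource, held, maxheld, barrier, limit, failcnt), and the function names and the sample values are made up for the example.

```python
def parse_beancounters(text):
    """Return {uid: {resource: held}} from user_beancounters-style text."""
    usage = {}
    uid = None
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the "Version:" line and the column header.
        if not fields or fields[0] in ("Version:", "uid"):
            continue
        # The first row of each container starts with "<uid>:".
        if fields[0].endswith(":"):
            uid = int(fields[0][:-1])
            fields = fields[1:]
        if uid is None or len(fields) < 6:
            continue
        resource, held = fields[0], int(fields[1])
        usage.setdefault(uid, {})[resource] = held
    return usage


def leaked_resources(text, uid):
    """Return the resources of container `uid` whose 'held' value is not zero."""
    counters = parse_beancounters(text).get(uid, {})
    return {name: held for name, held in counters.items() if held != 0}


# Made-up sample: container 101 was stopped but still holds some kernel memory.
SAMPLE = """\
Version: 2.5
       uid  resource           held    maxheld    barrier      limit    failcnt
      101:  kmemsize           4096    2345678   14372700   14790164          0
            lockedpages           0          0        256        256          0
            numproc               0         12        240        240          0
"""

if __name__ == "__main__":
    print(leaked_resources(SAMPLE, 101))
```

In a real run the text would be read from the host after the container has stopped, and any non-empty result would be reported as a suspected leak.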
  8. Performance testing is the most difficult part of testing. The results of these tests are published, and users look at the numbers when choosing a product. So test results should be comprehensible and reproducible. The main problem in creating a performance test is to come up with a useful workload. All performance tests can be divided into atomic tests and complex tests. Atomic tests perform simple basic operations such as context switching, creating a file or forking a process. Users want to see the full picture, so they are more interested in complex tests. A complex test simulates some real workload. What makes a good performance test? Ideally the test should be fully automated to exclude the human factor and ensure consistency. A person may forget to do something or may do it differently next time. If you can't automate the test, you should at least describe the process in great detail. You should avoid side effects such as cron jobs, other daemons doing some work from time to time, database index rebuilds, CPU frequency scaling and so on. You can't be too careful here. We have a special script which validates the test environment. The script is updated whenever we find a new source of interference. The test should run several iterations and calculate statistical errors, to make sure the results are reproducible. Often the system needs some time to stabilize; for this purpose you can execute a few warm-up iterations and ignore their results. When performing a comparison test, all products should be configured in the same or a similar way. For example, when comparing network performance of virtualized systems, we should try to use the same networking setup (say, bridged networking). Finally, all the test results, both good and bad, should be analyzed and explained. Analysis is usually done only for bad results, and good ones are taken for granted. The thing is, in some cases good results mean there's something wrong with the test itself. If you can't explain your test results, they are totally useless, except maybe for marketing purposes.
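The advice about warm-up iterations and statistical errors can be illustrated with a small harness. This is a minimal sketch rather than our actual tooling; `run_benchmark`, its parameters and the returned field names are hypothetical, and `measure` stands for any callable that runs the workload once and returns a number (for example, requests per second).

```python
import statistics


def run_benchmark(measure, warmup=3, iterations=10):
    """Run `measure` repeatedly and summarize the results.

    The first `warmup` results are discarded so that caches and other
    state can settle; the remaining `iterations` results are summarized.
    """
    if iterations < 2:
        raise ValueError("at least two iterations are needed to estimate an error")
    for _ in range(warmup):
        measure()  # warm-up run, result ignored
    samples = [measure() for _ in range(iterations)]
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)        # sample standard deviation
    stderr = stdev / len(samples) ** 0.5     # standard error of the mean
    return {"samples": samples, "mean": mean, "stdev": stdev, "stderr": stderr}


if __name__ == "__main__":
    # Made-up numbers: the first three values play the role of warm-up noise.
    values = iter([100, 50, 10, 10, 12, 14])
    print(run_benchmark(lambda: next(values), warmup=3, iterations=3))
```

A large standard deviation relative to the mean is a hint that something in the environment is interfering and that the numbers should not be published yet.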
  9. Now let me show some results of our performance measurements. We compared Xen, ESXi, KVM and OpenVZ. I chose a LAMP test, because most of our customers are hosting providers. From the following results you can see how well this type of workload runs in a virtualized environment and how many web servers you can run on a single piece of hardware.
  10. On this slide you can see how the number of virtual machines affects performance, measured in the number of serviced requests per second. With 20 VMs all the products show very similar performance. With 40 VMs the difference becomes more obvious. With 60 VMs all products except OpenVZ perform worse than with 40 VMs. This is because the system is too small to handle that many VMs. OpenVZ containers are more lightweight, so you can run a greater number of containers than VMs. In other words, OpenVZ density is higher.
  11. Indeed, high container density is an important feature of OpenVZ, so we regularly compare it to other products and try to improve it. For that, we have a special density test. This test simulates a typical web hosting workload. Each container runs a web server, a mail server (with SpamAssassin and an anti-virus) and Parallels Plesk Panel. The test simulates the workload by sending requests to each service with a defined frequency. On each iteration of the test we add some more containers and measure the service response time, making sure it is below a certain limit. The test stops when the response time exceeds the limit. The test result is the number of containers for which the response time is still good. As with every other test, if we see a regression, we try to understand why it happened, and from time to time we find interesting things. For example, last time we found out that the directory entry cache shrinker was too aggressive, slowing down the whole system.
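The density test loop described above can be sketched like this. It is a simplified illustration: `density_test` and its parameters are hypothetical names, and `measure_response_time` stands for whatever starts the given number of containers, applies the load and returns the observed response time.

```python
def density_test(measure_response_time, limit, step=10, max_containers=1000):
    """Return the largest container count whose response time is within `limit`.

    Containers are added `step` at a time; the test stops at the first
    count for which the measured response time exceeds the limit.
    """
    good = 0
    count = step
    while count <= max_containers:
        if measure_response_time(count) > limit:
            break  # response time is bad, stop the test
        good = count
        count += step
    return good


if __name__ == "__main__":
    # Toy model, not real data: response time grows by 10 ms per container.
    print(density_test(lambda n: 10 * n, limit=500))
```

Comparing the returned count between kernel builds is what reveals a density regression.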
  12. One more good test is a week load test. It is one of the few tests which creates a non-synthetic workload: it replays real users' Apache logs. We have many tests of our own for OpenVZ-specific features and use third-party test suites for other functionality.
  13. Now I want to tell a real-life story of how one of my colleagues fixed a bug in the Linux kernel, prompting the comment from Andrew Morton about a Russian bugfinding machine. In the course of OpenVZ kernel testing, our QA (Quality Assurance) team found a leak of 78 bytes of kernel memory. Who cares about 78 bytes, especially on a server with 16 gigabytes of RAM? We do. My colleague checked the beancounters debug information, which showed that one struct user object had leaked. He then tried to reproduce the leak, but with no luck. Bugs that cannot be reproduced are hard. The only option left was to audit the kernel source code. That involved finding all the places where a struct user object is referenced and checking the code for correctness. The audit took him 4 hours, and he found one place where the reference to an object might be lost. The bug was present not only in the OpenVZ kernel, but in the mainstream kernel too. In this case, once the problem was found, fixing it was pretty simple. So he wrote a fix and demo code to trigger the bug, tested the fix and sent it to the Linux kernel mailing list. Why is this particular incident so important? It was the OpenVZ resource limiting code that helped to detect the leak in the first place: the bug is very hard to trigger, and the leak is so small that it might never have been discovered at all. This bug is in fact a security issue. An ordinary user could exploit it to eat all the kernel memory, thus bringing the whole system down. Worse scenarios could be possible as well. Incidentally, OpenVZ is protected from this security issue, because the kmemsize beancounter (which helped to find it) limits kernel memory usage per container.
  14. About a year ago, a DoS exploit which makes the system unresponsive was published. It looks like most kernels are indeed vulnerable. The good news is that OpenVZ is not. Why? Because of user beancounters. The exploit works by creating an unlimited number of sockets, which renders the whole system unusable, so you need to power-cycle it to bring it back to life. Now, if you run this exploit in an OpenVZ container, you will hit the numothersock beancounter limit pretty soon and the script will exit. I went further, set the numothersock limit to 'unlimited' and re-ran the exploit. The situation is much worse in that case: the system slows down considerably, but I was still able to log in to the physical server using ssh and kill the offending task from the host system using SIGTERM. In this case another beancounter, kmemsize, is working to save the system. Of course, if you set all beancounters to unlimited, the exploit will work. So don't do that, unless your CT is completely trusted. Those limits are there for a reason, you know.
  15. One of the OpenVZ team members, Kirill Kolishkin, decided to suspend a container, but forgot to specify one parameter. Vzctl returned an error saying that this parameter wasn't specified. When Kir ran vzctl again with the correct parameters, it returned the error “No such container”. After a short investigation, he found that the config file had disappeared. Kir did not see the cause right away, but then he understood how to reproduce the problem and where it was in the code. Now look at this code. It allocates a variable on the stack, then validates a parameter and initializes the variable. So far nothing looks strange, but let's see what happens if the parameter is invalid. Oh, no. The code in the error path uses the uninitialized variable: it removes the file whose name is stored in this variable. By chance, the variable contained the path to the container's config. Bad luck. GCC doesn't report any warning in this case.
  16. One hot summer day, my colleague was measuring the performance of network namespaces. He got results which looked like a set of random data. These were not the first such measurements, and the procedure was well tested. So where was the problem? The day was hot, the brain was not working well, and probably not only the brain. It took him more than an hour to notice a message about CPU throttling due to overheating. The host had no case fan; after one was installed, the results stabilized. What is the conclusion of this story? Make sure that the results are reproducible, and remember about side effects.