Chaos engineering


First, what is chaos engineering?

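Chaos engineering is a methodology, or discipline, of running tests
to build confidence in the reliability of a system in its production environment,
so that the service or system can withstand the failures that occur in the real world.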

https://techcrunch.com/2018/02/04/the-rise-of-chaos-engineering/
How can we develop and operate software that we can trust
(in a production environment)?

https://techcrunch.com/2018/02/04/the-rise-of-chaos-engineering/
It can dramatically improve the quality and reliability
of a service? Really?

Question 1. How exactly is chaos engineering supposed to
improve the quality and reliability of a service
so dramatically?

Question 2. How does it differ from methods such as
unit tests and integration tests, which also aim to secure
the stability and reliability of a system?

Principles Of Chaos Engineering
- Build a Hypothesis around Steady State Behavior
- Vary Real-world Events
- Run Experiments in Production
- Automate Experiments to Run Continuously
- Minimize Blast Radius
http://principlesofchaos.org/

CHAOS IN PRACTICE
1. Start by defining ‘steady state’ as some measurable output of a system that indicates normal behavior.
2. Hypothesize that this steady state will continue in both the control group and the experimental group.
3. Introduce variables that reflect real world events like servers that crash, hard drives that malfunction, network
connections that are severed, etc.
4. Try to disprove the hypothesis by looking for a difference in steady state between the control group and the
experimental group.
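Chaos testing is a methodology that makes it easy to run experiments
that expose a system's weaknesses, so that the uncertainty of a
distributed system can be addressed concretely. These experiments follow the four steps above.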

CHAOS IN PRACTICE
1. Start by defining ‘steady state’ as some measurable output
of a system that indicates normal behavior.
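Define the system's normal state.
- Use measurable values of the system to define what normal behavior is.
ex) CPU load, memory utilization, network I/O, etc.
(The Principles focus on system output, such as throughput, error rates, and latency percentiles, rather than internal attributes.)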
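As a minimal sketch of this step, a steady state can be written down as a baseline value plus an allowed band around it. The class name SteadyState, the helper define_steady_state, the 10% tolerance, and the CPU load samples are all assumptions made for illustration; they do not come from the Principles or from any tool.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass(frozen=True)
class SteadyState:
    """Normal behavior expressed as an allowed band around a baseline value."""
    metric: str
    baseline: float
    tolerance: float  # allowed relative deviation, e.g. 0.10 means +/-10%

    def holds(self, samples: Sequence[float]) -> bool:
        """Return True if the mean of the new samples stays inside the band."""
        return abs(mean(samples) - self.baseline) <= self.tolerance * self.baseline


def define_steady_state(metric: str, samples: Sequence[float], tolerance: float) -> SteadyState:
    """Derive the baseline from samples collected during normal operation."""
    return SteadyState(metric=metric, baseline=mean(samples), tolerance=tolerance)


# Hypothetical CPU load samples (percent) recorded while the system was healthy.
cpu_load = define_steady_state("cpu_load_percent", [40.0, 42.0, 38.0, 40.0], tolerance=0.10)
```

Calling cpu_load.holds(...) on later samples then answers whether the system still looks normal. A real setup would likely track several metrics and use percentiles instead of a plain mean.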

CHAOS IN PRACTICE
2. Hypothesize that this steady state will continue in both the
control group and the experimental group.
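Hypothesize that, whatever happens, the normal state will persist
in both the experimental group and the control group.
ex) In a service running on 40 servers, 10 servers fail at the same time.
Customers can still use the (core) service.
http://principlesofchaos.org/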
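The 40-server example can be turned into a checkable statement with some simple arithmetic. The sketch below assumes, purely for illustration, that every server handles 100 requests per second and that peak demand is 2,500 requests per second; neither number comes from the slides.

```python
def core_service_available(total_servers: int, failed_servers: int,
                           capacity_per_server: int, peak_demand: int) -> bool:
    """Return True if the surviving servers can still carry the peak demand."""
    if not 0 <= failed_servers <= total_servers:
        raise ValueError("failed_servers must be between 0 and total_servers")
    remaining_capacity = (total_servers - failed_servers) * capacity_per_server
    return remaining_capacity >= peak_demand


# Hypothesis from the slide: 10 of 40 servers fail and the core service stays up.
# Capacity (100 req/s per server) and demand (2500 req/s) are assumed values.
hypothesis_holds = core_service_available(40, 10, capacity_per_server=100, peak_demand=2500)
```

Under these assumed numbers the 30 surviving servers provide 3,000 requests per second, so the hypothesis holds on paper. The experiment exists to check whether the real system agrees.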

CHAOS IN PRACTICE
3. Introduce variables that reflect real world events like
servers that crash, hard drives that malfunction, network
connections that are severed, etc.
Introduce variables (problems) that occur in the real world
into the experimental group.
- DB server failure
- DDoS attack
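A minimal sketch of this step on a simulated fleet: the failure is injected into the experimental group only, and the control group is left alone. The Server class, the group sizes, and the inject_crashes helper are hypothetical, and the sketch models a plain server crash, not a database failure or a DDoS attack.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Server:
    name: str
    healthy: bool = True


def inject_crashes(group: List[Server], count: int, rng: random.Random) -> List[str]:
    """Mark `count` randomly chosen servers in the group as crashed and return their names."""
    victims = rng.sample(group, count)
    for server in victims:
        server.healthy = False
    return [server.name for server in victims]


# Two simulated groups of equal size; only the experimental group is disturbed.
control = [Server(f"control-{i}") for i in range(4)]
experimental = [Server(f"experimental-{i}") for i in range(4)]
crashed = inject_crashes(experimental, count=1, rng=random.Random(0))
```

Passing in a seeded random.Random keeps the run reproducible, which helps when a failed experiment has to be replayed.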

CHAOS IN PRACTICE
4. Try to disprove the hypothesis by looking for a difference
in steady state between the control group and the
experimental group.
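Compare the experimental group with the control group to test the hypothesis.
- The harder it is to disrupt the system's normal state,
the more confidence we can have in the system's reliability.
- If a weakness is found, fix it.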
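One simple way to make this comparison concrete is to compare the mean of a metric across the two groups and treat a large relative gap as evidence against the hypothesis. The function name, the 5% threshold, and the sample values below are assumptions for illustration; a real analysis would likely use a proper statistical test.

```python
from statistics import mean
from typing import Sequence


def hypothesis_disproved(control_samples: Sequence[float],
                         experimental_samples: Sequence[float],
                         max_relative_diff: float) -> bool:
    """Return True if the experimental mean drifts too far from the control mean."""
    control_mean = mean(control_samples)
    experimental_mean = mean(experimental_samples)
    if control_mean == 0:
        # No meaningful relative difference exists; any non-zero value counts as drift.
        return experimental_mean != 0
    return abs(experimental_mean - control_mean) / control_mean > max_relative_diff


# Hypothetical successful-requests-per-second samples from both groups.
still_steady = hypothesis_disproved([100, 102, 98], [99, 101, 97], max_relative_diff=0.05)
weakness_found = hypothesis_disproved([100, 102, 98], [60, 62, 58], max_relative_diff=0.05)
```

A True result points at a weakness to fix. A False result adds confidence but does not prove that the system is resilient to other failures.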

CHAOS IN PRACTICE
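1. Define the system's normal (steady) state.
2. Hypothesize that, whatever happens, the normal state will persist
in both the experimental group and the control group.
3. Introduce variables (problems) that occur in the real world
into the experimental group.
4. Compare the experimental group with the control group to test the hypothesis.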
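The four steps can be tied together in a small simulation. This is only a sketch under stated assumptions: requests are routed to a random server, a request fails if it lands on a crashed server, and clients may retry a fixed number of times. The function names, the retry model, and every number are hypothetical.

```python
import random
from typing import Dict, List


def success_rate(healthy_flags: List[bool], requests: int,
                 rng: random.Random, retries: int) -> float:
    """Fraction of simulated requests that reach a healthy server within the retry budget."""
    succeeded = 0
    for _ in range(requests):
        for _attempt in range(retries + 1):
            if rng.choice(healthy_flags):
                succeeded += 1
                break
    return succeeded / requests


def run_experiment(servers: int, crashed: int, requests: int,
                   retries: int, tolerance: float, seed: int = 0) -> Dict[str, object]:
    rng = random.Random(seed)
    # Step 1: the steady state is the success rate of the untouched control group.
    steady_state = success_rate([True] * servers, requests, rng, retries)
    # Step 2: hypothesis: the experimental group stays within `tolerance` of it.
    # Step 3: inject the real-world event (crashed servers) into the experimental group.
    experimental = [False] * crashed + [True] * (servers - crashed)
    observed = success_rate(experimental, requests, rng, retries)
    # Step 4: compare the two groups to test the hypothesis.
    return {"steady_state": steady_state, "observed": observed,
            "hypothesis_holds": steady_state - observed <= tolerance}


without_retries = run_experiment(servers=40, crashed=10, requests=1000, retries=0, tolerance=0.05)
with_retries = run_experiment(servers=40, crashed=10, requests=1000, retries=5, tolerance=0.05)
```

In this toy model, losing 10 of 40 servers breaks the hypothesis when clients never retry and leaves it intact when they retry a few times. That is the kind of weakness and fix the experiment is meant to surface.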

Keep testing and improving,
taking into account even the problems that occur in the real world.

Chaos Monkey - randomly terminates AWS instances.
Chaos Gorilla - simulates the outage of an entire AWS Availability Zone.
Chaos Kong - simulates the outage of an entire AWS Region.
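The selection logic behind such tools can be sketched without touching a real cloud account. The code below is not the Netflix implementation and makes no AWS calls; it only shows, on a hypothetical in-memory fleet, how a Chaos Monkey style tool might pick one random instance and how a zone-wide outage would select every instance in one zone. The instance IDs, zone names, and function names are invented for illustration.

```python
import random
from typing import Dict, List


def pick_random_instance(instances: Dict[str, str], rng: random.Random) -> str:
    """Chaos Monkey style: choose one instance ID at random to terminate."""
    return rng.choice(sorted(instances))


def instances_in_zone(instances: Dict[str, str], zone: str) -> List[str]:
    """Zone-outage style: choose every instance placed in the given zone."""
    return sorted(iid for iid, placed_in in instances.items() if placed_in == zone)


# Hypothetical instance IDs mapped to hypothetical zone names.
fleet = {"i-001": "zone-a", "i-002": "zone-a", "i-003": "zone-b", "i-004": "zone-c"}
victim = pick_random_instance(fleet, random.Random(42))
zone_victims = instances_in_zone(fleet, "zone-a")
```

The difference between the two functions is the blast radius: one instance versus a whole zone. This is why larger outages are usually simulated with more care.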

Reference
・http://principlesofchaos.org/
・https://arxiv.org/ftp/arxiv/papers/1702/1702.05843.pdf

See Also
Awesome Chaos Engineering
https://github.com/dastergon/awesome-chaos-engineering
