(Un)manageable: managing “disasters” without losing your cool
@eleddy
This talk is for the Develadminisystemators
who have to constantly deal with UNKNOWNS
‣ Know thy system
‣ Know thy tools
‣ Know thy neighbors
Three Commands
Stairway to Freedom
Prepare
Isolate
Damage Control
Diagnose
Patch
Clean
Fix
Document
Horizon of Intervention
Communicate
Prepare Isolate Control Diagnose Patch Clean Fix Document
Dear Magic Makers -
As some of you may already know, customers are having trouble retrieving
their historical records because our archive server is not responding. I am
investigating the issue now and will send an update in 20 minutes.
Please fence calls in the meantime. If someone could get me a Red Bull and
some nacho cheese corn nuts, that would be stellar.
Thanks!
coworkers
Mayday! High Priority
bossman
Prepare for the Worst
‣ Backups
‣ Local Data.fs
‣ Set a time limit
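A “Local Data.fs” can start as a timestamped copy taken before anyone touches anything. This is a minimal sketch, assuming a buildout-style `var/filestorage/Data.fs` and a hypothetical `var/backups` directory; `snapshot_datafs` is an illustrative helper, not part of Zope. Copying a Data.fs that a live ZEO server is still appending to can capture a partial final transaction, so stop the server first, or use ZODB's `repozo` script for routine backups.

```python
import shutil
import time
from pathlib import Path


def snapshot_datafs(datafs, backup_dir):
    """Copy Data.fs into backup_dir under a timestamped name.

    Hypothetical helper. Creates backup_dir if needed, never overwrites
    an existing snapshot, and returns the Path of the new copy.
    """
    src = Path(datafs)
    dest_dir = Path(backup_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = dest_dir / f"{src.name}.{stamp}"
    if dest.exists():
        raise FileExistsError(dest)
    # copy2 also keeps the modification time, handy when lining up with logs
    shutil.copy2(src, dest)
    # cheap sanity check that the whole file made it across
    if dest.stat().st_size != src.stat().st_size:
        raise OSError(f"incomplete copy: {dest}")
    return dest
```

Calling `snapshot_datafs("var/filestorage/Data.fs", "var/backups")` returns the path of the copy, which is the file to pull down and poke at locally.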
Disable Interference
‣ Disabled all backups and packing
‣ Opened up port 8080 to outside network
‣ Moved logs to temporary disk
Isolation by Elimination
Network · Hardware · Software · Data
‣ works for me
‣ obvious, sporadic
‣ crazy shit
‣ everything else
‣ not recreatable locally
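Elimination usually starts at the bottom layer: can the Zope clients reach ZEO at all? A minimal standard-library sketch; `port_is_open` and `first_unreachable` are hypothetical helpers, and the endpoints you feed them are placeholders (for example the ZEO address 10.0.1.5:8001 and the Zope port 8080 that appear elsewhere in this talk).

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and tries each address it gets
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # connection refused, unreachable network, timeout or DNS failure
        return False


def first_unreachable(endpoints, timeout=3.0):
    """Check (name, host, port) tuples in order.

    Returns the name of the first endpoint that cannot be reached,
    or None if every one of them accepts a connection.
    """
    for name, host, port in endpoints:
        if not port_is_open(host, port, timeout):
            return name
    return None
```

For example, `first_unreachable([("zeo", "10.0.1.5", 8001), ("zope", "127.0.0.1", 8080)])` names the first hop that refuses or times out. An open port only rules out the network; hardware, software and data are still suspects.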
Zopesplosion 3000 Architecture
‣ Apache, Varnish, HAProxy, CDN
‣ Zope clients with ZEO 1-4, ZEO 5-8, ZEO 9-12
‣ APIs, MySQL, MongoDB, SPARQL
WTF mate
How Zeo Cache Works (Machine A: Zope and its cache; Machine B: Zeo)
‣ Zope asks its cache: “I Want X”
‣ On a miss, the cache asks Zeo: “I Need X”
‣ Zeo returns X; the cache keeps a copy and hands X to Zope
‣ When X is modified, Zeo tells the client cache so it drops its old copy of X
‣ With a Disk cache, the cached X survives a RESTART; if the change to X is missed, Zope is handed the old X while Zeo holds Modified X
Inconsistent State!
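The failure can be reproduced in miniature. This is a toy model written for illustration, not ZEO's implementation: `ToyZeoServer`, `ToyZeoClient` and `verify` are hypothetical names, and the real ZEO client runs its own cache verification when it reconnects. It captures the one idea that matters here: a cache that outlives a restart is only trustworthy if it hears about every change made while it was away.

```python
class ToyZeoServer:
    """Toy stand-in for a ZEO server: stores objects, notifies connected clients."""

    def __init__(self):
        self.data = {}      # oid -> (serial, value)
        self.serial = 0
        self.clients = []   # only these hear about changes

    def load(self, oid):
        return self.data[oid]

    def store(self, oid, value):
        self.serial += 1
        self.data[oid] = (self.serial, value)
        for client in self.clients:
            client.invalidate(oid)


class ToyZeoClient:
    """Toy client cache; persistent=True plays the part of a disk cache."""

    def __init__(self, server, persistent=False):
        self.server = server
        self.persistent = persistent
        self.cache = {}     # oid -> (serial, value)
        self.connect()

    def connect(self):
        self.server.clients.append(self)

    def disconnect(self):
        self.server.clients.remove(self)
        if not self.persistent:
            self.cache.clear()  # a memory cache dies with the process

    def invalidate(self, oid):
        self.cache.pop(oid, None)

    def get(self, oid):
        if oid not in self.cache:  # "I Want X" misses, so "I Need X"
            self.cache[oid] = self.server.load(oid)
        return self.cache[oid][1]

    def verify(self):
        """Drop cached entries whose serial no longer matches the server's."""
        for oid, (serial, _) in list(self.cache.items()):
            if self.server.data.get(oid, (None,))[0] != serial:
                del self.cache[oid]


def restart_scenario():
    server = ToyZeoServer()
    server.store("X", "X")
    zope = ToyZeoClient(server, persistent=True)  # disk cache
    zope.get("X")                                 # X is now cached
    zope.disconnect()                             # RESTART begins
    server.store("X", "Modified X")               # written while we were away
    zope.connect()
    stale = zope.get("X")                         # served from the old cache
    zope.verify()
    fresh = zope.get("X")                         # reloaded from the server
    return stale, fresh
```

`restart_scenario()` returns `('X', 'Modified X')`: the first read after the restart comes from the stale cache, the second happens after verification. With `persistent=False` the cache is emptied on disconnect, so the first read is already the modified value.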
Hot damn!
Take time to make time
‣ Minimize customer angst
‣ Hang out in custom
‣ Acquisition is your friend
‣ Remember request and response
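Acquisition means an object that lacks an attribute borrows it from its containers, so something placed higher in the tree is visible to everything below it. A toy model of that lookup order, for illustration only: `Node` and `acquire` are hypothetical and ignore the real Acquisition package's wrappers, explicit acquisition and the skins machinery.

```python
class Node:
    """Toy content object that knows its container (think aq_parent)."""

    def __init__(self, name, parent=None, **attrs):
        self.name = name
        self.parent = parent
        self.attrs = dict(attrs)


def acquire(node, attr):
    """Look attr up on node, then on each container up to the root."""
    current = node
    while current is not None:
        if attr in current.attrs:
            return current.attrs[attr]
        current = current.parent
    raise AttributeError(attr)


# A tiny tree shaped like the one in "How Data is Stored".
app = Node("app", error_page="generic error page")
plone = Node("Plone", app)
news = Node("news", plone, error_page="news error page")
item = Node("news.2010.09.08", news)
```

Here `acquire(item, "error_page")` finds the news folder's value before the root's, and `acquire(plone, "error_page")` falls through to the root. During an incident, that lookup order is what lets one override placed on a container cover everything underneath it.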
Unique or Just Not Obvious?
‣ Zope, ZEO, and system logs
‣ System stats/monitoring
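To tell a unique failure from one that just is not obvious, count how often each logger has been complaining and since when. A minimal sketch, assuming each event-log entry starts with a line like `2010-09-08T14:02:11 ERROR Zope.SiteErrorLog ...`; check the pattern against your own Zope, ZEO and system logs before trusting the counts. `summarize` is a hypothetical helper.

```python
import re
from collections import Counter

# Assumed entry header: "<ISO timestamp> <LEVEL> <logger> <message>"
HEADER = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\s+(\w+)\s+(\S+)\s?(.*)$")


def summarize(lines, levels=("ERROR", "CRITICAL")):
    """Map (level, logger) to (count, first timestamp, last timestamp)."""
    counts = Counter()
    first = {}
    last = {}
    for line in lines:
        m = HEADER.match(line)
        if not m:
            continue  # traceback lines, "------" separators, etc.
        stamp, level, logger, _message = m.groups()
        if level not in levels:
            continue
        key = (level, logger)
        counts[key] += 1
        first.setdefault(key, stamp)
        last[key] = stamp
    return {key: (n, first[key], last[key]) for key, n in counts.items()}
```

Passing an open file works too, e.g. `with open("var/log/instance.log") as f: summarize(f)` (the path is a placeholder). An error that has been logged quietly for weeks is usually "not obvious" rather than unique.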
Test Case
Sarcoidosis!
Probably not...
Estimate Fix Time + Horizon of Intervention
Can I handle this problem? Can I do it in a timely manner?
‣ Yes and yes: handle it
‣ No: get help from IRC, Plone-users, friends, colleagues
‣ Front End Errors
‣ Take the performance hit
‣ Disable the malfunctioning piece
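Disabling the malfunctioning piece is easier when the risky call already sits behind a switch. A sketch with hypothetical names (`DISABLED`, `killswitch`, `related_items`); the SPARQL call stands in for whichever backend is failing.

```python
import functools

# Hypothetical switchboard: names of components turned off by hand.
DISABLED = set()


def killswitch(name, fallback):
    """Route calls to fallback(*args, **kwargs) while `name` is in DISABLED."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if name in DISABLED:
                return fallback(*args, **kwargs)
            return func(*args, **kwargs)
        return wrapper
    return decorator


def no_related_items(uid):
    # Degraded answer: the page renders without its "related" box.
    return []


@killswitch("sparql", fallback=no_related_items)
def related_items(uid):
    # Stand-in for the call to the misbehaving SPARQL backend.
    raise TimeoutError("SPARQL endpoint did not answer")
```

`DISABLED.add("sparql")` turns the feature off and `DISABLED.discard("sparql")` turns it back on. A module-level set lives in one Python process, so with several Zope clients each one needs the switch flipped, or the flag has to live somewhere they all read.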
temporary patch
full patch
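A temporary patch is often a monkey patch: replace the broken method in memory now and ship the full patch through the normal release later. A sketch with hypothetical names (`monkeypatch`, `NewsListing`, `safe_render`); returning an `undo` function keeps the clean-up step honest. A patch applied this way only affects the Python process it runs in, so every Zope client needs it.

```python
_MISSING = object()


def monkeypatch(target, attr, replacement):
    """Set target.attr to replacement and return a function that undoes it."""
    # Only look at what target defines itself, so inherited attributes
    # are restored correctly instead of being copied onto target.
    original = vars(target).get(attr, _MISSING)
    setattr(target, attr, replacement)

    def undo():
        if original is _MISSING:
            delattr(target, attr)  # the inherited attribute shows through again
        else:
            setattr(target, attr, original)

    return undo


class NewsListing:
    """Stand-in for product code that cannot be redeployed right now."""

    def render(self, items):
        # Bug: raises KeyError for an item without a title.
        return ", ".join(item["title"] for item in items)


def safe_render(self, items):
    # Temporary patch: skip untitled items instead of breaking the page.
    return ", ".join(item["title"] for item in items if "title" in item)
```

`undo = monkeypatch(NewsListing, "render", safe_render)` applies the patch; calling `undo()` once the full patch is deployed puts the original back.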
Have I mentioned the importance of working with BACKUPS yet?
Especially when unfucking data...
Clean up
‣ Disabled all backups and packing
‣ Opened up port 8080 to outside network
‣ Moved logs to temporary disk
‣ Disabled zopes 5-10
‣ Delete extra/bad files
‣ Scripts in version control
‣ Communicate
I’ve got a fever, and the only solution... is MORE PATCH!
‣ Update/Close Tickets
‣ Integrate Test Cases
‣ Document Processes
Handling Data Errors
Network · Hardware · Software · Data
‣ works for me
‣ obvious, sporadic
‣ crazy shit
‣ everything else
‣ not recreatable locally
How Data is Stored
root (app)
‣ acl_users (users, roles)
‣ temp_folder
‣ Plone
    ‣ acl_users (users, roles)
    ‣ News (news.2010.09.08, news.2010.06.13)
    ‣ Members
    ‣ Events
The Basics
‣ ./bin/instance debug
‣ app
‣ dir, __dict__
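At the `./bin/instance debug` prompt `app` is the Zope root, and `dir()` and `__dict__` are the quickest way to see what an object offers and what it actually stores. Two small helpers to paste into that session; the names are hypothetical. `stored_state` calls `_p_activate()` when the object has one, because a persistent object that has not been loaded yet (a ghost) shows an empty `__dict__`.

```python
def public_attrs(obj):
    """Sorted names from dir(obj) that do not start with an underscore."""
    return sorted(name for name in dir(obj) if not name.startswith("_"))


def stored_state(obj):
    """Instance attributes from __dict__, minus _p_ and _v_ bookkeeping.

    Underscore names like _objects are kept: in Zope they often hold the
    real data. _v_ attributes are volatile and never saved anyway.
    """
    activate = getattr(obj, "_p_activate", None)
    if activate is not None:
        activate()  # load a ghost so its __dict__ is populated
    return {
        key: value
        for key, value in vars(obj).items()
        if not key.startswith(("_p_", "_v_"))
    }
```

In the debug session, `stored_state(app.Plone.news)` shows what the news folder holds on disk, while `public_attrs(app.Plone.news)` lists the methods you might call on it.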
Direct Connect
>>> # stop any Zope/ZEO process using this Data.fs first: FileStorage locks the file
>>> from ZODB.FileStorage import FileStorage
>>> from ZODB.DB import DB
>>> storage = FileStorage('var/filestorage/Data.fs')
>>> db = DB(storage)
>>> connection = db.open()
>>> root = connection.root()
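>>> from ZEO.ClientStorage import ClientStorage
>>> from ZODB import DB
>>> address = '10.0.1.5', 8001
>>> storage = ClientStorage(address)   # connect to the running ZEO server
>>> db = DB(storage)
>>> connection = db.open()
>>> root = connection.root()
>>> root['app'] = PloneSite()   # the root is a mapping; PloneSite is illustrative and must be imported
>>> root['status'] = 'Running'   # nothing is saved until transaction.commit()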
>>> import transaction
>>> del app.Plone.news['news-item-id']
>>> transaction.commit()
_p_changed
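`_p_changed` matters when fixing data by hand. A persistent object notices attribute assignment, but not in-place changes to a plain list or dict it holds, so `obj.tags.append('x')` followed by `transaction.commit()` does not save the append on its own. Set `obj._p_changed = True` before committing, reassign the attribute, or use `persistent.list.PersistentList` / `persistent.mapping.PersistentMapping`. The stdlib-only toy here mimics that behaviour so it runs anywhere; `ToyPersistent` and `commit` are hypothetical and are not the `persistent` package's API.

```python
import copy


class ToyPersistent:
    """Toy stand-in for persistent.Persistent change tracking.

    Assigning an attribute flags the object as changed; mutating a
    list or dict that is already stored on it does not.
    """

    def __init__(self):
        object.__setattr__(self, "_p_changed", False)

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        if not name.startswith("_p_"):
            object.__setattr__(self, "_p_changed", True)


def commit(obj, saved):
    """Toy commit: snapshot obj's state into `saved` only if it is flagged."""
    if obj._p_changed:
        state = {k: v for k, v in vars(obj).items() if not k.startswith("_p_")}
        saved.clear()
        saved.update(copy.deepcopy(state))
        obj._p_changed = False


doc = ToyPersistent()
doc.tags = []
saved = {}
commit(doc, saved)          # saved == {'tags': []}
doc.tags.append("urgent")   # in-place change: not flagged
commit(doc, saved)          # saved is still {'tags': []}
doc._p_changed = True       # say so explicitly
commit(doc, saved)          # saved == {'tags': ['urgent']}
```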
When in doubt...
‣ PDB is your friend
‣ The source is your friend
‣ Throw a party for your friends
‣ Know your System
‣ Understand the Tools
‣ Be Nice to your Neighbors

Ungooglable