Make Sure Your Applications Crash




           Moshe Zadka
True story
Python doesn't crash




Memory managed, no direct pointer arithmetic
...except it does




 C bugs, untrapped exceptions, infinite loops,
blocking calls, thread deadlocks, inconsistent
                 resident state
Recovery is important




"[S]ystem failure can usually be considered to
  be the result of two program errors[...] the
      second, in the recovery routine[...]"
Crashes and inconsistent data




A crash results in data from an arbitrary
            program state.
Avoid storage




Caches are better than master copies.
Databases




Transactions maintain consistency
    Databases can crash too!
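A minimal sketch of the first point, assuming Python 3 and its standard sqlite3 module. The `counter` table, the function names and the simulated failure are illustrative and not from the talk. The connection's context manager commits when the block succeeds and rolls back when it raises, so a half-finished update is never visible.

```python
import sqlite3

def make_db():
    # In-memory database with one hypothetical counter row.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE counter (name TEXT PRIMARY KEY, value INTEGER)")
    conn.execute("INSERT INTO counter VALUES ('hits', 0)")
    conn.commit()
    return conn

def bump_twice(conn, fail_between=False):
    # Both updates belong to one transaction: all or nothing.
    with conn:
        conn.execute("UPDATE counter SET value = value + 1 WHERE name = 'hits'")
        if fail_between:
            raise RuntimeError("simulated failure mid-transaction")
        conn.execute("UPDATE counter SET value = value + 1 WHERE name = 'hits'")

def read_value(conn):
    row = conn.execute("SELECT value FROM counter WHERE name = 'hits'").fetchone()
    return row[0]
```

An exception is only a stand-in for a crash here. After a real crash, rolling back the unfinished transaction is the database's own recovery job, which is why the second line of the slide still applies.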
Atomic operations




    File rename
Example: Counting
import os

def update_counter():
    fp = open("counter.txt")
    s = fp.read()
    fp.close()
    counter = int(s.strip())
    counter += 1
    # If there is a crash before this point,
    # no changes have been made.
    fp = open("counter.txt.tmp", 'w')
    fp.write('%d\n' % counter)
    fp.close()
    # If there is a crash before this point,
    # only a temp file has been modified
    # The following is an atomic operation (on POSIX)
    os.rename("counter.txt.tmp", "counter.txt")
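The same idea as a Python 3 sketch, under two assumptions that are not on the slide: the data is flushed with `os.fsync` before the rename, and `os.replace` is used so that an existing target is overwritten on both POSIX and Windows (the atomicity guarantee itself is a POSIX one). The `path` parameter is added for illustration.

```python
import os

def update_counter(path="counter.txt"):
    with open(path) as fp:
        counter = int(fp.read().strip()) + 1
    tmp = path + ".tmp"
    with open(tmp, "w") as fp:
        fp.write("%d\n" % counter)
        fp.flush()
        # Push the bytes to disk before the rename, so the final name
        # does not end up pointing at an empty file after a power loss.
        os.fsync(fp.fileno())
    # Swap the new file into place in a single step.
    os.replace(tmp, path)
    return counter
```

A crash before `os.replace` leaves the old counter untouched; a crash after it leaves the new one. There is no point at which `counter.txt` holds a partial write.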
Efficient caches, reliable masters




     Mark inconsistency of cache
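One possible reading of this slide, sketched in Python 3: the cache is written in place (fast, not atomic) and a marker file brackets the write. If the marker is still there at start-up, the cache is thrown away and rebuilt from the master copy. The `.dirty` suffix, the JSON format and all function names are assumptions made for the example.

```python
import json
import os

def write_cache(cache_path, data):
    marker = cache_path + ".dirty"
    open(marker, "w").close()          # 1. flag the cache as untrustworthy
    with open(cache_path, "w") as fp:  # 2. overwrite the cache in place
        json.dump(data, fp)
    os.remove(marker)                  # 3. clear the flag once the write is done

def rebuild_cache(master_path, cache_path):
    # Hypothetical derived value: the sum of the numbers in the master file.
    with open(master_path) as fp:
        values = json.load(fp)
    write_cache(cache_path, {"total": sum(values)})

def load_cache(master_path, cache_path):
    # A leftover marker means a crash happened during step 2.
    if os.path.exists(cache_path + ".dirty") or not os.path.exists(cache_path):
        rebuild_cache(master_path, cache_path)
    with open(cache_path) as fp:
        return json.load(fp)
```

The master copy is never modified by this code, so the worst outcome of a crash is a slower start while the cache is rebuilt.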
No shutdown




Crash in testing
Availability




If data is consistent, just restart!
Improving availability




        Limit impact
       Fast detection
        Fast start-up
Vertical splitting




Different execution paths, different processes
Horizontal splitting




Different code bases, different processes
Watchdog




Monitor -> Flag -> Remediate
Watchdog principles




Keep it simple, keep it safe!
Watchdog: Heartbeats
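## In a Twisted process
import os
from twisted.internet import task
def beat():
    # Appending nothing leaves the mtime alone, so set it explicitly
    open('beats/my-name', 'a').close()
    os.utime('beats/my-name', None)
task.LoopingCall(beat).start(30)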
Watchdog: Get time-outs
import glob, os, time

def getTimeout():
    # hearts/<name> is read as the number of seconds <name> may stay
    # silent; the result maps each name to its oldest acceptable beat.
    timeout = dict()
    now = time.time()
    for heart in glob.glob('hearts/*'):
        beat = int(open(heart).read().strip())
        timeout[os.path.basename(heart)] = now-beat
    return timeout
Watchdog: Mark problems
def markProblems():
    timeout = getTimeout()
    for heart in glob.glob('beats/*'):
        name = os.path.basename(heart)
        mtime = os.path.getmtime(heart)
        problem = 'problems/'+name
        if (name in timeout and mtime<timeout[name] and
           not os.path.isfile(problem)):
            fp = open(problem, 'w')
            fp.write('watchdog')
            fp.close()
Watchdog: check solutions
def checkSolutions():
    now = time.time()
    problemTimeout = now-30
    for problem in glob.glob('problems/*'):
        mtime = os.path.getmtime(problem)
        if mtime<problemTimeout:
            subprocess.call(['restart-system'])
Watchdog: Loop
## Watchdog
while True:
    markProblems()
    checkSolutions()
    time.sleep(1)
Watchdog: accuracy of




Custom checkers can manufacture problems
Watchdog: reliability of




   Use cron for main loop
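A sketch of what a cron-driven watchdog could look like in Python 3: a single pass that keeps no state in memory, so the watchdog itself has nothing to lose if it crashes between runs. The function names, the `max_age` parameter and the directory arguments are assumptions for illustration; only the heartbeat and problem file layout follows the earlier slides.

```python
import os
import time

def stale_heartbeats(beats_dir, max_age, now=None):
    """Return the names in beats_dir whose mtime is older than max_age seconds."""
    now = time.time() if now is None else now
    stale = []
    for name in sorted(os.listdir(beats_dir)):
        mtime = os.path.getmtime(os.path.join(beats_dir, name))
        if now - mtime > max_age:
            stale.append(name)
    return stale

def watchdog_pass(beats_dir, problems_dir, max_age=60):
    """One pass, meant to be started by cron rather than by a while loop."""
    os.makedirs(problems_dir, exist_ok=True)
    flagged = stale_heartbeats(beats_dir, max_age)
    for name in flagged:
        problem = os.path.join(problems_dir, name)
        # Do not touch an existing flag: its mtime records when the problem began.
        if not os.path.exists(problem):
            with open(problem, "w") as fp:
                fp.write("watchdog")
    return flagged
```

Cron then takes over the role of the `while True` loop: if one pass dies, the next scheduled run starts from the files on disk.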
Watchdog: reliability of




Use software/hardware watchdogs
Conclusions




Everything crashes -- plan for it
Questions?
Welcome to the back-up slides
         Extra! Extra!
Example: Counting on Windows
def update_counter():
    fp = open("counter.txt")
    s = fp.read()
    fp.close()
    counter = int(s.strip())
    counter += 1
    # If there is a crash before this point,
    # no changes have been made.
    fp = open("counter.txt.tmp", 'w')
    fp.write('%d\n' % counter)
    fp.close()
    # If there is a crash before this point,
    # only a temp file has been modified
    os.remove("counter.txt")
    # At this point, the state is inconsistent*
    # The following is an atomic operation
    os.rename("counter.txt.tmp", "counter.txt")
Example: Counting on Windows
             (Recovery)
def recover():
    if not os.path.exists("counter.txt"):
        # The permanent file has been removed
        # Therefore, the temp file is valid
        os.rename("counter.txt.tmp",
                  "counter.txt")
Example: Counting with versions
def update_counter():
    files = [int(name.split('.')[-1])
               for name in os.listdir('.')
                 if name.startswith('counter.')]
    last = max(files)
    counter = int(open('counter.%s' % last
                      ).read().strip())
    counter += 1
    # If there is a crash before this point,
    # no changes have been made.
    fp = open("tmp.counter", 'w')
    fp.write('%d\n' % counter)
    fp.close()
    # If there is a crash before this point,
    # only a temp file has been modified
    os.rename('tmp.counter',
              'counter.%s' % (last+1))
    os.remove('counter.%s' % last)
Example: Counting with versions
             (cleanup)
# This is not a recovery routine, but a cleanup
# routine.
# Even in its absence, the state is consistent
def cleanup():
    files = [int(name.split('.')[-1])
                for name in os.listdir('.')
                  if name.startswith('counter.')]
    files.sort()
    files.pop()
    for n in files:
        os.remove('counter.%d' % n)
    if os.path.exists('tmp.counter'):
        os.remove('tmp.counter')
Correct ordering
def activate_due():
    scheduled = rs.smembers('scheduled')
    now = time.time()
    for el in scheduled:
        due = int(rs.get(el+':due'))
        if now<due:
            continue
        rs.sadd('activated', el)
        rs.delete(el+':due')
        rs.srem('scheduled', el)
Correct ordering (recovery)
def recover():
    inconsistent = rs.sinter('activated',
                             'scheduled')
    for el in inconsistent:
        rs.delete(el+':due') #*
        rs.srem('scheduled', el)
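Why this ordering works can be checked without Redis. The sketch below (Python 3) replaces the store with plain sets and a dict and injects a crash after each step; the `Crash` exception, the `state` layout and the `crash_after` parameter are all assumptions for the demonstration. Because the element is added to `activated` first, every interrupted run leaves it in both sets, which is exactly what the recovery routine looks for.

```python
class Crash(Exception):
    """Stands in for the process dying after a given step."""

def activate(state, el, crash_after=None):
    steps = [
        lambda: state["activated"].add(el),      # 1. mark as activated
        lambda: state["due"].pop(el, None),      # 2. drop the due time
        lambda: state["scheduled"].discard(el),  # 3. remove from the schedule
    ]
    for i, step in enumerate(steps, start=1):
        step()
        if crash_after == i:
            raise Crash(i)

def recover(state):
    # Anything in both sets was interrupted mid-activation: finish the job.
    for el in state["activated"] & state["scheduled"]:
        state["due"].pop(el, None)
        state["scheduled"].discard(el)
```

Running `activate` with a crash after step 1, 2 or 3 and then calling `recover` ends in the same state as an uninterrupted run. Removing from `scheduled` first would lose the element if the crash came before the add.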
Example: Key/value stores
0.log:
  ['add', 'key-0', 'value-0']
  ['add', 'key-1', 'value-1']
  ['add', 'key-0', 'value-2']
  ['remove', 'key-1']
  .
  .
  .

1.log:
  .
  .
  .

2.log:
  .
  .
  .
Example: Key/value stores (utility
             functions)
## Get the level of a file
def getLevel(s):
    return int(s.split('.')[0])

## Get all files of a given type
def getType(files, tp):
    return [(getLevel(s), s)
             for s in files if s.endswith(tp)]
Example: Key/value stores
             (classifying files)
## Get all relevant files
# Assumes at least one .master file exists in d
def relevant(d):
    files = os.listdir(d)
    mlevel, master = max(getType(files, '.master'))
    logs = getType(files, '.log')
    logs.sort()
    return [master]+[log for llevel, log in logs
                             if llevel>mlevel]
Example: Key/value stores (reading)
## Read in a single file
def update(result, fp):
    for line in fp:
        val = json.loads(line)
        if val[0] == 'add':
            result[val[1]] = val[2]
        else:
            del result[val[1]]

## Read in several files
def read(files):
    result = dict()
    for fname in files:
        try:
            update(result, open(fname))
        except ValueError:
            # A line cut short by a crash: skip the rest of this file
            pass
    return result
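The replay step can be tried on its own. This Python 3 sketch takes the log as an iterable of lines instead of file names; the function name `replay` and the choice to ignore a remove for a missing key are assumptions, not the slide's behaviour. It stops at the first line that fails to parse, which is what a crash in the middle of a write leaves behind.

```python
import json

def replay(lines):
    """Rebuild the dict from an iterable of JSON log lines."""
    result = {}
    for line in lines:
        try:
            entry = json.loads(line)
        except ValueError:
            # Truncated final line from a crash mid-write: stop here.
            break
        if entry[0] == "add":
            result[entry[1]] = entry[2]
        else:
            # Tolerate a remove for a key that is already gone.
            result.pop(entry[1], None)
    return result
```

For the `0.log` shown earlier, followed by a half-written `add` line, the result holds only `key-0` with `value-2`.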
Example: Key/value stores (writer
               class)
class Writer(object):
    def __init__(self, level):
        self.level = level
        self.fp = None
        self._next()
    def _next(self):
        # Start a new log file at the next level
        self.level += 1
        if self.fp:
            self.fp.close()
        name = '%3d.log' % self.level
        self.fp = open(name, 'w')
        self.rows = 0
    def write(self, value):
        self.fp.write(json.dumps(value) + '\n')
        self.fp.flush()
        self.rows += 1
        if self.rows>200:
            self._next()
Example: Key/value stores (storage
               class)
## The actual data store abstraction.
class Store(object):
    def __init__(self, d):
        files = relevant(d)
        self.result = read(files)
        level = getLevel(files[-1])
        self.writer = Writer(level)
    def get(self, key):
        return self.result[key]
    # Log first, then update the in-memory copy
    def add(self, key, value):
        self.writer.write(['add', key, value])
        self.result[key] = value
    def remove(self, key):
        self.writer.write(['remove', key])
        self.result.pop(key, None)
Example: Key/value stores
            (compression code)
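## This should be run periodically
# from a different thread
def compress(d):
    # Leave out the newest log: the writer is still appending to it
    files = relevant(d)[:-1]
    if len(files)<2:
        return
    result = read(files)
    # The new master covers everything up to the last log read,
    # so it takes that log's level; later logs stay relevant.
    master = getLevel(files[-1])
    fp = open('%3d.master.tmp' % master, 'w')
    for key, value in result.items():
        towrite = ['add', key, value]
        fp.write(json.dumps(towrite) + '\n')
    fp.close()
    # Assumed final step (not on the slide): publish atomically
    os.rename('%3d.master.tmp' % master, '%3d.master' % master)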
Vertical splitting: Example
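def forking_server():
    s = socket.socket()
    s.bind(('', 8080))
    s.listen(5)
    while True:
        client, addr = s.accept()
        newpid = os.fork()
        if newpid == 0:
            # Child: answer this one client, then exit
            f = client.makefile('w')
            f.write("Sunday, May 22, 1983 "
                    "18:45:59-PST")
            f.close()
            os._exit(0)
        # Parent: drop its copy of the connection and keep accepting
        client.close()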
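The benefit of the split is that a crash in one execution path is contained in its own process. A small Python 3 sketch of that containment, assuming a Unix system (it relies on `os.fork`); `run_isolated` is a name made up for the example.

```python
import os

def run_isolated(func):
    """Run func in a child process; return True if it finished cleanly."""
    pid = os.fork()
    if pid == 0:
        # Child: never fall back into the parent's code path.
        try:
            func()
            os._exit(0)
        except BaseException:
            os._exit(1)
    # Parent: wait for the child and inspect how it ended.
    _, status = os.waitpid(pid, 0)
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
```

An uncaught exception or a hard abort in `func` (for example `os.abort`) makes `run_isolated` return `False`, while the parent keeps running and can serve the next request.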
Horizontal splitting: front-end
## Process one
from twisted.internet import reactor
from twisted.python import filepath
from twisted.web import resource, server

class SchedulerResource(resource.Resource):
    isLeaf = True
    def __init__(self, filepath):
        resource.Resource.__init__(self)
        self.filepath = filepath
    def render_PUT(self, request):
        uuid, = request.postpath
        content = request.content.read()
        child = self.filepath.child(uuid)
        child.setContent(content)
        # A render method has to return the response body
        return b''
fp = filepath.FilePath("things")
r = SchedulerResource(fp)
s = server.Site(r)
reactor.listenTCP(8080, s)
reactor.run()
Horizontal splitting: scheduler
## Process two
import os, time
import redis
rs = redis.Redis(host='localhost',
                  port=6379, db=9)
while True:
    # The file name is the uuid chosen by the front-end
    for uuid in os.listdir("things"):
        fname = os.path.join("things", uuid)
        when = int(open(fname).read().strip())
        rs.set(uuid+':due', when)
        rs.sadd('scheduled', uuid)
        os.remove(fname)
    time.sleep(1)
Horizontal splitting: runner
## Process three
rs = redis.Redis(host='localhost',
                  port=6379, db=9)
recover()
while True:
    activate_due()
    time.sleep(1)
Horizontal splitting: message
           queues
     No direct dependencies
Horizontal splitting: message
            queues: sender
## Process four
rs = redis.Redis(host='localhost',
                 port=6379, db=9)
params = pika.ConnectionParameters('localhost')
conn = pika.BlockingConnection(params)
channel = conn.channel()
channel.queue_declare(queue='active')
while True:
    activated = rs.smembers('activated')
    finished = set(rs.smembers('finished'))
    for el in activated:
        if el in finished:
            continue
        channel.basic_publish(
            exchange='', routing_key='active',
            body=el)
        # A crash here re-sends el after restart (a "dup")
        rs.sadd('finished', el)
Horizontal splitting: message
            queues: receiver
## Process five
# It is possible to get "dups" of bodies.
# Application logic should deal with that
params = pika.ConnectionParameters('localhost')
conn = pika.BlockingConnection(params)
channel = conn.channel()
channel.queue_declare(queue='active')
def callback(ch, method, properties, el):
    syslog.syslog('Activated %s' % el)
channel.basic_consume(callback, queue='active', no_ack=True)
channel.start_consuming()
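One way the application logic could deal with duplicate bodies, sketched in Python 3 without a broker. `DedupingHandler` is a hypothetical helper, and keeping the seen bodies in memory is a simplification: in the spirit of the talk, a real receiver would keep them somewhere that survives a crash.

```python
class DedupingHandler(object):
    """Wrap an action so that each distinct body triggers it only once."""

    def __init__(self, action):
        self.action = action
        self.seen = set()

    def __call__(self, body):
        if body in self.seen:
            return False  # duplicate delivery, nothing to do
        self.action(body)
        # Record only after the action succeeded, so a failure is retried.
        self.seen.add(body)
        return True
```

Inside the pika callback this would be called with `el`, so that a body published twice by the sender is logged only once.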
Horizontal splitting: point-to-point
      Use HTTP (preferably, REST)

Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 

Make Sure Your Applications Crash

  • 1. Make Sure Your Applications Crash Moshe Zadka
  • 3. Python doesn't crash Memory managed, no direct pointer arithmetic
  • 4. ...except it does C bugs, untrapped exception, infinite loops, blocking calls, thread dead-lock, inconsistent resident state
  • 5. Recovery is important "[S]ystem failure can usually be considered to be the result of two program errors[...] the second, in the recovery routine[...]"
  • 6. Crashes and inconsistent data A crash results in data from an arbitrary program state.
  • 7. Avoid storage Caches are better than master copies.
  • 9. Atomic operations File rename
  • 10. Example: Counting
      def update_counter():
          fp = file("counter.txt")
          s = fp.read()
          counter = int(s.strip())
          counter += 1
          # If there is a crash before this point,
          # no changes have been done.
          fp = file("counter.txt.tmp", 'w')
          print >>fp, counter
          fp.close()
          # If there is a crash before this point,
          # only a temp file has been modified
          # The following is an atomic operation
          os.rename("counter.txt.tmp", "counter.txt")
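The write-to-temp-then-rename pattern from the counting slide can be sketched in Python 3 roughly as follows. This is a minimal sketch, not the talk's code: the `update_counter(path)` signature, the `.counter-` temp prefix, and the added `os.fsync` call are assumptions made for illustration.

```python
import os
import tempfile


def update_counter(path):
    """Increment the integer stored in `path` without exposing a partial file."""
    with open(path) as fp:
        counter = int(fp.read().strip()) + 1
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must be on the same filesystem for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".counter-")
    try:
        with os.fdopen(fd, "w") as fp:
            fp.write("%d\n" % counter)
            fp.flush()
            os.fsync(fp.fileno())  # push the new contents to disk before renaming
        os.replace(tmp_path, path)  # atomic replacement on POSIX
    except BaseException:
        # On failure only the temp file was touched; the master copy is intact.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return counter
```

A crash at any point leaves either the old counter or the new one in `path`, never a half-written file; at worst a stray temp file remains to be cleaned up.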
  • 11. Efficient caches, reliable masters Mark inconsistency of cache
  • 13. Availability If data is consistent, just restart!
  • 14. Improving availability Limit impact Fast detection Fast start-up
  • 15. Vertical splitting Different execution paths, different processes
  • 16. Horizontal splitting Different code bases, different processes
  • 17. Watchdog Monitor -> Flag -> Remediate
  • 18. Watchdog principles Keep it simple, keep it safe!
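  • 19. Watchdog: Heartbeats
      ## In a Twisted process
      def beat():
          # Opening in append mode only creates the file;
          # os.utime is what refreshes its mtime.
          file('beats/my-name', 'a').close()
          os.utime('beats/my-name', None)
      task.LoopingCall(beat).start(30)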
  • 20. Watchdog: Get time-outs
      def getTimeout():
          timeout = dict()
          now = time.time()
          for heart in glob.glob('hearts/*'):
              beat = int(file(heart).read().strip())
              # Key by bare name so it matches the files in beats/
              timeout[os.path.basename(heart)] = now-beat
          return timeout
  • 21. Watchdog: Mark problems
      def markProblems():
          timeout = getTimeout()
          for heart in glob.glob('beats/*'):
              name = os.path.basename(heart)
              mtime = os.path.getmtime(heart)
              problem = 'problems/'+name
              if (mtime<timeout[name] and
                  not os.path.isfile(problem)):
                  fp = file(problem, 'w')
                  fp.write('watchdog')
                  fp.close()
  • 22. Watchdog: check solutions
      def checkSolutions():
          now = time.time()
          problemTimeout = now-30
          for problem in glob.glob('problems/*'):
              mtime = os.path.getmtime(problem)
              if mtime<problemTimeout:
                  subprocess.call(['restart-system'])
  • 23. Watchdog: Loop
      ## Watchdog
      while True:
          markProblems()
          checkSolutions()
          time.sleep(1)
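The heartbeat check behind the watchdog slides can be sketched in Python 3 as below. This is an illustration under assumptions: the helper names `touch` and `stale_heartbeats`, and the single `max_age` threshold (the slides read a per-process timeout from `hearts/`), are hypothetical simplifications.

```python
import os
import time


def touch(path):
    """Record a heartbeat: create the file if needed, then bump its mtime."""
    with open(path, "a"):
        pass
    os.utime(path, None)


def stale_heartbeats(directory, max_age, now=None):
    """Return the names of heartbeat files older than max_age seconds."""
    if now is None:
        now = time.time()
    stale = []
    for name in sorted(os.listdir(directory)):
        mtime = os.path.getmtime(os.path.join(directory, name))
        if now - mtime > max_age:
            stale.append(name)
    return stale
```

A watchdog loop would call `stale_heartbeats` periodically and flag each returned name, leaving remediation to a separate step as in the Monitor -> Flag -> Remediate slide.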
  • 24. Watchdog: accuracy of Custom checkers can manufacture problems
  • 25. Watchdog: reliability of Use cron for main loop
  • 26. Watchdog: reliability of Use software/hardware watchdogs
  • 29. Welcome to the back-up slides Extra! Extra!
  • 30. Example: Counting on Windows
      def update_counter():
          fp = file("counter.txt")
          s = fp.read()
          counter = int(s.strip())
          counter += 1
          # If there is a crash before this point,
          # no changes have been done.
          fp = file("counter.txt.tmp", 'w')
          print >>fp, counter
          fp.close()
          # If there is a crash before this point,
          # only a temp file has been modified
          os.remove("counter.txt")
          # At this point, the state is inconsistent*
          # The following is an atomic operation
  • 32. Example: Counting on Windows (Recovery)
      def recover():
          if not os.path.exists("counter.txt"):
              # The permanent file has been removed
              # Therefore, the temp file is valid
              os.rename("counter.txt.tmp", "counter.txt")
  • 33. Example: Counting with versions
      def update_counter():
          files = [int(name.split('.')[-1])
                   for name in os.listdir('.')
                   if name.startswith('counter.')]
          last = max(files)
          counter = int(file('counter.%s' % last).read().strip())
          counter += 1
          # If there is a crash before this point,
          # no changes have been done.
          fp = file("tmp.counter", 'w')
          print >>fp, counter
          fp.close()
          # If there is a crash before this point,
          # only a temp file has been modified
  • 34. (continued)
          os.rename('tmp.counter', 'counter.%s' % (last+1))
          os.remove('counter.%s' % last)
  • 35. Example: Counting with versions (cleanup)
      # This is not a recovery routine, but a cleanup
      # routine.
      # Even in its absence, the state is consistent
      def cleanup():
          files = [int(name.split('.')[-1])
                   for name in os.listdir('.')
                   if name.startswith('counter.')]
          files.sort()
          files.pop()
          for n in files:
              os.remove('counter.%d' % n)
          if os.path.exists('tmp.counter'):
              os.remove('tmp.counter')
  • 36. Correct ordering
      def activate_due():
          scheduled = rs.smembers('scheduled')
          now = time.time()
          for el in scheduled:
              due = int(rs.get(el+':due'))
              if now<due:
                  continue
              rs.sadd('activated', el)
              rs.delete(el+':due')
              # redis-py names the set-removal call srem
              rs.srem('scheduled', el)
  • 37. Correct ordering (recovery)
      def recover():
          inconsistent = rs.sinter('activated', 'scheduled')
          for el in inconsistent:
              rs.delete(el+':due') #*
              # redis-py names the set-removal call srem
              rs.srem('scheduled', el)
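The ordering argument in the two slides above (add to `activated` first, clean up afterwards, and let recovery finish any interrupted cleanup) can be sketched without Redis. In this Python 3 sketch the in-memory `store` dict, the `Crash` exception, and the `crash_after` parameter are hypothetical stand-ins used only to simulate a process dying between steps.

```python
class Crash(Exception):
    """Stands in for the process dying part-way through an activation."""


def activate(store, el, crash_after=None):
    # Step 1: mark the element as activated before any cleanup happens.
    store["activated"].add(el)
    if crash_after == 1:
        raise Crash()
    # Step 2: drop the due time.
    store["due"].pop(el, None)
    if crash_after == 2:
        raise Crash()
    # Step 3: remove the element from the scheduled set.
    store["scheduled"].discard(el)


def recover(store):
    # An element in both sets was interrupted between step 1 and step 3,
    # so recovery simply completes the remaining cleanup steps.
    for el in store["activated"] & store["scheduled"]:
        store["due"].pop(el, None)
        store["scheduled"].discard(el)
```

Because the steps are ordered this way, an element is never lost: after a crash it is either still only scheduled, or visible in both sets where `recover` can find it.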
  • 38. Example: Key/value stores
      0.log:
        ['add', 'key-0', 'value-0']
        ['add', 'key-1', 'value-1']
        ['add', 'key-0', 'value-2']
        ['remove', 'key-1']
        . . .
      1.log:
        . . .
      2.log:
  • 39. . . .
  • 40. Example: Key/value stores (utility functions)
      ## Get the level of a file
      def getLevel(s):
          return int(s.split('.')[0])
      ## Get all files of a given type
      def getType(files, tp):
          return [(getLevel(s), s) for s in files
                                   if s.endswith(tp)]
  • 41. Example: Key/value stores (classifying files)
      ## Get all relevant files
      def relevant(d):
          files = os.listdir(d)
          mlevel, master = max(getType(files, '.master'))
          logs = getType(files, '.log')
          logs.sort()
          # master is a single file name, so wrap it in a list
          return [master]+[log for llevel, log in logs
                               if llevel>mlevel]
  • 42. Example: Key/value stores (reading)
      ## Read in a single file
      def update(result, fp):
          for line in fp:
              val = json.loads(line)
              if val[0] == 'add':
                  result[val[1]] = val[2]
              else:
                  del result[val[1]]
      ## Read in several files
      def read(files):
          result = dict()
          for fname in files:
              try:
                  update(result, file(fname))
  • 43. (continued)
              except ValueError:
                  pass
          return result
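The log-reading slides rely on a truncated final line raising `ValueError` and being ignored. A minimal Python 3 sketch of that idea follows; the `replay` function name and the choice to stop at the first unparsable line are assumptions for illustration, and the log lines are written as JSON (double quotes) so that `json.loads` accepts them.

```python
import json


def replay(lines):
    """Rebuild a dict from JSON log lines, stopping at the first corrupt line."""
    result = {}
    for line in lines:
        try:
            op = json.loads(line)
        except ValueError:
            # A crash mid-write leaves a truncated last line; stop replaying there.
            break
        if op[0] == "add":
            result[op[1]] = op[2]
        elif op[0] == "remove":
            result.pop(op[1], None)
    return result
```

Replaying the four complete operations from the example log plus one half-written line yields only the surviving key, because the partial write is discarded rather than applied.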
  • 44. Example: Key/value stores (writer class)
      class Writer(object):
          def __init__(self, level):
              self.level = level
              self.fp = None
              self._next()
          def _next(self):
              self.level += 1
              if self.fp:
                  self.fp.close()
              name ='%3d.log' % self.level
              self.fp = file(name, 'w')
              self.rows = 0
          def write(self, value):
  • 46. Example: Key/value stores (storage class)
      ## The actual data store abstraction.
      class Store(object):
          def __init__(self):
              files = relevant(d)
              self.result = read(files)
              level = getLevel(files[-1])
              self.writer = Writer(level)
          def get(self, key):
              return self.result[key]
          def add(self, key, value):
              self.writer.write(['add', key, value])
          def remove(self, key):
              self.writer.write(['remove', key])
  • 47. Example: Key/value stores (compression code)
      ## This should be run periodically
      # from a different thread
      def compress(d):
          files = relevant(d)[:-1]
          if len(files)<2:
              return
          result = read(files)
          master = getLevel(files[-1])+1
          fp = file('%3d.master.tmp' % master, 'w')
          for key, value in result.iteritems():
              towrite = ['add', key, value]
              print >>fp, json.dumps(towrite)
          fp.close()
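  • 48. Vertical splitting: Example
      def forking_server():
          s = socket.socket()
          s.bind(('', 8080))
          s.listen(5)
          while True:
              # accept() returns a (connection, address) pair
              client, addr = s.accept()
              newpid = os.fork()
              if newpid == 0:
                  # Child: serve one client, then exit
                  f = client.makefile('w')
                  f.write("Sunday, May 22, 1983 "
                          "18:45:59-PST")
                  f.close()
                  os._exit(0)
              # Parent: drop its copy of the connection
              client.close()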
  • 49. Horizontal splitting: front-end
      ## Process one
      class SchedulerResource(resource.Resource):
          isLeaf = True
          def __init__(self, filepath):
              resource.Resource.__init__(self)
              self.filepath = filepath
          def render_PUT(self, request):
              uuid, = request.postpath
              content = request.content.read()
              child = self.filepath.child(uuid)
              child.setContent(content)
              # render methods must return a string body
              return ''
      fp = filepath.FilePath("things")
      r = SchedulerResource(fp)
      s = server.Site(r)
      reactor.listenTCP(8080, s)
  • 50. Horizontal splitting: scheduler
      ## Process two
      rs = redis.Redis(host='localhost', port=6379, db=9)
      while True:
          # The front-end names each file after its uuid
          for uuid in os.listdir("things"):
              fname = os.path.join("things", uuid)
              when = int(file(fname).read().strip())
              rs.set(uuid+':due', when)
              rs.sadd('scheduled', uuid)
              os.remove(fname)
          time.sleep(1)
  • 51. Horizontal splitting: runner
      ## Process three
      rs = redis.Redis(host='localhost', port=6379, db=9)
      recover()
      while True:
          activate_due()
          time.sleep(1)
  • 52. Horizontal splitting: message queues No direct dependencies
  • 53. Horizontal splitting: message queues: sender
      ## Process four
      rs = redis.Redis(host='localhost', port=6379, db=9)
      params = pika.ConnectionParameters('localhost')
      conn = pika.BlockingConnection(params)
      channel = conn.channel()
      channel.queue_declare(queue='active')
      while True:
          activated = rs.smembers('activated')
          finished = set(rs.smembers('finished'))
          for el in activated:
              if el in finished:
                  continue
  • 54. (continued)
              channel.basic_publish(
                  exchange='',
                  routing_key='active',
                  body=el)
              # redis-py names the set-insertion call sadd
              rs.sadd('finished', el)
  • 55. Horizontal splitting: message queues: receiver
      ## Process five
      # It is possible to get "dups" of bodies.
      # Application logic should deal with that
      params = pika.ConnectionParameters('localhost')
      conn = pika.BlockingConnection(params)
      channel = conn.channel()
      channel.queue_declare(queue='active')
      def callback(ch, method, properties, el):
          syslog.syslog('Activated %s' % el)
      # Consume from the same queue the sender publishes to
      channel.basic_consume(callback, queue='active', no_ack=True)
      channel.start_consuming()
  • 56. Horizontal splitting: point-to-point Use HTTP (preferably, REST)