Fabric, Cuisine and Watchdog for server administration in Python
 


Presents Fabric, Cuisine and Watchdog, three Python tools that help you set up, administer and monitor your servers.


Usage Rights

© All Rights Reserved

  • fabric + cuisine it is.....
  • OK I'm sold. No bloat, getting things done and nothing more...
  • Nice alternative to ruby-ish tools
  • And at slide 78, the standard 'format' method of 'string' could be used:

    '''
    cd {PATH}
    exec {EXEC_PATH}
    '''.format(
    PATH='some-path',
    EXEC_PATH='another-path'
    )

    Which is clearer to my taste.
  • Hey, at slide 77 you can replace the text_strip_margin function with the standard textwrap.dedent. It does the same thing, but without needing these separators!


    Presentation Transcript

    • Fabric, Cuisine & Watchdog. Sébastien Pierre, ffunction inc. @ Montréal Python, February 2011. www.ffctn.com
    • How to use Python for server administration, thanks to Fabric, Cuisine* & Watchdog* (*custom tools)
    • The way we use servers has changed
    • The era of dedicated servers: hosted in your server room or in colocation (web server, database server, email server). Sysadmins typically SSH in and configure the servers live. The servers are conservatively managed, and updates are risky.
    • The era of slices/VPS (Linode.com, Amazon EC2): we now have multiple small virtual servers (slices/VPS), often located in different data centers, and sometimes with different providers. We sometimes even still have physical, dedicated servers (IWeb.com: dedicated server 1, dedicated server 2).
    • The challenge: order server, set up server, deploy application. Make this process as fast (and simple) as possible.
      Set up server: create users and groups, customize config files, install base packages.
      Deploy application: install app-specific packages, deploy the application, start services.
    • The challenge: quickly integrate your new server in the existing architecture... and make sure it's running!
    • Today's menu
      FABRIC: interact with your remote machines as if they were local
      CUISINE: takes care of users, groups, packages and configuration of your new machine
      WATCHDOG: ensures that your servers and services are up and running
    • Part 1: Fabric (http://fabfile.org). Application deployment & systems administration tasks
    • Fabric is a Python library and command-line tool for streamlining the use of SSH for application deployment or systems administration tasks. Wait... what does that mean?
    • Streamlining SSH
      By hand:
          import os
          version = os.popen("ssh myserver cat /proc/version").read()
      Using Fabric:
          from fabric.api import *
          env.hosts = ["myserver"]
          version = run("cat /proc/version")
      You can specify multiple hosts and run the same commands across them.
      Connections will be lazily created and pooled.
      Failures ($STATUS) will be handled just like in Make.
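
To make the "by hand" side concrete, here is a minimal sketch of what running one command across several hosts involves without Fabric. The helper names (`ssh_argv`, `run_everywhere`) and the injectable `runner` are assumptions made for illustration and are not part of Fabric, which additionally pools connections and handles failures.

```python
import subprocess

def ssh_argv(host, command):
    """Build the argument list for running `command` on `host` through the ssh client."""
    return ["ssh", host, command]

def run_everywhere(hosts, command, runner=None):
    """Run `command` on every host and return {host: output}.

    `runner` takes an argv list and returns the command output as text;
    it defaults to subprocess and can be swapped out for a dry run.
    """
    if runner is None:
        runner = lambda argv: subprocess.check_output(argv).decode()
    return {host: runner(ssh_argv(host, command)) for host in hosts}

# Dry run with a fake runner, so no network access is needed:
fake = lambda argv: "ran %r on %s" % (argv[2], argv[1])
print(run_everywhere(["web1", "web2"], "cat /proc/version", runner=fake))
```

Passing the runner in keeps the sketch usable without any reachable server; with the default runner it shells out to the local `ssh` client once per host.
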
    • Example: Installing packages
          sudo("aptitude install nginx")
          if run("dpkg -s %s | grep Status: ; true" % package).find("installed") == -1:
              sudo("aptitude install %s" % package)
      It's easy to take action depending on the result. Note that we add "true" so that the run() always succeeds (there are other ways...).
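
The decision in the example above hinges on reading the `Status:` line printed by `dpkg -s`. As a sketch of that step alone, the hypothetical helper below (`is_installed` is not a Fabric or Cuisine function) checks the last word of the status line, since a plain substring test would also match states such as `half-installed` or `not-installed`.

```python
def is_installed(dpkg_status_output):
    """Return True if `dpkg -s` output reports the package as installed."""
    for line in dpkg_status_output.splitlines():
        if line.startswith("Status:"):
            # The last word of the status line is the package state,
            # e.g. "Status: install ok installed".
            return line.split()[-1] == "installed"
    return False

installed = "Package: nginx\nStatus: install ok installed\n"
half = "Package: nginx\nStatus: install reinstreq half-installed\n"
print(is_installed(installed), is_installed(half), is_installed(""))
```
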
    • Example: retrieving system status
          disk_usage = run("df -kP")
          mem_usage = run("cat /proc/meminfo")
          cpu_usage = run("cat /proc/stat")
          print disk_usage, mem_usage, cpu_usage
      Very useful for getting live information from many different servers.
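
The strings returned by run() still need parsing before they can be compared across servers. Here is a minimal sketch for the `df -kP` case, assuming the POSIX column layout and filesystem names without spaces; `parse_df` and the sample output are made up for illustration.

```python
def parse_df(output):
    """Parse `df -kP` output into {mount_point: fraction_used}."""
    usage = {}
    for line in output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        # POSIX format: filesystem, 1024-blocks, used, available, capacity, mount point
        capacity, mount = fields[4], " ".join(fields[5:])
        usage[mount] = int(capacity.rstrip("%")) / 100.0
    return usage

sample = """Filesystem 1024-blocks Used Available Capacity Mounted on
/dev/sda1 1000000 900000 100000 90% /
tmpfs 500000 0 500000 0% /dev/shm
"""
print(parse_df(sample))
```
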
    • Fabfile.py
          from fabric.api import *
          from mysetup import *
          env.hosts = ["server1.myapp.com"]
          def setup():
              install_packages("...")
              update_configuration()
              create_users()
              start_daemons()

          $ fab setup
      Just like Make, you write rules that do something... and you can specify on which servers the rules will run.
    • Multiple hosts
          env.hosts = [
              "db1.myapp.com",
              "db2.myapp.com",
              "db3.myapp.com"
          ]
          @hosts("db1.myapp")
          def backup_db():
              run(...)
    • Roles
          env.roledefs = {
              "web": ["www1", "www2", "www3"],
              "dns": ["ns1", "ns2"]
          }

          $ fab -R web setup
      Will run the setup rule only on hosts that are members of the web role.
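
Conceptually, selecting a role just expands it into its list of hosts. The sketch below illustrates that lookup only; `hosts_for_roles` is a hypothetical helper, not Fabric's own resolution code.

```python
def hosts_for_roles(roledefs, roles):
    """Return the de-duplicated, order-preserving list of hosts for `roles`."""
    hosts = []
    for role in roles:
        for host in roledefs[role]:
            if host not in hosts:
                hosts.append(host)
    return hosts

roledefs = {"web": ["www1", "www2", "www3"], "dns": ["ns1", "ns2"]}
print(hosts_for_roles(roledefs, ["web"]))
```
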
    • What's good about Fabric?
      Low-level: basically an ssh() command that returns the result
      Simple primitives: run(), sudo(), get(), put(), local(), prompt(), reboot()
      No magic: no DSL, no abstraction, just a remote command API
    • What could be improved?
      Ease common admin tasks: user and group creation, file and directory operations
      Abstract primitives: like "install package", so that it works with different OSes
      Templates: to make creating/updating configuration files easy
    • Cuisine: Chef-like functionality for Fabric
    • Part 2: Cuisine
    • What is Opscode's Chef? http://wiki.opscode.com/display/chef/Home
      Recipes: scripts/packages to install and configure services and applications
      API: a DSL-like Ruby API to interact with the OS (create users, groups, install packages, etc.)
      Architecture: client-server or "solo" mode to push and deploy your new configurations
    • What I liked about Chef
      Flexible: you can use the API or shell commands
      Structured: helped me have a clear decomposition of the services installed per machine
      Community: lots of recipes already available from http://cookbooks.opscode.com/
    • What I didn't like
      Too many files and directories: code is spread out, hard to get the big picture
      Abstraction overload: API not very well documented, frequent fallbacks to plain shell scripts within the recipe
      No "smart" recipe: recipes are applied all the time, even when it's not necessary
    • The question that kept coming... Django recipe: 5 files, 2 directories. What it does, in essence: sudo aptitude install apache2 python django-python. Is this really necessary for what I want to do?
    • What I loved about Fabric
      Bare metal: ssh() function, simple and elegant set of primitives
      No magic: no abstraction, no model, no compilation
      Two-way communication: easy to change the rule's behaviour according to the output (e.g. do not install something that's already installed)
    • What I needed, on top of Fabric: file I/O, user/group management, package management, and text processing & templates
    • How I wanted it
      Simple "flat" API: [object]_[operation], where operation is something in "create", "read", "update", "write", "remove", "ensure", etc.
      Driven by need: only implement a feature if I have a real need for it
      No magic: everything is implemented using sh-compatible commands
      No unnecessary structure: everything fits in one file, no imposed file layout
    • Cuisine: Example fabfile.py
          from cuisine import *
          env.hosts = ["server1.myapp.com"]
          def setup():
              package_ensure("python", "apache2", "python-django")
              user_ensure("admin", uid=2000)
              upstart_ensure("django")

          $ fab setup
      Fabric's core functions are already imported. The *_ensure() lines are Cuisine's API calls.
    • File I/O
    • Cuisine: File I/O (supports owner/group and mode change)
      ● file_exists: does the remote file exist?
      ● file_read: reads the remote file
      ● file_write: writes data to the remote file
      ● file_append: appends data to the remote file
      ● file_attribs: chmod & chown
      ● file_remove
    • Cuisine: File I/O (directories)
      ● dir_exists: does the remote directory exist?
      ● dir_ensure: ensures that a directory exists
      ● dir_attribs: chmod & chown
      ● dir_remove
    • Cuisine: File I/O +
      ● file_update(location, updater=lambda _:_)
          package_ensure("mongodb-snapshot")
          def update_configuration( text ):
              res = []
              for line in text.split("\n"):
                  if line.strip().startswith("dbpath="):
                      res.append("dbpath=/data/mongodb")
                  elif line.strip().startswith("logpath="):
                      res.append("logpath=/data/logs/mongodb.log")
                  else:
                      res.append(line)
              return "\n".join(res)
          file_update("/etc/mongodb.conf", update_configuration)
      This replaces the values for the configuration entries dbpath and logpath. The remote file will only be changed if the content is different.
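
The "only changed if the content is different" behaviour can be sketched on a local file. This is an illustration of the idea, not Cuisine's implementation: `file_update_local` and `set_dbpath` are hypothetical names, and the real file_update works on the remote host through Fabric.

```python
import os
import tempfile

def file_update_local(path, updater=lambda text: text):
    """Rewrite `path` with updater(content); return True only if it changed."""
    with open(path) as f:
        old = f.read()
    new = updater(old)
    if new == old:
        return False  # nothing to do, leave the file untouched
    with open(path, "w") as f:
        f.write(new)
    return True

def set_dbpath(text):
    return "\n".join(
        "dbpath=/data/mongodb" if line.strip().startswith("dbpath=") else line
        for line in text.split("\n"))

fd, conf = tempfile.mkstemp()
os.close(fd)
with open(conf, "w") as f:
    f.write("dbpath=/var/lib/mongodb\nport=27017\n")
first = file_update_local(conf, set_dbpath)   # content differs: file rewritten
second = file_update_local(conf, set_dbpath)  # already up to date: no write
with open(conf) as f:
    result = f.read()
os.remove(conf)
print(first, second)
```

Running the update twice shows why such rules are safe to re-apply: the second call detects that nothing would change and skips the write.
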
    • User Management
    • Cuisine: User Management
      ● user_exists: does the user exist?
      ● user_create: creates the user
      ● user_ensure: creates the user if it doesn't exist
    • Cuisine: Group Management
      ● group_exists: does the group exist?
      ● group_create: creates the group
      ● group_ensure: creates the group if it doesn't exist
      ● group_user_exists: does the user belong to the group?
      ● group_user_add: adds the user to the group
      ● group_user_ensure
    • Package Management
    • Cuisine: Package Management
      ● package_exists: is the package available?
      ● package_installed: is it installed?
      ● package_install: installs the package
      ● package_ensure: ... only if it's not installed
      ● package_upgrade: upgrades the/all package(s)
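
The user, group and package functions all follow the same exists / create / ensure pattern. A minimal sketch of that pattern, using an in-memory set as a stand-in for the remote system (`make_ensure` is a hypothetical helper, not part of Cuisine):

```python
def make_ensure(exists, create):
    """Build an idempotent `ensure` operation from `exists` and `create`."""
    def ensure(name):
        if exists(name):
            return False  # already there: do nothing
        create(name)
        return True
    return ensure

users = set(["root"])  # stand-in for the users known to the remote machine
user_ensure = make_ensure(users.__contains__, users.add)
print(user_ensure("admin"), user_ensure("admin"), sorted(users))
```
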
    • Text & Templates
    • Cuisine: Text transformation
      text_ensure_line(text, lines)
          file_update(
              "/home/user/.profile",
              lambda _: text_ensure_line(_,
                  "PYTHONPATH=/opt/lib/python:${PYTHONPATH};"
                  "export PYTHONPATH"
          ))
      Ensures that the PYTHONPATH variable is set and exported. If not, these lines will be appended.
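
The append-if-missing behaviour described above can be sketched in a few lines. This is an assumption about the semantics, not Cuisine's actual text_ensure_line; `ensure_lines` is a hypothetical name.

```python
def ensure_lines(text, *lines):
    """Append each of `lines` to `text` unless an identical line is already there."""
    existing = text.split("\n")
    result = text
    for line in lines:
        if line not in existing:
            if result and not result.endswith("\n"):
                result += "\n"
            result += line + "\n"
            existing.append(line)
    return result

profile = "set -o vi\n"
once = ensure_lines(profile, "export PYTHONPATH")
twice = ensure_lines(once, "export PYTHONPATH")
print(once == twice)
```
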
    • Cuisine: Text transformation
      text_replace_line(text, old, new, find=.., process=...)
          configuration = local_read("server.conf")
          for key, value in variables.items():
              configuration, replaced = text_replace_line(
                  configuration,
                  key + "=",
                  key + "=" + repr(value),
                  process=lambda text: text.split("=")[0].strip()
              )
      Replaces lines that look like VARIABLE=VALUE with the actual values from the variables dictionary. The process lambda transforms input lines before comparing them. Here the lines are stripped of spaces and of their value.
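
To show how a process function makes "port = 80" match "port=", here is a minimal sketch. It assumes that process is applied to both the candidate line and to `old` before comparing, which is a guess at the semantics; `replace_line` is a hypothetical stand-in, not Cuisine's text_replace_line.

```python
def replace_line(text, old, new, process=lambda line: line):
    """Replace every line whose processed form equals process(old).

    Returns (new_text, number_of_replaced_lines).
    """
    target = process(old)
    result, replaced = [], 0
    for line in text.split("\n"):
        if process(line) == target:
            result.append(new)
            replaced += 1
        else:
            result.append(line)
    return "\n".join(result), replaced

key_of = lambda line: line.split("=")[0].strip()  # keep only the variable name
config = "port = 80\nhost=localhost"
config, count = replace_line(config, "port=", "port=8080", process=key_of)
print(config, count)
```
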
    • Cuisine: Text transformation
      text_strip_margin(text)
          file_write(".profile", text_strip_margin(
              """
              |export PATH="$HOME/bin":$PATH
              |set -o vi
              """))
      Everything after the | separator will be output as content. This makes it easy to embed text templates within functions.
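
A margin stripper along these lines is short to write. The sketch below is an illustration, not Cuisine's code (`strip_margin` is a hypothetical name), and it compares the result with the standard library's `textwrap.dedent`, which one commenter suggests as an alternative.

```python
import textwrap

def strip_margin(text, margin="|"):
    """Keep only what follows the first `margin` character on each line."""
    kept = []
    for line in text.split("\n"):
        _, found, content = line.partition(margin)
        if found:  # lines without a margin marker are dropped
            kept.append(content)
    return "\n".join(kept)

with_margin = strip_margin('''
    |export PATH="$HOME/bin":$PATH
    |set -o vi
''')
with_dedent = textwrap.dedent('''\
    export PATH="$HOME/bin":$PATH
    set -o vi''')
print(with_margin == with_dedent)
```

The margin form keeps leading whitespace after the separator exactly as written, while dedent removes whatever indentation is common to all lines.
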
    • Cuisine: Text transformation
      text_template(text, variables)
          text_template(text_strip_margin(
              """
              |cd ${DAEMON_PATH}
              |exec ${DAEMON_EXEC_PATH}
              """
          ), dict(
              DAEMON_PATH="/opt/mongodb",
              DAEMON_EXEC_PATH="/opt/mongodb/mongod"
          ))
      This is a simple wrapper around Python's (safe) string.Template.
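
For reference, this is what the underlying standard-library call looks like on its own. Using `safe_substitute` here is an assumption based on the "(safe)" note on the slide; the extra `${ARGS}` placeholder is made up to show the difference from `substitute`.

```python
from string import Template

script = Template("cd ${DAEMON_PATH}\nexec ${DAEMON_EXEC_PATH}\n")
rendered = script.safe_substitute(
    DAEMON_PATH="/opt/mongodb",
    DAEMON_EXEC_PATH="/opt/mongodb/mongod",
)
print(rendered)

# safe_substitute leaves unknown placeholders in place instead of raising KeyError
partial = Template("exec ${DAEMON_EXEC_PATH} ${ARGS}").safe_substitute(
    DAEMON_EXEC_PATH="/opt/mongodb/mongod")
print(partial)
```
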
    • Cuisine: Goodies
      ● ssh_keygen: generates DSA keys
      ● ssh_authorize: authorizes your key on the remote server
      ● mode_sudo: run() always uses sudo
      ● upstart_ensure: ensures the given daemon is running
      & more!
    • Why use Cuisine?
      ● Simple API for remote-server manipulation: files, users, groups, packages
      ● Shell commands for specific tasks only: avoid problems with your shell commands by only using run() for very specific tasks
      ● Cuisine tasks are not stupid: *_ensure() commands won't do anything if it's not necessary
    • Limitations
      ● Limited to sh-shells: operations will not work under csh
      ● Only written/tested for Ubuntu Linux: contributors could easily port commands
    • Get started! On Github: http://github.com/sebastien/cuisine. 1 short Python file, documented API
    • Part 3: Watchdog. Server and services monitoring
    • The problem: low disk space. Action: archive files, rotate logs, purge cache.
    • The problem: the HTTP server has high latency. Action: restart the HTTP server.
    • The problem: system load is too high. Action: re-nice important processes.
    • We want to be notified when incidents happen
    • We want automatic actions to be taken whenever possible
    • (Some of the) existing solutions
      Monit, God, Supervisord, Upstart: focus on starting/restarting daemons and services
      Munin, Cacti: focus on visualization of RRDTool data
      Collectd: focus on collecting and publishing data
    • The ideal tool
      Wide spectrum: data collection, service monitoring, actions
      Easy setup and deployment: no complex installation or configuration
      Flexible server architecture: can monitor local or remote processes
      Customizable and extensible: from restarting daemons to monitoring whole servers
    • Hello, Watchdog!
      SERVICE: a service is a collection of rules.
      RULE (HTTP request; CPU, disk, mem %; process status; I/O bandwidth): each rule retrieves data and processes it. Rules can SUCCEED or FAIL.
      ACTION (logging; XMPP and email notifications; start/stop process; ...): actions are bound to a rule and triggered on rule SUCCESS or FAILURE.
    • Execution Model: MONITOR, SERVICE DEFINITION, RULE (frequency in ms), SUCCESS actions, FAILURE actions
      Services are registered in the monitor.
      Rules defined in the service are executed every N ms (frequency).
      If the rule SUCCEEDS, actions will be sequentially executed.
      If the rule FAILS, failure actions will be sequentially executed.
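
The rule / action part of this model fits in a few lines of plain Python. The sketch below is an illustration of the idea only, not Watchdog's code: `RuleSketch` is a hypothetical class, a raised exception stands for rule failure, and the scheduling (calling run() every N ms) is left out.

```python
class RuleSketch(object):
    """A check plus the actions to run on success and on failure."""
    def __init__(self, check, success=(), fail=()):
        self.check, self.success, self.fail = check, success, fail

    def run(self):
        try:
            ok, value = True, self.check()
        except Exception as error:
            ok, value = False, error
        for action in (self.success if ok else self.fail):
            action(value)  # actions run sequentially, in declaration order
        return ok

log = []
def failing_check():
    raise IOError("timed out")

healthy = RuleSketch(lambda: 42, success=[log.append])
broken = RuleSketch(failing_check, fail=[lambda e: log.append("failed: %s" % e)])
healthy.run()
broken.run()
print(log)
```
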
    • Monitoring a remote machine
          #!/usr/bin/env python
          from watchdog import *
          Monitor(
              Service(
                  name = "google-search-latency",
                  monitor = (
                      HTTP(
                          GET="http://www.google.ca/search?q=watchdog",
                          freq=Time.s(1),
                          timeout=Time.ms(80),
                          fail=[
                              Print("Google search query took more than 50ms")
                          ]
                      )
                  )
              )
          ).run()
      A monitor is like the "main" for Watchdog. It actively monitors services. Don't forget to call run() on it.
      The service monitors the rules.
      The HTTP rule allows testing a URL, and we display a message in case of failure. If there is a 4XX or it times out, the rule will fail and display an error message.
    • Monitoring a remote machine
          $ python example-service-monitoring.py
          2011-02-27T22:33:18 watchdog --- #0 (runners=1,threads=2,duration=0.57s)
          2011-02-27T22:33:18 watchdog [!] Failure on HTTP(GET="www.google.ca:80/search?q=watchdog",timeout=0.08) : Socket error: timed out
          Google search query took more than 50ms
          2011-02-27T22:33:19 watchdog --- #1 (runners=1,threads=2,duration=0.73s)
          2011-02-27T22:33:20 watchdog --- #2 (runners=1,threads=2,duration=0.54s)
          2011-02-27T22:33:21 watchdog --- #3 (runners=1,threads=2,duration=0.69s)
          2011-02-27T22:33:22 watchdog --- #4 (runners=1,threads=2,duration=0.77s)
          2011-02-27T22:33:23 watchdog --- #5 (runners=1,threads=2,duration=0.70s)
    • Sending Email Notification
          send_email = Email(
              "notifications@ffctn.com",
              "[Watchdog]Google Search Latency Error", "Latency was over 80ms",
              "smtp.gmail.com", "myusername", "mypassword"
          )
          […]
          HTTP(
              GET="http://www.google.ca/search?q=watchdog",
              freq=Time.s(1), timeout=Time.ms(80),
              fail=[
                  send_email
              ]
          )
      The Email rule will send an email to notifications@ffctn.com when triggered. Listing it under fail is how we bind the action to the rule failure.
    • Sending Email+Jabber Notification
          send_xmpp = XMPP(
              "notifications@jabber.org",
              "Watchdog: Google search latency over 80ms",
              "myuser@jabber.org", "myspassword"
          )
          […]
          HTTP(
              GET="http://www.google.ca/search?q=watchdog",
              freq=Time.s(1), timeout=Time.ms(80),
              fail=[
                  send_email, send_xmpp
              ]
          )
    • Monitoring incident: when something fails repeatedly during a given period of time. You don't want to be notified all the time, only when it really matters.
    • Detecting incidents
          HTTP(
              GET="http://www.google.ca/search?q=watchdog",
              freq=Time.s(1), timeout=Time.ms(80),
              fail=[
                  Incident(
                      errors = 5,
                      during = Time.s(10),
                      actions = [send_email,send_xmpp]
                  )
              ]
          )
      An incident is a "smart" action: it will only do something when the condition is met. When at least 5 errors happen over a 10-second period, the Incident action will trigger the given actions.
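
The "N errors within a time window" condition can be sketched as a small callable that remembers recent failure times. This is an illustration under assumptions, not Watchdog's Incident: `IncidentSketch`, its reset-after-firing behaviour and the injectable clock are all choices made for the example.

```python
import time

class IncidentSketch(object):
    """Fire `actions` once `errors` failures occur within `during` seconds."""
    def __init__(self, errors, during, actions, clock=time.time):
        self.errors, self.during = errors, during
        self.actions, self.clock = actions, clock
        self.timestamps = []

    def __call__(self, *args):
        now = self.clock()
        # Keep only the failures that are still inside the time window.
        self.timestamps = [t for t in self.timestamps if now - t < self.during]
        self.timestamps.append(now)
        if len(self.timestamps) >= self.errors:
            self.timestamps = []  # start counting again after firing
            for action in self.actions:
                action(*args)
            return True
        return False

fired = []
ticks = iter([0, 1, 20, 21, 22])  # fake clock, in seconds
incident = IncidentSketch(errors=3, during=10,
                          actions=[lambda: fired.append("alert")],
                          clock=lambda: next(ticks))
results = [incident() for _ in range(5)]
print(results, fired)
```

With the fake clock, the two early failures expire before the third arrives, so the actions only fire once three failures land inside the same 10-second window.
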
    • Example: Ensuring a service is running
        from watchdog import *
        Monitor(
            Service(
                name="myservice-ensure-up",
                monitor=(
                    HTTP(
                        GET="http://localhost:8000/",
                        freq=Time.ms(500),
                        fail=[
                            Incident(
                                errors=5,
                                during=Time.s(5),
                                actions=[Restart("myservice-start.py")]
                            )
                        ]
                    )
                )
            )
        ).run()
      We test if we can GET http://localhost:8000 within 500ms. If we can't reach it during 5 seconds, we kill and restart myservice-start.py.
    • Example: Monitoring system health
        from watchdog import *
        Monitor(
            Service(
                name="system-health",
                monitor=(
                    SystemInfo(
                        freq=Time.s(1),
                        success=(
                            LogResult("myserver.system.mem=", extract=lambda r, _: r["memoryUsage"]),
                            LogResult("myserver.system.disk=", extract=lambda r, _: reduce(max, r["diskUsage"].values())),
                            LogResult("myserver.system.cpu=", extract=lambda r, _: r["cpuUsage"]),
                        )
                    ),
                    Delta(
                        Bandwidth("eth0", freq=Time.s(1)),
                        extract=lambda v: v["total"]["bytes"] / 1000.0 / 1000.0,
                        success=[LogResult("myserver.system.eth0.sent=")]
                    ),
                    SystemHealth(
                        cpu=0.90, disk=0.90, mem=0.90,
                        freq=Time.s(60),
                        fail=[Log(path="watchdog-system-failures.log")]
                    ),
                )
            )
        ).run()
      SystemInfo will retrieve system information and return it as a dictionary. We log each result by extracting the given value from the result dictionary (memoryUsage, diskUsage, cpuUsage).
      Bandwidth collects live traffic information for a network interface. But we don't want the total amount, we just want the difference. Delta does just that. We print the result as before.
      SystemHealth will fail whenever the usage is above the given thresholds. We'll log failures in a log file.
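To make the "difference, not total" idea concrete, here is a small sketch of a delta wrapper in plain Python. The class name `DeltaSketch` and its `update` method are hypothetical and only illustrate the technique; the sample shape `{"total": {"bytes": ...}}` is taken from the `extract` lambda on the slide, and returning `None` for the first sample is an assumption of this example.

```python
class DeltaSketch:
    """Turn a cumulative counter into per-sample differences.

    Hypothetical illustration; not Watchdog's Delta implementation.
    """

    def __init__(self, extract=lambda v: v):
        self.extract = extract
        self._previous = None  # last extracted value, if any

    def update(self, sample):
        """Return the change since the previous sample (None the first time)."""
        value = self.extract(sample)
        previous, self._previous = self._previous, value
        if previous is None:
            return None
        return value - previous


if __name__ == "__main__":
    # Convert cumulative bytes to megabytes, as in the slide's extract lambda.
    sent = DeltaSketch(extract=lambda v: v["total"]["bytes"] / 1000.0 / 1000.0)
    sent.update({"total": {"bytes": 1000000}})          # first sample: no delta yet
    print(sent.update({"total": {"bytes": 3500000}}))   # megabytes since last sample
```

Keeping the extraction step separate from the subtraction lets the same wrapper work on any counter-like sample.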
    • Watchdog: Overview
        Monitoring DSL: declarative programming to define monitoring strategy
        Wide spectrum: from data collection to incident detection
        Flexible: does not impose a specific architecture
    • Watchdog: Use cases
        Ensure service availability: test and stop/restart when there are problems
        Collect system statistics: log or send data through the network
        Alert on system or service health: take actions when the system stats are above a threshold
    • Get started! On GitHub: http://github.com/sebastien/watchdog (1 Python file, documented API)
    • Thank you! www.ffctn.com, sebastien@ffctn.com, github.com/sebastien, ffunction inc.