RAFT
Python for System Administrator
Roberto Polli - roberto.polli@par-tec.it
Par-Tec Spa - Rome Operation Unit
P.zza S. Benedetto da Norcia, 33
00040, Pomezia (RM) - www.par-tec.it
March 13, 2016
Agenda
Intro
ipython
Path management: 10’
Encoding: 10’
Data Gathering: 20’
module: psutil
module: subprocess
The /proc filesystem
Parsing: 60’
Regular Expressions
Nosetest Intermezzo: 15’
Processing: 45’
Distributions
Deviation
Correlation
Plotting Time
End
Who? What? Why?
• Use python to replace Grep Awk Sed Perl. Speed up your daily job.
• Roberto Polli - Solutions Architect @ par-tec.it. Loves writing in C, Java
and Python. Red Hat Certified Engineer and Virtualization Administrator.
• Par-Tec – Proud sponsor of this talk ;) Contributes to various FLOSS projects and
provides expertise in IT Infrastructure & Services and Business Intelligence
solutions + Vertical Applications for the financial market.
Requirements
• python 2.7+, ipython
• course code from github
#git clone https://github.com/ioggstream/python-course
• test your environment (eg. psutil, numpy, scipy, matplotlib)
#nosetests -vs test_prerequisites.py
• first part: nose, psutil
• second part: scipy, numpy, matplotlib
• ♦optional/advanced content ♦
How
• Get ready before starting: code is here on github!
• Use notebooks or type everything but #comments and try/except
• Type fast with tab-completion and copy-paste
• Be curious: inspect and print returned variables
• Never* close your iPython session: you’ll lose your precious variables
* (ok, sometimes you can).
References
• irc.freenode.net #python - The Python Community :D
• Python Cookbook 3rd ed. O’Reilly - David Beazley and Brian K. Jones
• Programming Python 4th ed. O’Reilly - Mark Lutz
• Dive into Python3 2nd ed. Apress - Mark Pilgrim
• nose.readthedocs.org
• github.com/ioggstream/python-course
iPython I
• Interactive interpreter with tons of features, and the main tool of
our training.
• The most fun way to learn and use python!
• Supports tab-completion, readline and inline help
• Allows pasting from the clipboard with %paste, and multi-line editing with
%edit
• Run it enabling plotting support:
# ipython --pylab
iPython II
# iPython supports inline-help appending ? to an object
str?
# We can run commands and capture the output in a variable
# No quoting needed with the ! magic on unix
ret = !cat /etc/hosts
# windows has etc\hosts too ;)
ret = !type c:\windows\system32\drivers\etc\hosts
iPython III
# returned objects can be filtered with
ret.grep('localhost')
# Now get the first whitespace-separated column of the output
ret.fields(0)
ret.grep('localhost').fields(0)
# And the last returned value is stored in
localip = _
# We can type long commands in an editor like ‘vi’ using
%edit mytmp.py # type print(ret[0]), then save and exit (eg. :wq)
> Editing... done. Executing edited code...
Path management: Goal
• Normalize paths on different platforms
• Create, copy and remove folders
• Handle errors
modules: os, os.path, shutil, errno
see also: pathlib on Python 3.4+
Path management: os.path, sys
basedir, hosts = "/", "etc/hosts"
# Check the hosting platform with the sys module
from sys import platform
if platform.startswith('win'):
    basedir = 'c:/windows/system32/drivers'
# Always use the os.path module!
from os.path import join, normpath
hosts = join(basedir, hosts)
hosts = normpath(hosts)
print("Normalized path is", hosts)
Path management: os.path, sys
• os.path is the best way to manage paths!
• multiplatform
• safe
• join adds a separator only where one is missing
• normpath fixes "/" orientation and collapses redundant "/" and ".."
• realpath resolves symlinks
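To see what join() and normpath() do on each platform without a Windows box at hand, one can call posixpath and ntpath directly: they are the modules os.path points to on POSIX and on Windows. A minimal sketch (realpath is left out because it needs real symlinks on disk):

```python
import ntpath     # the os.path flavour used on Windows
import posixpath  # the os.path flavour used on Linux and macOS

# join() adds a separator only where one is missing
joined = posixpath.join("/tmp/", "py", "foo")
# normpath() collapses doubled separators, "." and ".." components
normalized = posixpath.normpath("/tmp//py/../py/./foo")
# the Windows flavour also turns "/" into "\"
win_normalized = ntpath.normpath("c:/windows//system32/../system32/drivers")

print(joined)          # /tmp/py/foo
print(normalized)      # /tmp/py/foo
print(win_normalized)  # c:\windows\system32\drivers
```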
And now, a quick glance at other tools
Move trees: shutil, os, os.path
from os import makedirs # ...tree creation...
from os.path import isdir # ...checking...
from shutil import copytree, rmtree
makedirs("/tmp/py/foo/bar")
# We can copy a whole tree and test it
copytree("/tmp/py/foo", "/tmp/py/foo2")
assert isdir("/tmp/py/foo2/bar")
rmtree("/tmp/py/foo") # ... and finally delete it
assert not isdir("/tmp/py/foo/bar")
Move trees: errno
# We can use exception handlers to investigate errors
try:
    # python2 makedirs() can't ignore existing directories...
    makedirs("/tmp/py/foo/bar")
    # ...and raises an OSError
except OSError as e:
    # Just use the errno module to check the error value
    import errno
    assert e.errno == errno.EEXIST
help(makedirs)
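On Python 3.2+ makedirs() accepts exist_ok=True, which covers the common case without a try/except; without it, the error is FileExistsError, a subclass of OSError, so the errno check above still works. A small sketch working in a throwaway temporary directory:

```python
import errno
import os
import shutil
import tempfile

base = tempfile.mkdtemp()                  # scratch directory for the demo
target = os.path.join(base, "foo", "bar")

os.makedirs(target)                        # creates every missing level
os.makedirs(target, exist_ok=True)         # Python 3.2+: no error if it exists

try:
    os.makedirs(target)                    # without exist_ok...
    already_there = False
except OSError as e:                       # ...Python 3 raises FileExistsError,
    already_there = e.errno == errno.EEXIST  # which is a subclass of OSError

shutil.rmtree(base)                        # clean up the scratch tree
print(already_there)  # True
```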
Encoding: Goal
• A string is more than a sequence of bytes
• A string is a pair (bytes, encoding)
• Use unicode literals in python2
• Manage differently encoded filenames
• A string is not a sequence of bytes
modules: os, os.path, glob
Song of Childhood
Als das Kind Kind war,
ging es mit hängenden Armen,
wollte der Bach sei ein Fluß,
der Fluß sei ein Strom,
und diese Pfütze das Meer.

Als das Kind Kind war,
wußte es nicht, daß es Kind war,
alles war ihm beseelt,
und alle Seelen waren eins.

Als das Kind Kind war,
hatte es von nichts eine Meinung,
hatte keine Gewohnheit,
saß oft im Schneidersitz,
lief aus dem Stand,
hatte einen Wirbel im Haar
und machte kein Gesicht beim fotografieren.

“When the child was a child,
characters were bytes, and
strings were lists of bytes”

Als das Kind Kind war,
fielen ihm die Beeren wie nur Beeren in die Hand
und jetzt immer noch,
machten ihm die frischen Walnüsse eine rauhe Zunge
und jetzt immer noch,
hatte es auf jedem Berg
die Sehnsucht nach dem immer höheren Berg,
und in jeder Stadt
die Sehnsucht nach der noch größeren Stadt,
und das ist immer noch so,
griff im Wipfel eines Baums nach dem Kirschen in einem Hochgefühl
wie auch heute noch,
eine Scheu vor jedem Fremden
und hat sie immer noch,
wartete es auf den ersten Schnee,
und wartet so immer noch.
Encoding is a map
# Py3 doesn’t need the u
the_string = u"S\u00fcd" # Süd
# can be encoded with different maps
in_utf8 = the_string.encode('utf-8')
in_win = the_string.encode('cp1252')
type(in_utf8) == bytes # byte-sequences
# Decoding bytes using the wrong map..
# ...gives sad results ;)
in_utf8.decode('cp1252') # SÃ¼d
• Encoding is a one-to-one map between a typographical character and a byte-sequence
• Decoding is its reverse map

char   ascii   utf-8        cp1252
a      [97]    [97]         [97]
ü      -       [195, 188]   [252]
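The table above can be checked directly: encoding "Süd" gives a different byte-sequence per map, decoding with the wrong map produces mojibake, and ascii has no entry for ü at all. A short sketch (Python 3 syntax):

```python
the_string = "S\u00fcd"                # "Süd" (on Python 2 use u"S\u00fcd")

in_utf8 = the_string.encode("utf-8")
in_win = the_string.encode("cp1252")

print(list(in_utf8))              # [83, 195, 188, 100]
print(list(in_win))               # [83, 252, 100]
garbled = in_utf8.decode("cp1252")
print(garbled)                    # SÃ¼d: right bytes, wrong map

try:
    the_string.encode("ascii")    # ascii has no code for ü...
    ascii_ok = True
except UnicodeEncodeError:        # ...so encoding fails
    ascii_ok = False
```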
Enters Encoding
# Filenames are binary data! Be careful when reading from
# a (eg. vfat) filesystem!
# To make python2 encoding-aware we should
from __future__ import unicode_literals
# Create 3 windows-encoded filenames in
basedir = "/tmp/py"
# using the provided function
from course import create_wuerstelstrasse
create_wuerstelstrasse(basedir)
Encoded filenames: glob
from glob import glob as ls # expands wildcards like a shell.
files = ls("/tmp/py/*.txt")
# UnicodeDecodeError: 'ascii' codec can't decode byte 0xFC
0xFC == 252 # remember the ü in the cp1252 map?
files = ls(b"/tmp/py/*.txt") # To avoid encoding issues, we explicitly use bytes
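Since the course's create_wuerstelstrasse() helper isn't shown, here is a self-contained sketch of the same situation, assuming a Linux filesystem (which accepts arbitrary bytes in filenames): it creates one cp1252-encoded filename in a temporary directory, then lists it with a bytes pattern so Python hands back raw bytes instead of guessing an encoding.

```python
import glob
import os
import tempfile

# Assumption: a Linux filesystem, which accepts any bytes in filenames.
base = os.fsencode(tempfile.mkdtemp())      # work with bytes paths
name = "S\u00fcd.txt".encode("cp1252")      # b"S\xfcd.txt": 0xFC is not valid UTF-8
open(os.path.join(base, name), "wb").close()

found = glob.glob(os.path.join(base, b"*.txt"))  # bytes pattern -> bytes results
raw = os.path.basename(found[0])
print(raw)                   # b'S\xfcd.txt'
print(raw.decode("cp1252"))  # Süd.txt, once we know the right map
```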
Data Gathering: Goal
Gathering System Data with multiplatform and platform-dependent tools.
• Get info from files, /proc and /sys
• Capture command output
• Use psutil to get IO, CPU and memory data
• Parse files with a strategy
modules: psutil, subprocess, os
Data Gathering: grep
def grep(needle, fpath):
    """A minimal grep implementation.
    goal: open() is iterable and doesn't need splitlines()
    goal: comprehension can filter iterables
    """
    return [x for x in open(fpath) if needle in x]
# Do we have "localhost" in our "/etc/hosts"?
grep("localhost", "/etc/hosts")
Data Gathering: psutil
# The psutil module is very nice!
import psutil
# Works on Windows, Linux and MacOS
psutil.cpu_percent()
# And its output is easy to manage
psutil.disk_io_counters()
Exercise: Which other information does psutil provide?
Data Gathering: Exercises
Write a vmstat-like function printing every second:
• cpu usage % ;
• bytes read and written in the given interval;
• Hint: use psutil, time.sleep(1)
• Hint: try on ipython and then write the function using
%edit vmstat.py
Data Gathering: subprocess
# The check_output function returns the command stdout
from subprocess import check_output
# It takes a list as an argument!
out = check_output("ping -w1 -c1 www.google.com".split())
# and returns a string (bytes on Python 3)
print(out)
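On Python 3, check_output() returns bytes, so output usually needs decoding before string processing. A portable sketch that uses the running interpreter as a stand-in for an external command, so it behaves the same on Linux, macOS and Windows:

```python
import sys
from subprocess import check_output

# The running interpreter stands in for an external command
out = check_output([sys.executable, "-c", "print('hello')"])
print(type(out))             # <class 'bytes'> on Python 3
text = out.decode().strip()  # decode to str, drop the trailing newline
print(text)                  # hello
```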
Data Gathering: security
# Be careful with the above code: str.split() breaks quoted arguments
out = check_output('ls "./may not work.doc"'.split())
# You can use
from shlex import split
out = check_output(split('ls "./will work.xlsx"'))
you = r"""can 'even' tokenize "respecting" quoted\n chars"""
from shlex import shlex
for token in shlex(you):
    print(token)
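The difference between str.split() and shlex.split() is easy to see side by side. When a command string really has to go through a shell, shlex.quote() (Python 3.3+) escapes untrusted pieces; the filename below is a made-up hostile example:

```python
import shlex

cmd = 'ls -l "./will work.xlsx"'
naive = cmd.split()        # ['ls', '-l', '"./will', 'work.xlsx"']: filename broken
tokens = shlex.split(cmd)  # ['ls', '-l', './will work.xlsx']: quotes respected

# When a command string must go through a shell, quote untrusted pieces
filename = "my file; rm -rf ~"
safe = "ls -l " + shlex.quote(filename)  # shlex.quote() needs Python 3.3+
print(safe)                # ls -l 'my file; rm -rf ~'
```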
Data Gathering: subprocess, sys
def sh(cmd, shell=False, timeout=None):
    """Returns an iterable output of a command string, checking ..."""
    from subprocess import check_output
    from sys import version_info as python_version
    from shlex import split
    # with shell=True the shell wants the whole string, otherwise split it
    args = cmd if shell else split(cmd)
    if python_version < (3, 3): # ..before using...
        if timeout:
            raise ValueError("Timeout not supported")
        output = check_output(args, shell=shell)
    else:
        output = check_output(args, shell=shell, timeout=timeout)
    return output.splitlines()
Data Gathering: Exercises
Write a simple pgrep-like function for your OS which:
• ppgrep signature is the following
def ppgrep(program):
    """@param program - eg. firefox, explorer.exe"""
    raise NotImplementedError
• prints a list of processes executing 'program';
• Hint: use subprocess, os, and list-comprehension
items = [x for x in a_list if 'firefox' in x]
♦Data Gathering: Parsing /proc I ♦
def linux_threads(pid):
    """The Linux /proc filesystem is a cool place to get info."""
    from glob import glob # expands * and ?
    path = "/proc/{}/task/*/status".format(pid)
    # Pick a set of fields to gather...
    t_info = ('Pid', 'Tgid', 'voluntary') # a tuple
    for t_path in glob(path):
        # ...and use comprehension to get interesting data.
        with open(t_path) as fp:
            print([x for x in fp
                   if x.startswith(t_info)]) # startswith() accepts tuples!
Data Gathering: Parsing /proc II
# On Linux, /proc/diskstats is the source of I/O infos
disk_l = grep("sda", "/proc/diskstats")
# To gather that data we put the headers in a multi-line string
from course import diskstats_headers as headers
disk_info = disk_l[0].split() # Take the 1st entry, split the data
zip(headers, disk_info) # ...and tie them with the headers
list(_) # On py3 zip() returns an iterator: wrap it in list()
Data Gathering: Parsing /proc III
# Or create a reusable convenience class with
from collections import namedtuple
# using headers as attributes
# like the one provided by psutil
DiskStats = namedtuple('DiskStat', headers)
# ... and disk_info as values
dstat = DiskStats(*disk_info)
dstat.device, dstat.writes_ms
# Homework: check further features with
import collections
help(collections)
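The course's diskstats_headers isn't shown here, so this sketch assumes its own header names for the first 14 columns of /proc/diskstats as described in the kernel's iostats documentation, and parses a made-up sample line. Recent kernels append extra columns, which the slice simply ignores.

```python
from collections import namedtuple

# Assumed names for the first 14 columns of /proc/diskstats
headers = ("major", "minor", "device",
           "reads", "reads_merged", "sectors_read", "reads_ms",
           "writes", "writes_merged", "sectors_written", "writes_ms",
           "io_in_progress", "io_ms", "io_weighted_ms")
DiskStat = namedtuple("DiskStat", headers)

# A made-up sample line in the /proc/diskstats layout
line = "   8       0 sda 1200 30 64000 500 900 40 72000 800 0 1100 1300"
dstat = DiskStat(*line.split()[:len(headers)])  # extra columns are dropped
print(dstat.device, dstat.writes_ms)  # sda 800
```

Values stay strings: convert with int() before doing arithmetic on them.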
Parsing: Goal
• Plan a parsing strategy
• Use basic regular expressions: match, search, sub
• Benchmark a parser
• Run nosetests
• Write a simple parser
modules: re, nose, %timeit
Parsing is hard...
“System Administrators spend 24.3% of their work-life parsing files.”*
*Independent analysis by The GASP¹ Society ;)
¹ Grep Awk Sed Perl
...use a strategy!
1. Collect parsing samples
2. Play in ipython and collect %history
3. Write tests, then the parser
4. If needed, benchmark
Parsing postfix logs
# Before writing the parser, collect samples of
# the interesting lines. For now just
from course import mail_sent, mail_delivered
# and %edit a simple
def test_sent():
    hour, host, to = parse_line(mail_sent)
    assert hour == '08:00:00'
    assert to == 'jon@doe.it'
Parsing lines: split, zip
May 31 08:00:00 test-1 postfix/smtp[169]: 7CD8E730020: to=<joe@foo.it>, relay=mx2.foo.it[10.0.4.5]:25,
...
mail_sent.split() # Start using basic strings in ipython
# Then number the fields with zip()
fields, counting = _, zip(range(20), _)
fields = fields[:7] # We only care about the first 7 values
# and pick fields one by one
hour, host, dest = fields[2], fields[3], fields[6]
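Outside iPython there is no `_`, so here is the same exploration as a plain script on the visible part of the sample line above; enumerate() is shown as an alternative to zip(range(20), ...) for numbering the fields.

```python
# The visible part of the sample line above
line = ("May 31 08:00:00 test-1 postfix/smtp[169]: 7CD8E730020: "
        "to=<joe@foo.it>, relay=mx2.foo.it[10.0.4.5]:25,")

fields = line.split()
# enumerate() numbers the fields like zip(range(20), fields) does
for i, field in enumerate(fields[:7]):
    print(i, field)

hour, host, dest = fields[2], fields[3], fields[6]
print(hour, host, dest)  # 08:00:00 test-1 to=<joe@foo.it>,
```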
Parse: Exercise I
In another window
• edit 03_parsing_test.py
• complete the parse_line(line) function
def parse_line(line):
    """Write your function and test it
    with test_sent()"""
    raise NotImplementedError
%paste your solution’s code in iPython and run the test functions manually
Python Regexp
# Python supports regular expressions via
import re
# We start showing a grep-reloaded function
def grep(expr, fpath):
    one = re.compile(expr) # ...has two lookup methods...
    assert (one.match # which searches from ^ the beginning
            and one.search) # and one that searches anywhere
    with open(fpath) as fp:
        return [x for x in fp if one.search(x)]
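The difference between match() and search() shows up on a typical /etc/hosts line, where the hostname is not at the beginning:

```python
import re

one = re.compile(r"localhost")
line = "127.0.0.1   localhost"

# match() only tries at the very beginning of the string...
print(one.match(line))                  # None
# ...while search() scans the whole string
print(one.search(line).group())         # localhost
# an explicit ^ anchor makes search() fail here too
print(re.search(r"^localhost", line))   # None
```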
Splitting with re.split
import sys
from re import split # is a very nice function
# Let’s gather some ping stats
if sys.platform.startswith('win'):
    cmd = "ping -n10 www.google.it"
else:
    cmd = "ping -c10 -w10 www.google.it"
# Split on both space and = (sh() returns bytes on py3: decode them)
ping_output = [split("[ =]", x.decode()) for x in sh(cmd)]
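To see what the split produces without touching the network, here it is on a made-up line in the Linux iputils ping format (192.0.2.1 is a documentation address); the value right after "time" is the RTT in milliseconds.

```python
from re import split

# A made-up line in the Linux iputils ping format
line = "64 bytes from 192.0.2.1: icmp_seq=1 ttl=57 time=12.3 ms"

tokens = split("[ =]", line)
print(tokens)
# ['64', 'bytes', 'from', '192.0.2.1:', 'icmp_seq', '1', 'ttl', '57', 'time', '12.3', 'ms']
rtt = float(tokens[tokens.index("time") + 1])
print(rtt)  # 12.3
```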
Splitting with re.findall
from re import findall # can be misused too ;)
# eg. for adding the ":" to a
mac = "00""24""e8""b4""33""20"
# ...using this
re_hex = '[0-9A-Fa-f]{2}'
mac_address = ':'.join(findall(re_hex, mac))
print("The mac address is ", mac_address)
This does only a little validation: findall() silently skips pairs containing characters outside the 0-F range, so check that you got 6 pairs
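A sketch of stricter validation: fullmatch() (Python 3.4+) rejects anything that isn't exactly 12 hex digits before findall() adds the colons. The format_mac name is a hypothetical helper, not part of the course code.

```python
import re

HEX_PAIR = re.compile(r"[0-9A-Fa-f]{2}")
MAC_12 = re.compile(r"[0-9A-Fa-f]{12}")

# findall() silently skips what doesn't match: no error, just fewer pairs
print(HEX_PAIR.findall("0g24e8b43320"))  # ['24', 'e8', 'b4', '33', '20']

def format_mac(raw):
    """Hypothetical helper: validate 12 hex digits, then add the colons."""
    if not MAC_12.fullmatch(raw):        # fullmatch() needs Python 3.4+
        raise ValueError("not a 12-digit hex MAC: %r" % raw)
    return ":".join(HEX_PAIR.findall(raw))

print(format_mac("0024e8b43320"))        # 00:24:e8:b4:33:20
```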
Benchmarking in iPython I
• Parsing big files needs benchmarks. iPython's %timeit magic is a good
starting point.
test_regexps = ("..", "[a-fA-F0-9]{2}")
for re_s in test_regexps:
    %timeit ':'.join(findall(re_s, mac))
• We can even compare compiled and inline regexps
import re
for re_s in test_regexps:
    re_c = re.compile(re_s)
    %timeit ':'.join(re_c.findall(mac))
Benchmarking in iPython II
Or find other methods:
• complex...
from re import sub as sed
%timeit sed(r'(..)', r'\1:', mac) # note: leaves a trailing ':'
• ...or simple
%timeit ':'.join([mac[i:i+2] for i in range(0, 12, 2)])
• Outside iPython, check the timeit module
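A minimal sketch of the same comparison with the timeit module, usable in a plain script; timeit.timeit() accepts a callable, and number= sets how many times it runs. Timings depend on the machine, so none are shown.

```python
import re
import timeit

mac = "0024e8b43320"
candidates = {
    "findall":  lambda: ":".join(re.findall("..", mac)),
    "compiled": lambda c=re.compile(".."): ":".join(c.findall(mac)),
    "slicing":  lambda: ":".join([mac[i:i + 2] for i in range(0, 12, 2)]),
}

results = {}
for name, func in candidates.items():
    results[name] = timeit.timeit(func, number=10000)  # total seconds
    print("%-9s %.4f s" % (name, results[name]))
```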
♦Parsing: a real world Example ♦
# You don’t need to type this VSAN configuration script
# which uses linux FC information from the /sys filesystem
import re
from glob import glob
fc_id_path = "/sys/class/fc_host/host*/port_name"
for x in glob(fc_id_path):
    # ...we boldly skip an explicit close()
    pwwn = open(x).read().strip() # 0x500143802427e66c
    pwwn = pwwn[2:]
    # ...and even use the slower but readable
    pwwn = re.findall(r'..', pwwn)
    print("member pwwn ", ':'.join(pwwn))
Parsing logs: a simple solution
def parse_line(line):
    import re
    # using _ we improve readability
    _, _, hour, host, _, _, dest = line.split()[:7]
    try:
        # and if dest isn’t what we expect...
        dest = re.split(r'[<>]', dest)[1]
    except IndexError:
        # ...we set it to None
        dest = None
    return (hour, host, dest)
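Without the course module at hand, the solution can be tried on the two sample lines that appear in this deck: the "sent" line (shortened) and the "removed" line from the maillog exercise. A self-contained sketch:

```python
import re

def parse_line(line):
    _, _, hour, host, _, _, dest = line.split()[:7]
    try:
        dest = re.split(r"[<>]", dest)[1]   # "to=<a@b>," -> "a@b"
    except IndexError:                      # no <...>: not a "sent" line
        dest = None
    return (hour, host, dest)

sent = ("May 31 08:00:00 test-1 postfix/smtp[169]: 7CD8E730020: "
        "to=<joe@foo.it>, relay=mx2.foo.it[10.0.4.5]:25,")
removed = "May 14 16:00:04 rpolli postfix/qmgr[122]: 4DC3DA: removed"

print(parse_line(sent))     # ('08:00:00', 'test-1', 'joe@foo.it')
print(parse_line(removed))  # ('16:00:04', 'rpolli', None)
```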
Parsing logs: II
# Now another test for the delivered messages
# %edit 03_parsing_test.py
def test_delivered():
    hour, host, destination = parse_line(mail_delivered)
    assert hour == '08:00:00'
    # Delivery logs should have destination == None
    assert destination is None
# Exercise: fix parse_line to work with both tests
# and save the test file
Running nosetest
• Now run the following command from a shell
# nosetests -vs 03_parsing_test.py
03_parsing_test.test_sent ... ok
03_parsing_test.test_delivered ... ok
Ran 2 tests in 0.001s
• Nose is a test framework.
• Nose runs every file whose name has "test" at the start or after a _, . or - (eg. test_foo.py, 03_parsing_test.py)
• Nose runs every function matching the same rule (eg. test_sent)
Simple Test Script
• Open the 02_nosetests_simple.py file
def setup():
    print("is run before the testsuite, while")
def teardown():
    print("after all tests")
def test_one():
    # name a function like test_* to run it!
    assert 1 == 1
def test_two():
    # and use assert to test for success
    assert 1 == 0, "I was expecting 0"
♦Complete Test Script: I ♦
• A more flexible script is 02_nosetests_full.py which uses a Test class
import os
class Test(object):
    @classmethod
    def setup_class(cls): # is run once at startup,
        # ..eg. to create database structure
        print("setup testsuite environment")
        with open("/tmp/test2.out", "w") as fp:
            fp.write("0")
    @classmethod
    def teardown_class(cls): # is run once after all tests to...
        print("cleanup testsuite environment")
        os.unlink("/tmp/test2.out")
♦Complete Test Script: II ♦
• allowing pre-post testsuite and pre-post test fixtures
class Test(object):
    ...
    # Using a Test class...
    def setup(self):
        print("is run before every test") #..and..
    def teardown(self):
        print("after every test") # eg truncate a table
    # each test can use the prepared environment
    def test_a(self):
        assert os.path.isfile("/tmp/test2.out")
Simple processing: Goal
• Handle gathered data with dict() and zip()
• Find data relation with scipy
• Get essential information like standard deviation σ and distributions δ
• Linear correlation: what it is, and when it can help
• Plotting
modules: numpy, scipy, scipy.stats.stats, collections, random, time
The Chicken Paradox
“According to the latest statistics,
it appears that you eat one chicken per year:
and, if that doesn’t fit your budget,
you’ll fit into the statistics anyway,
because someone will eat two.” C. A. Salustri
Simple processing: Exercise
How to dismantle the chicken paradox? Gather data!
• Write the following function using our parsing strategy
def ping_rtt(seconds=10):
    """@return: a list of ping RTT"""
    from course import sh
    # get sample output
    # find a solution in ipython
    # test and paste the code
    raise NotImplementedError
• Gather 10 seconds of ping output
• Hint: reuse the sh() function
• Hint: slice and filter lists using comprehension
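One possible parsing core for ping_rtt(), sketched on made-up output lines. It assumes RTTs appear as "time=12.3 ms" (Linux) or "time=12ms" / "time<1ms" (Windows); rtts_from_lines is a hypothetical helper name. Since sh() returns bytes on Python 3, decode its lines before passing them in.

```python
import re

# "time<1ms" is parsed as 1.0, i.e. as an upper bound
RTT = re.compile(r"time[=<]([0-9.]+)")

def rtts_from_lines(lines):
    """Hypothetical helper: return the RTT values (ms) found in ping output lines."""
    matches = (RTT.search(line) for line in lines)
    return [float(m.group(1)) for m in matches if m]

# Made-up output lines, for illustration only
sample = [
    "PING example (192.0.2.1) 56(84) bytes of data.",
    "64 bytes from 192.0.2.1: icmp_seq=1 ttl=57 time=12.3 ms",
    "64 bytes from 192.0.2.1: icmp_seq=2 ttl=57 time=11.9 ms",
    "Reply from 192.0.2.1: bytes=32 time<1ms TTL=57",
    "2 packets transmitted, 2 received, 0% packet loss, time 1001ms",
]
print(rtts_from_lines(sample))  # [12.3, 11.9, 1.0]
```

The Linux summary line ("time 1001ms") has no "=" or "<" after "time", so the pattern skips it.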
Distributions: Counter, defaultdict
A distribution or δ shows the frequency of events, like how many people ate x
chickens ;)
# Create a simple δ with Counter
from collections import Counter
d = Counter(rtt)
# We can even use a more flexible
from collections import defaultdict
d = defaultdict(int)
for x in rtt:
    d[x] += 1
Distributions and Mean are both important!
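Both approaches build the same distribution; here they are side by side on a made-up list of "chickens eaten per person". Counter additionally offers most_common() to list values by frequency.

```python
from collections import Counter, defaultdict

chickens = [0, 2, 1, 1, 2, 0, 1]   # made-up: chickens eaten by 7 people

d = Counter(chickens)
print(d.most_common())             # most frequent values first

# the same distribution with defaultdict
distro = defaultdict(int)
for x in chickens:
    distro[x] += 1
print(dict(distro))
```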
Standard Deviation: scipy
• Standard deviation or σ formula is
$$\sigma^2(X) := \frac{\sum (x - \bar{x})^2}{n}$$
• σ tells if δ is fair or not, and how representative the mean (x̄) is
• matplotlib.mlab.normpdf is a smooth function approximating the histogram (later Matplotlib releases removed it: scipy.stats.norm.pdf does the same job)
from numpy import std, mean # the scipy aliases are deprecated
fair = [1, 1] # chickens
unfair = [0, 2] # chickens
assert mean(fair) == mean(unfair)
# Use standard deviation!
std(fair) # 0
std(unfair) # 1
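The formula can be computed term by term in plain Python; note the n denominator (population standard deviation), which is also what numpy.std uses by default and what statistics.pstdev computes. A stdlib-only sketch:

```python
from math import sqrt
from statistics import pstdev

def sigma(xs):
    """Population standard deviation, straight from the formula above."""
    n = len(xs)
    mean = sum(xs) / float(n)
    return sqrt(sum((x - mean) ** 2 for x in xs) / n)

fair, unfair = [1, 1], [0, 2]
print(sigma(fair), sigma(unfair))  # 0.0 1.0
print(pstdev(unfair))              # 1.0, same n denominator
```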
Simple processing: scipy
Check your computed values against the σ returned by ping (didn’t you notice ping
returned it? On Linux it's the mdev value in the rtt summary line)
"""goal: remember to convert to numeric / float
goal: use numpy / scipy
goal: check stdev"""
from numpy import std, mean # max, min are builtin
rtt = ping_rtt()
print(max(rtt), min(rtt), mean(rtt), std(rtt))
Time Distributions: Exercise
• Parse the provided maillog in ipython using its ! magic and get an hourly
email δ
• Expected output:
time_d = { # mail delivered (removed) between
    0: xxx, # 00:00 - 00:59
    1: xxx, # 01:00 - 01:59
    ..
}
Time Distributions: Exercise Solution
# delivered emails look like the following
# May 14 16:00:04 rpolli postfix/qmgr[122]: 4DC3DA: removed
ret = !grep removed maillog # get the interesting lines
ts = ret.fields(2) # find the timestamp (3rd column)
hours = [int(x.split(':')[0]) for x in ts]
time_d = {x: hours.count(x) for x in set(hours)}
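The same solution as a plain script, without iPython's ! magic or the course maillog: a few made-up lines in the format shown above, with Counter doing the counting in a single pass instead of calling hours.count() once per distinct hour.

```python
from collections import Counter

# Made-up maillog excerpt in the format shown above
maillog = [
    "May 14 16:00:04 rpolli postfix/qmgr[122]: 4DC3DA: removed",
    "May 14 16:20:11 rpolli postfix/qmgr[122]: 5AB1C2: removed",
    "May 14 02:05:37 rpolli postfix/smtp[130]: 6BC2D3: to=<jon@doe.it>, status=sent",
    "May 14 02:41:09 rpolli postfix/qmgr[122]: 7EF0A1: removed",
]

lines = [l for l in maillog if "removed" in l]   # like: grep removed maillog
ts = [l.split()[2] for l in lines]               # like: ret.fields(2)
hours = [int(t.split(":")[0]) for t in ts]       # "16:00:04" -> 16
time_d = dict(Counter(hours))
print(time_d)  # {16: 2, 2: 1}
```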
Plotting distributions
# To plot data..
from matplotlib import pyplot as plt
# and set the interactive mode
plt.ion()
# Plotting a histogram...
frequency, bins, _ = plt.hist(hours)
# .. returns the counts and the bin edges
distribution = dict(zip(bins[:-1],
                        frequency))
This server works mostly at
night...
Size Distributions: Exercise
• Create a size δ using hist(..., bins=...)
• Hint: help(hist)
size_d = { # mail size between
    0: xxx, # 0 - 10k
    1: xxx, # 10k - 20k
    ..
}
• Homework: Use the size δ to find size mean and size sigma and compare
with σ and mean evaluated from the original data-series
♦Simulating data with σ and ¯x ♦
The mean and the standard deviation are a useful starting point to simulate data using the gaussian
distribution.
# A mail load generator creating attachments of a given size...
from random import gauss
mail_size = gauss(mean_s, sigma_s) # a random number
# and use time_d to simulate the load during the day
from time import localtime
hour = localtime().tm_hour
mail_per_minute = time_d.get(hour, 0) / 60.0 # minutes in hour
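A slightly more complete sketch of the generator idea. The helper name, the hourly figures and the size parameters are all made up for illustration; gaussian draws can be negative, so sizes are clamped at 0, and a private random.Random with a seed makes runs reproducible.

```python
import random
from time import localtime

def simulated_mail_sizes(n, mean_s, sigma_s, seed=None):
    """Hypothetical load-generator helper: n attachment sizes drawn from a
    gaussian, clamped at 0 because a size can't be negative."""
    rng = random.Random(seed)   # private generator, reproducible with a seed
    return [max(0.0, rng.gauss(mean_s, sigma_s)) for _ in range(n)]

time_d = {0: 120, 1: 90, 23: 300}    # made-up hourly distribution
hour = localtime().tm_hour
mail_per_minute = time_d.get(hour, 0) / 60.0   # 60 minutes in an hour

sizes = simulated_mail_sizes(1000, mean_s=20000, sigma_s=5000, seed=42)
print(hour, mail_per_minute, sum(sizes) / len(sizes))
```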
Linear Correlation
# Let’s plot the following datasets
# taken from a 4-hour distribution
mail_sent = [1, 5, 500, 250, 100, 7]
kB_s = [70, 300, 29000, 12500, 450, 500]
# A scatter plot can suggest relations
# between data
plt.scatter(mail_sent, kB_s)
[Figure: “Correlating Mail and Thruput”, scatter plot of kMail sent (x axis) against Thruput kB/s (y axis)]
Linear Correlation
The Pearson coefficient ρ measures linear correlation, ranging from -1 to 1:
0 no linear relation
1 direct relation (both datasets increase together)
-1 inverse relation (one increases as the other decreases)
$$\rho(X, Y) = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}} \qquad (1)$$
from scipy.stats import pearsonr
ret = pearsonr(mail_sent, kB_s)
print(ret)
>(0.9823, 0.0004)
correlation, probability = ret
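Equation (1) translates almost literally into plain Python. This sketch computes only ρ; pearsonr() also returns a p-value, which is not reproduced here.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson's rho, term by term as in equation (1)."""
    n = float(len(xs))
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

mail_sent = [1, 5, 500, 250, 100, 7]
kB_s = [70, 300, 29000, 12500, 450, 500]
rho = pearson(mail_sent, kB_s)
print(round(rho, 4))
```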
You must (scatter) plot!
ρ does not detect non-linear correlation
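A classic counterexample: y = x² on a symmetric range is perfectly determined by x, yet ρ is exactly 0, because the relation isn't linear. A scatter plot would show the parabola immediately.

```python
from math import sqrt

def pearson(xs, ys):
    n = float(len(xs))
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) *
                      sum((y - my) ** 2 for y in ys))

xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]     # a perfect, but non-linear, relation
print(pearson(xs, ys))        # 0.0
```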
Combinations
# Given a table with many data series
from course import table
table = {...
    'cpu_usr': [10, 23, 55, ..],
    'byte_in': [2132, 3212, 3942, ..], }
# We can combine all their names with
from itertools import combinations
list(combinations(table, 2))
>[('swap_in', 'cpu_sys'),
  ('swap_in', 'csw'), ('cpu_sys', 'csw')... ]
Combining 4 suits,
2 at a time:
♥♠ ♥♣ ♥♦ ♠♣ ♠♦ ♣♦
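The suits example in code: combinations() yields each unordered pair once, in the order of the input, so 4 items taken 2 at a time give 4!/(2!·2!) = 6 pairs.

```python
from itertools import combinations

suits = ["hearts", "spades", "clubs", "diamonds"]
pairs = list(combinations(suits, 2))
print(len(pairs))  # 6
print(pairs[:3])   # [('hearts', 'spades'), ('hearts', 'clubs'), ('hearts', 'diamonds')]
```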
Netfishing correlation
We can try every combination of data series and check whether there's a strong
ρ.
for k1, k2 in combinations(table, 2):
    corr, probability = pearsonr(table[k1], table[k2])
    if abs(corr) < 0.5: # abs() keeps inverse relations too
        # I’m *still* not interested in data under this threshold
        continue
    print("linear correlation between {} and {} is {}".format(
        k1, k2, corr))
Correlating I/O and Context Switch
Now we’ll generate some correlation plots from table data, like this one.
Netfishing correlation II
# create all combined plots
for k1, k2 in combinations(table, 2):
    corr, probability = pearsonr(table[k1], table[k2])
    plt.scatter(table[k1], table[k2])
    # 3 digit precision on title
    plt.title("R={:0.3f}".format(corr))
    plt.xlabel(k1); plt.ylabel(k2)
    # save and close the plot
    plt.savefig("{}_{}.png".format(k1, k2)); plt.close()
Mark time with colors
# Get combined data directly via items
# using 3 buckets
buckets = 3
for (k1, v1), (k2, v2) in combinations(table.items(), 2):
    corr, probability = pearsonr(v1, v2)
    length = len(v1)
    # Get an array of colors
    # eg. [0, 0, ..., 1, 1, .., 2, 2, ...]
    colors = [i * buckets // length for i in range(length)]
    # map colors onto a colormap, with a nice colorbar
    plt.scatter(v1, v2, c=colors)
    plt.colorbar()
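The color list is just a time-bucket label per sample; integer division (//) keeps the labels whole numbers on Python 3 too. A tiny sketch with a hypothetical helper name, checking the bucket boundaries:

```python
def time_buckets(length, buckets=3):
    """Hypothetical helper: label each sample with its time bucket, 0..buckets-1."""
    return [i * buckets // length for i in range(length)]

print(time_buckets(9))  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
print(time_buckets(7))  # [0, 0, 0, 1, 1, 2, 2]
```

Passing such labels as c= to plt.scatter() lets the colormap show which points come early or late in the series.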
That’s all folks!
Thank you for the attention!
Roberto Polli - roberto.polli@par-tec.it

Will iPython replace Bash?
Will iPython replace Bash?Will iPython replace Bash?
Will iPython replace Bash?Babel
 
Will iPython replace bash?
Will iPython replace bash?Will iPython replace bash?
Will iPython replace bash?Roberto Polli
 
Euro python2011 High Performance Python
Euro python2011 High Performance PythonEuro python2011 High Performance Python
Euro python2011 High Performance PythonIan Ozsvald
 
Introduction to Raspberry Pi and GPIO
Introduction to Raspberry Pi and GPIOIntroduction to Raspberry Pi and GPIO
Introduction to Raspberry Pi and GPIOKris Findlay
 
Raspberry Pi + ROS
Raspberry Pi + ROSRaspberry Pi + ROS
Raspberry Pi + ROSArnoldBail
 
05 - Bypassing DEP, or why ASLR matters
05 - Bypassing DEP, or why ASLR matters05 - Bypassing DEP, or why ASLR matters
05 - Bypassing DEP, or why ASLR mattersAlexandre Moneger
 
Simple ETL in python 3.5+ with Bonobo, Romain Dorgueil
Simple ETL in python 3.5+ with Bonobo, Romain DorgueilSimple ETL in python 3.5+ with Bonobo, Romain Dorgueil
Simple ETL in python 3.5+ with Bonobo, Romain DorgueilPôle Systematic Paris-Region
 
Simple ETL in python 3.5+ with Bonobo - PyParis 2017
Simple ETL in python 3.5+ with Bonobo - PyParis 2017Simple ETL in python 3.5+ with Bonobo - PyParis 2017
Simple ETL in python 3.5+ with Bonobo - PyParis 2017Romain Dorgueil
 
Monitoraggio del Traffico di Rete Usando Python ed ntop
Monitoraggio del Traffico di Rete Usando Python ed ntopMonitoraggio del Traffico di Rete Usando Python ed ntop
Monitoraggio del Traffico di Rete Usando Python ed ntopPyCon Italia
 
What is Python? (Silicon Valley CodeCamp 2014)
What is Python? (Silicon Valley CodeCamp 2014)What is Python? (Silicon Valley CodeCamp 2014)
What is Python? (Silicon Valley CodeCamp 2014)wesley chun
 
carrow - Go bindings to Apache Arrow via C++-API
carrow - Go bindings to Apache Arrow via C++-APIcarrow - Go bindings to Apache Arrow via C++-API
carrow - Go bindings to Apache Arrow via C++-APIYoni Davidson
 
What is Python? (Silicon Valley CodeCamp 2015)
What is Python? (Silicon Valley CodeCamp 2015)What is Python? (Silicon Valley CodeCamp 2015)
What is Python? (Silicon Valley CodeCamp 2015)wesley chun
 
10 more-things-you-can-do-with-python
10 more-things-you-can-do-with-python10 more-things-you-can-do-with-python
10 more-things-you-can-do-with-pythonDaniel Greenfeld
 
Non-Blocking Strategies for FFI
 Non-Blocking Strategies for FFI Non-Blocking Strategies for FFI
Non-Blocking Strategies for FFIESUG
 
Machine Learning on Code - SF meetup
Machine Learning on Code - SF meetupMachine Learning on Code - SF meetup
Machine Learning on Code - SF meetupsource{d}
 
PuppetConf 2014 Killer R10K Workflow With Notes
PuppetConf 2014 Killer R10K Workflow With NotesPuppetConf 2014 Killer R10K Workflow With Notes
PuppetConf 2014 Killer R10K Workflow With NotesPhil Zimmerman
 
AI Machine Learning Complete Course: for PHP & Python Devs
AI Machine Learning Complete Course: for PHP & Python DevsAI Machine Learning Complete Course: for PHP & Python Devs
AI Machine Learning Complete Course: for PHP & Python DevsAmr Shawqy
 
Season 7 Episode 1 - Tools for Data Scientists
Season 7 Episode 1 - Tools for Data ScientistsSeason 7 Episode 1 - Tools for Data Scientists
Season 7 Episode 1 - Tools for Data Scientistsaspyker
 

Similar to Here are a few key points about securely using subprocess:- Always pass commands as a list, not a string, to avoid shell injection vulnerabilities. The shlex module can help safely split strings into lists.- Be careful with user-provided inputs. Sanitize, validate, escape as needed before passing to subprocess. - Set the shell argument to False to avoid invoking the shell. This prevents things like pipes, redirects from working but is more secure.- Check return codes from processes and handle errors/exceptions appropriately. - Limit privileges when possible by dropping permissions before calling external programs.- Isolate processes by running them in separate environments like Docker containers or virtual machines.- Use OS (20)

Will iPython replace Bash?
Will iPython replace Bash?Will iPython replace Bash?
Will iPython replace Bash?
 
Will iPython replace bash?
Will iPython replace bash?Will iPython replace bash?
Will iPython replace bash?
 
Euro python2011 High Performance Python
Euro python2011 High Performance PythonEuro python2011 High Performance Python
Euro python2011 High Performance Python
 
Introduction to Raspberry Pi and GPIO
Introduction to Raspberry Pi and GPIOIntroduction to Raspberry Pi and GPIO
Introduction to Raspberry Pi and GPIO
 
Raspberry Pi + ROS
Raspberry Pi + ROSRaspberry Pi + ROS
Raspberry Pi + ROS
 
05 - Bypassing DEP, or why ASLR matters
05 - Bypassing DEP, or why ASLR matters05 - Bypassing DEP, or why ASLR matters
05 - Bypassing DEP, or why ASLR matters
 
Simple ETL in python 3.5+ with Bonobo, Romain Dorgueil
Simple ETL in python 3.5+ with Bonobo, Romain DorgueilSimple ETL in python 3.5+ with Bonobo, Romain Dorgueil
Simple ETL in python 3.5+ with Bonobo, Romain Dorgueil
 
Simple ETL in python 3.5+ with Bonobo - PyParis 2017
Simple ETL in python 3.5+ with Bonobo - PyParis 2017Simple ETL in python 3.5+ with Bonobo - PyParis 2017
Simple ETL in python 3.5+ with Bonobo - PyParis 2017
 
Monitoraggio del Traffico di Rete Usando Python ed ntop
Monitoraggio del Traffico di Rete Usando Python ed ntopMonitoraggio del Traffico di Rete Usando Python ed ntop
Monitoraggio del Traffico di Rete Usando Python ed ntop
 
What is Python? (Silicon Valley CodeCamp 2014)
What is Python? (Silicon Valley CodeCamp 2014)What is Python? (Silicon Valley CodeCamp 2014)
What is Python? (Silicon Valley CodeCamp 2014)
 
carrow - Go bindings to Apache Arrow via C++-API
carrow - Go bindings to Apache Arrow via C++-APIcarrow - Go bindings to Apache Arrow via C++-API
carrow - Go bindings to Apache Arrow via C++-API
 
What is Python? (Silicon Valley CodeCamp 2015)
What is Python? (Silicon Valley CodeCamp 2015)What is Python? (Silicon Valley CodeCamp 2015)
What is Python? (Silicon Valley CodeCamp 2015)
 
10 more-things-you-can-do-with-python
10 more-things-you-can-do-with-python10 more-things-you-can-do-with-python
10 more-things-you-can-do-with-python
 
Non-Blocking Strategies for FFI
 Non-Blocking Strategies for FFI Non-Blocking Strategies for FFI
Non-Blocking Strategies for FFI
 
05 python.pdf
05 python.pdf05 python.pdf
05 python.pdf
 
Machine Learning on Code - SF meetup
Machine Learning on Code - SF meetupMachine Learning on Code - SF meetup
Machine Learning on Code - SF meetup
 
PuppetConf 2014 Killer R10K Workflow With Notes
PuppetConf 2014 Killer R10K Workflow With NotesPuppetConf 2014 Killer R10K Workflow With Notes
PuppetConf 2014 Killer R10K Workflow With Notes
 
AI Machine Learning Complete Course: for PHP & Python Devs
AI Machine Learning Complete Course: for PHP & Python DevsAI Machine Learning Complete Course: for PHP & Python Devs
AI Machine Learning Complete Course: for PHP & Python Devs
 
Python build your security tools.pdf
Python build your security tools.pdfPython build your security tools.pdf
Python build your security tools.pdf
 
Season 7 Episode 1 - Tools for Data Scientists
Season 7 Episode 1 - Tools for Data ScientistsSeason 7 Episode 1 - Tools for Data Scientists
Season 7 Episode 1 - Tools for Data Scientists
 

More from Roberto Polli

Ratelimit Headers for HTTP
Ratelimit Headers for HTTPRatelimit Headers for HTTP
Ratelimit Headers for HTTPRoberto Polli
 
Interoperability rules for an European API ecosystem: do we still need SOAP?
Interoperability rules for an European API ecosystem: do we still need SOAP?Interoperability rules for an European API ecosystem: do we still need SOAP?
Interoperability rules for an European API ecosystem: do we still need SOAP?Roberto Polli
 
Docker - virtualizzazione leggera
Docker - virtualizzazione leggeraDocker - virtualizzazione leggera
Docker - virtualizzazione leggeraRoberto Polli
 
Just one-shade-of-openstack
Just one-shade-of-openstackJust one-shade-of-openstack
Just one-shade-of-openstackRoberto Polli
 
Test Drive Deployment with python and nosetest
Test Drive Deployment with python and nosetestTest Drive Deployment with python and nosetest
Test Drive Deployment with python and nosetestRoberto Polli
 
Tox as project descriptor.
Tox as project descriptor.Tox as project descriptor.
Tox as project descriptor.Roberto Polli
 
Statistics 101 for System Administrators
Statistics 101 for System AdministratorsStatistics 101 for System Administrators
Statistics 101 for System AdministratorsRoberto Polli
 
Pysmbc Python C Modules are Easy
Pysmbc Python C Modules are EasyPysmbc Python C Modules are Easy
Pysmbc Python C Modules are EasyRoberto Polli
 
Git gestione comoda del repository
Git   gestione comoda del repositoryGit   gestione comoda del repository
Git gestione comoda del repositoryRoberto Polli
 
Testing with my sql embedded
Testing with my sql embeddedTesting with my sql embedded
Testing with my sql embeddedRoberto Polli
 
Servizi di messaging & collaboration in mobilità: Il panorama open source
Servizi di messaging & collaboration in mobilità: Il panorama open sourceServizi di messaging & collaboration in mobilità: Il panorama open source
Servizi di messaging & collaboration in mobilità: Il panorama open sourceRoberto Polli
 
Funambol al Linux Day 2009
Funambol al Linux Day 2009Funambol al Linux Day 2009
Funambol al Linux Day 2009Roberto Polli
 
ICalendar RFC2445 - draft1
ICalendar RFC2445 - draft1ICalendar RFC2445 - draft1
ICalendar RFC2445 - draft1Roberto Polli
 
Presenting CalDAV (draft 1)
Presenting CalDAV (draft 1)Presenting CalDAV (draft 1)
Presenting CalDAV (draft 1)Roberto Polli
 
Integrating Funambol with CalDAV and LDAP
Integrating Funambol with CalDAV and LDAPIntegrating Funambol with CalDAV and LDAP
Integrating Funambol with CalDAV and LDAPRoberto Polli
 
ds risparmio energetico
ds risparmio energeticods risparmio energetico
ds risparmio energeticoRoberto Polli
 
Aggregatori di notizie
Aggregatori di notizieAggregatori di notizie
Aggregatori di notizieRoberto Polli
 

More from Roberto Polli (20)

Ratelimit Headers for HTTP
Ratelimit Headers for HTTPRatelimit Headers for HTTP
Ratelimit Headers for HTTP
 
Interoperability rules for an European API ecosystem: do we still need SOAP?
Interoperability rules for an European API ecosystem: do we still need SOAP?Interoperability rules for an European API ecosystem: do we still need SOAP?
Interoperability rules for an European API ecosystem: do we still need SOAP?
 
Docker - virtualizzazione leggera
Docker - virtualizzazione leggeraDocker - virtualizzazione leggera
Docker - virtualizzazione leggera
 
Just one-shade-of-openstack
Just one-shade-of-openstackJust one-shade-of-openstack
Just one-shade-of-openstack
 
Test Drive Deployment with python and nosetest
Test Drive Deployment with python and nosetestTest Drive Deployment with python and nosetest
Test Drive Deployment with python and nosetest
 
Tox as project descriptor.
Tox as project descriptor.Tox as project descriptor.
Tox as project descriptor.
 
Statistics 101 for System Administrators
Statistics 101 for System AdministratorsStatistics 101 for System Administrators
Statistics 101 for System Administrators
 
Pysmbc Python C Modules are Easy
Pysmbc Python C Modules are EasyPysmbc Python C Modules are Easy
Pysmbc Python C Modules are Easy
 
Git gestione comoda del repository
Git   gestione comoda del repositoryGit   gestione comoda del repository
Git gestione comoda del repository
 
Testing with my sql embedded
Testing with my sql embeddedTesting with my sql embedded
Testing with my sql embedded
 
Servizi di messaging & collaboration in mobilità: Il panorama open source
Servizi di messaging & collaboration in mobilità: Il panorama open sourceServizi di messaging & collaboration in mobilità: Il panorama open source
Servizi di messaging & collaboration in mobilità: Il panorama open source
 
Funambol al Linux Day 2009
Funambol al Linux Day 2009Funambol al Linux Day 2009
Funambol al Linux Day 2009
 
ICalendar RFC2445 - draft1
ICalendar RFC2445 - draft1ICalendar RFC2445 - draft1
ICalendar RFC2445 - draft1
 
Presenting CalDAV (draft 1)
Presenting CalDAV (draft 1)Presenting CalDAV (draft 1)
Presenting CalDAV (draft 1)
 
Integrating Funambol with CalDAV and LDAP
Integrating Funambol with CalDAV and LDAPIntegrating Funambol with CalDAV and LDAP
Integrating Funambol with CalDAV and LDAP
 
ultimo-miglio-v3
ultimo-miglio-v3ultimo-miglio-v3
ultimo-miglio-v3
 
Ultimo Miglio v2
Ultimo Miglio v2Ultimo Miglio v2
Ultimo Miglio v2
 
Ultimo Miglio
Ultimo MiglioUltimo Miglio
Ultimo Miglio
 
ds risparmio energetico
ds risparmio energeticods risparmio energetico
ds risparmio energetico
 
Aggregatori di notizie
Aggregatori di notizieAggregatori di notizie
Aggregatori di notizie
 

Recently uploaded

How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Hyundai Motor Group
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 

Recently uploaded (20)

How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 

  • 1. Python for System Administrator
    Roberto Polli - roberto.polli@par-tec.it
    Par-Tec Spa - Rome Operation Unit
    P.zza S. Benedetto da Norcia, 33
    00040, Pomezia (RM) - www.par-tec.it
    March 13, 2016
  • 2. Agenda
    Intro
    ipython
    Path management: 10’
    Encoding: 10’
    Data Gathering: 20’
      module: psutil
      module: subprocess
      The /proc filesystem
    Parsing: 60’
      Regular Expressions
    Nosetest Intermezzo: 15’
    Processing: 45’
      Distributions
      Deviation
      Correlation
      Plotting Time
    End
  • 3. Who? What? Why?
    • Use Python to replace grep, awk, sed and Perl, and speed up your daily job.
    • Roberto Polli - Solutions Architect @ par-tec.it. Loves writing in C, Java and Python. Red Hat Certified Engineer and Virtualization Administrator.
    • Par-Tec - Proud sponsor of this talk ;) Contributes to various FLOSS projects and provides expertise in IT infrastructure & services, Business Intelligence solutions and vertical applications for the financial market.
  • 4. Requirements
    • python 2.7+, ipython
    • course code from github
      # git clone https://github.com/ioggstream/python-course
    • test your environment (eg. psutil, numpy, scipy, matplotlib)
      # nosetests -vs test prerequisites.py
    • first part: nose, psutil
    • second part: scipy, numpy, matplotlib
    • ♦optional/advanced content♦
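If the course test file is not at hand, a quick import check gives a similar answer. This is a minimal sketch, not the course's own test: `check_prerequisites` is a hypothetical helper, and it only tells you whether each module imports, not whether its version is recent enough.

```python
import importlib


def check_prerequisites(modules):
    """Map each module name to True if it imports cleanly, else False."""
    status = {}
    for name in modules:
        try:
            importlib.import_module(name)
            status[name] = True
        except Exception:  # ImportError, or a broken install failing at import time
            status[name] = False
    return status


if __name__ == "__main__":
    # Module names taken from the Requirements slide
    wanted = ["nose", "psutil", "numpy", "scipy", "matplotlib"]
    for name, ok in sorted(check_prerequisites(wanted).items()):
        print("{0:12} {1}".format(name, "ok" if ok else "MISSING"))
```

`importlib.import_module` exists on both Python 2.7 and 3, so the same check runs on either interpreter.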
  • 5. How
    • Get ready before starting: the code is on GitHub!
    • Use notebooks, or type everything except #comments and try/except blocks
    • Type fast with tab-completion and copy-paste
    • Be curious: inspect and print returned variables
    • Never* close your iPython session: you’ll lose your precious variables
      * (ok, sometimes you can).
  • 6. References
    • irc.freenode.net #python - The Python Community :D
    • Python Cookbook 3rd ed. O’Reilly - David Beazley and Brian K. Jones
    • Programming Python 4th ed. O’Reilly - Mark Lutz
    • Dive into Python 3 2nd ed. Apress - Mark Pilgrim
    • nose.readthedocs.org
    • github.com/ioggstream/python-course
  • 7. iPython I
    • Interactive interpreter with tons of features, and the main tool of our training.
    • The most fun way to learn and use Python!
    • Supports tab-completion, readline and inline help
    • Allows pasting from the clipboard with %paste, and multi-line editing with %edit
    • Run it with plotting support enabled: # ipython --pylab
  • 8. iPython II
    # iPython supports inline help: append ? to an object
    str?
    # We can run commands and capture the output in a variable,
    # without quoting, using the ! magic on unix
    ret = !cat /etc/hosts
    # windows has etc\hosts too ;)
    ret = !type c:\windows\system32\drivers\etc\hosts
  • 9. iPython III
    # returned objects can be filtered with
    ret.grep('localhost')
    # Now get the first space-separated column of the output
    ret.fields(0)
    ret.grep('localhost').fields(0)
    # And the last returned value is stored in _
    localip = _
    # We can type long commands in an editor like `vi` using
    %edit mytmp.py
    # type print(ret[0]), then exit (eg. :wq!)
    > Editing... done. Executing edited code...
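Outside iPython the same `!cmd` / `.grep()` / `.fields()` workflow can be approximated in plain Python. This is a sketch under assumptions: `capture`, `grep` and `fields` are hypothetical helpers that only roughly mimic iPython's `SList` methods (regex filtering, whitespace-separated columns joined by a space, lines with no selected column dropped).

```python
import re
import subprocess
import sys


def capture(cmd):
    """Run cmd (a list of arguments, no shell) and return stdout as lines."""
    out = subprocess.check_output(cmd)
    return out.decode("utf-8", "replace").splitlines()


def grep(lines, pattern):
    """Keep only the lines matching the regular expression pattern."""
    return [line for line in lines if re.search(pattern, line)]


def fields(lines, *indexes):
    """Pick whitespace-separated columns from each line, joined by a space."""
    result = []
    for line in lines:
        cols = line.split()
        picked = [cols[i] for i in indexes if -len(cols) <= i < len(cols)]
        if picked:  # skip lines without the requested columns
            result.append(" ".join(picked))
    return result


if __name__ == "__main__":
    # Roughly: ret = !cat /etc/hosts ; ret.grep('localhost').fields(0)
    if not sys.platform.startswith("win"):
        print(fields(grep(capture(["cat", "/etc/hosts"]), "localhost"), 0))
```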
  • 10. Path management: Goal
    • Normalize paths on different platforms
    • Create, copy and remove folders
    • Handle errors
    modules: os, os.path, shutil, errno
    see also: pathlib on Python 3.4+
  • 11. Path management: os.path, sys
    basedir, hosts = "/", "etc/hosts"
    # Check the hosting platform with the sys module
    from sys import platform
    if platform.startswith('win'):
        basedir = 'c:/windows/system32/drivers'
    # Always use the os.path module!
    from os.path import join, normpath
    hosts = join(basedir, hosts)
    hosts = normpath(hosts)
    print("Normalized path is", hosts)
  • 12. Path management: os.path, sys
    • os.path is the best way to manage paths!
      • multiplatform
      • safe
      • join adds a separator only where one is missing
      • normpath fixes "/" orientation and collapses redundant "/" and ".."
      • realpath resolves symlinks
    And now, a quick look at other tools.
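To see both platforms' rules without owning a Windows box, the flavour-specific modules behind `os.path` can be imported directly: `posixpath` and `ntpath` are importable everywhere. A small sketch; `hosts_path` is a hypothetical helper rebuilding the path from the previous slide.

```python
import ntpath     # Windows path rules, importable on any platform
import posixpath  # POSIX path rules, importable on any platform


def hosts_path(windows):
    """Build the hosts-file path using the rules of the chosen platform."""
    if windows:
        return ntpath.normpath(ntpath.join("c:/windows/system32/drivers", "etc/hosts"))
    return posixpath.normpath(posixpath.join("/", "etc/hosts"))


if __name__ == "__main__":
    print(hosts_path(windows=False))  # POSIX result
    print(hosts_path(windows=True))   # forward slashes turned into backslashes
    # join() does not double an existing separator...
    print(posixpath.join("/etc/", "hosts"))
    # ...while normpath() collapses "//", "." and ".." textually (symlinks untouched)
    print(posixpath.normpath("/etc//../etc/./hosts"))
```

On Python 3.4+, `pathlib.PurePosixPath` and `pathlib.PureWindowsPath` offer the same platform-specific handling with an object interface.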
  • 13. Move trees: shutil, os, os.path
    from os import makedirs       # ...tree creation...
    from os.path import isdir     # ...checking...
    from shutil import copytree, rmtree
    makedirs("/tmp/py/foo/bar")
    # We can copy a whole tree and test it
    copytree("/tmp/py/foo", "/tmp/py/foo2")
    assert isdir("/tmp/py/foo2/bar")
    # ... and finally delete it
    rmtree("/tmp/py/foo")
    assert not isdir("/tmp/py/foo/bar")
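The same create/copy/delete round trip can run inside a throwaway directory, so repeated runs never collide with leftovers in `/tmp/py`. A sketch; `copy_then_remove` is a hypothetical helper name.

```python
import os
import shutil
import tempfile


def copy_then_remove(root):
    """Create root/foo/bar, copy foo to foo2, then delete the original foo.

    Returns (copy exists, original still exists).
    """
    src = os.path.join(root, "foo")
    dst = os.path.join(root, "foo2")
    os.makedirs(os.path.join(src, "bar"))
    shutil.copytree(src, dst)  # dst must not exist yet
    shutil.rmtree(src)
    return os.path.isdir(os.path.join(dst, "bar")), os.path.isdir(src)


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    try:
        print(copy_then_remove(root))
    finally:
        shutil.rmtree(root)  # leave no garbage behind
```

`copytree` refuses an existing destination by default; Python 3.8 added `dirs_exist_ok=True` to relax that.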
  • 14. Move trees: errno
    # We can use exception handlers to investigate errors
    try:
        # python2's makedirs cannot ignore existing directories...
        makedirs("/tmp/py/foo/bar")
    # ...and raises an OSError
    except OSError as e:
        # Just use the errno module to check the error value
        import errno
        assert e.errno == errno.EEXIST
    help(makedirs)
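Wrapping the errno check in a function gives a `mkdir -p` that works on both Python 2 and 3. A sketch; `mkdir_p` is a hypothetical name. On Python 3.2+ `os.makedirs(path, exist_ok=True)` does the same job.

```python
import errno
import os
import shutil
import tempfile


def mkdir_p(path):
    """Create path and its parents; silently accept an existing directory."""
    try:
        os.makedirs(path)
    except OSError as e:
        # Re-raise anything that is not "already exists", and also re-raise
        # when the existing entry is a file rather than a directory.
        if e.errno != errno.EEXIST or not os.path.isdir(path):
            raise


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    try:
        target = os.path.join(root, "foo", "bar")
        mkdir_p(target)
        mkdir_p(target)  # second call is a no-op instead of an OSError
        print(os.path.isdir(target))
    finally:
        shutil.rmtree(root)
```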
  • 15. Encoding: Goal
    • A string is more than a sequence of bytes
    • A string is a pair (bytes, encoding)
    • Use unicode literals in python2
    • Manage differently encoded filenames
    • A string is not a sequence of bytes
    modules: os, os.path, glob
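  • 16. Song of Childhood
    Als das Kind Kind war, ging es mit hängenden Armen, wollte der Bach sei ein Fluß, der Fluß sei ein Strom, und diese Pfütze das Meer.
    Als das Kind Kind war, wußte es nicht, daß es Kind war, alles war ihm beseelt, und alle Seelen waren eins.
    Als das Kind Kind war, hatte es von nichts eine Meinung, hatte keine Gewohnheit, saß oft im Schneidersitz, lief aus dem Stand, hatte einen Wirbel im Haar und machte kein Gesicht beim fotografieren.
    “When the child was a child, characters were bytes, and strings were lists of bytes.”
    Als das Kind Kind war, fielen ihm die Beeren wie nur Beeren in die Hand und jetzt immer noch, machten ihm die frischen Walnüsse eine rauhe Zunge und jetzt immer noch, hatte es auf jedem Berg die Sehnsucht nach dem immer höheren Berg, und in jeder Stadt die Sehnsucht nach der noch größeren Stadt, und das ist immer noch so, griff im Wipfel eines Baums nach dem Kirschen in einem Hochgefühl wie auch heute noch, eine Scheu vor jedem Fremden und hat sie immer noch, wartete es auf den ersten Schnee, und wartet so immer noch.
    (English: When the child was a child, it walked with its arms hanging loose, wanted the brook to be a river, the river a torrent, and this puddle the sea. When the child was a child, it didn't know that it was a child; everything was alive to it, and all souls were one. When the child was a child, it had no opinion about anything, had no habits, often sat cross-legged, broke into a run from a standstill, had a cowlick in its hair and didn't pull faces when photographed. When the child was a child, berries fell into its hand as only berries do, and they still do; fresh walnuts made its tongue raw, and they still do; on every mountain it longed for an ever higher mountain, and in every city for an even bigger city, and it still does; it reached for the cherries in a treetop with an elation it still feels today; it was shy of every stranger, and still is; it waited for the first snow, and waits that way still.)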
  • 17. Encoding is a map
    # Python 3 doesn't need the u prefix
    the_string = u"S\u00fcd"                # Süd
    # ...which can be encoded with different maps
    in_utf8 = the_string.encode('utf-8')
    in_win = the_string.encode('cp1252')
    type(in_utf8) == bytes                   # byte sequences
    # Decoding bytes using the wrong map...
    # ...gives sad results ;)
    in_utf8.decode('cp1252')                 # SÃ¼d
    • Encoding is a one-to-one map between a typographical character and a byte sequence
    • Decoding is its inverse map
        char | ascii | utf-8      | cp1252
        a    | [97]  | [97]       | [97]
        ü    | -     | [195, 188] | [252]
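A small sketch that reproduces the table of this slide: it encodes "Süd" with each map, prints the resulting byte values, and shows the mojibake produced by decoding UTF-8 bytes with the cp1252 map. It only uses built-in codecs; `bytearray` is used so the byte listing works on both Python 2 and 3.

```python
# -*- coding: utf-8 -*-
the_string = u"S\u00fcd"  # "Süd"

for codec in ("ascii", "utf-8", "cp1252"):
    try:
        encoded = the_string.encode(codec)
    except UnicodeEncodeError:
        # ASCII has no byte sequence for the u-umlaut
        print(codec, "cannot encode", repr(the_string))
        continue
    print(codec, list(bytearray(encoded)))

# Decoding UTF-8 bytes with the cp1252 map produces mojibake: "SÃ¼d"
mojibake = the_string.encode("utf-8").decode("cp1252")
print(mojibake == u"S\u00c3\u00bcd")
```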
  • 18. Enter encoding
    # Filenames are binary data! Be careful when reading from
    # a filesystem (e.g. vfat) that uses another encoding!
    # To make Python 2 encoding-aware we should
    from __future__ import unicode_literals
    # Create 3 Windows-encoded (cp1252) filenames in
    basedir = "/tmp/py"
    # using the function provided by the course
    from course import create_wuerstelstrasse
    create_wuerstelstrasse(basedir)
  • 19. Encoded filenames: glob
    from glob import glob as ls
    # glob expands wildcards like a shell
    files = ls("/tmp/py/*.txt")
    # ...but on Python 2 this may raise
    # UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc
    # 0xFC == 252: remember the ü in the cp1252 map?
    # To avoid encoding issues we explicitly use bytes
    files = ls(b"/tmp/py/*.txt")
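A self-contained sketch of the whole round trip, assuming Linux and Python 3. The file-creation part is a hypothetical stand-in for `course.create_wuerstelstrasse()`, whose real implementation is not shown here. With a bytes pattern, glob returns raw bytes names, which we then decode with the map we know they were written in. (With a str pattern, Python 3 would instead return undecodable bytes as surrogate escapes such as `'\udcfc'`.)

```python
import os
import tempfile
from glob import glob

# Hypothetical stand-in for course.create_wuerstelstrasse(): create files
# whose names are cp1252-encoded bytes. Linux filesystems accept any bytes
# in names except "/" and NUL.
basedir = os.fsencode(tempfile.mkdtemp())
names = [u"W\u00fcrstelstra\u00dfe_%d.txt" % i for i in range(3)]
for name in names:
    with open(os.path.join(basedir, name.encode("cp1252")), "wb"):
        pass

# A bytes pattern makes glob return bytes: no implicit decoding happens
raw = sorted(glob(os.path.join(basedir, b"*.txt")))
# We know these names are cp1252, so we decode them explicitly
decoded = [os.path.basename(p).decode("cp1252") for p in raw]
print(decoded)
```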
  • 20. Data Gathering: Goal
    Gathering system data with multiplatform and platform-dependent tools.
    • Get information from files, /proc and /sys
    • Capture command output
    • Use psutil to get I/O, CPU and memory data
    • Parse files with a strategy
    modules: psutil, subprocess, os
  • 21. Data Gathering: grep
    def grep(needle, fpath):
        """A minimal grep implementation.
        goal: file objects are iterable and don't need splitlines()
        goal: list comprehensions can filter iterables
        """
        with open(fpath) as fp:
            return [x for x in fp if needle in x]
    # Do we have "localhost" in our "/etc/hosts"?
    grep("localhost", "/etc/hosts")
  • 22. Data Gathering: psutil
    # The psutil module is very nice!
    import psutil
    # It works on Windows, Linux and macOS...
    psutil.cpu_percent()
    # ...and its output is easy to manage
    psutil.disk_io_counters()
    Exercise: which other information does psutil provide?
  • 23. Data Gathering: Exercises
    Write a vmstat-like function printing, every second:
    • CPU usage %;
    • bytes read and written in the given interval.
    • Hint: use psutil and time.sleep(1)
    • Hint: try it in IPython, then write the function with %edit vmstat.py
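One possible shape for the exercise, as a sketch rather than the course's reference solution. `vmstat` and `io_delta` are our names; the loop uses `psutil.cpu_percent()` (with no interval it reports usage since the previous call) and the `read_bytes`/`write_bytes` fields of `psutil.disk_io_counters()`. The small `IO` tuple is only a stand-in so `io_delta` can be tried without psutil.

```python
import time
from collections import namedtuple

# Tiny stand-in with the two fields we read from psutil's disk counters
IO = namedtuple("IO", "read_bytes write_bytes")


def io_delta(before, after):
    """Bytes read and written between two disk_io_counters() snapshots."""
    return (after.read_bytes - before.read_bytes,
            after.write_bytes - before.write_bytes)


def vmstat(count=5, interval=1):
    """Print CPU % and disk bytes read/written every `interval` seconds."""
    import psutil  # third-party: pip install psutil

    psutil.cpu_percent()  # the first call only primes the CPU counters
    before = psutil.disk_io_counters()
    for _ in range(count):
        time.sleep(interval)
        after = psutil.disk_io_counters()
        read, written = io_delta(before, after)
        # with no interval, cpu_percent() compares against the previous call
        print("cpu %5.1f%%  read %10d B  written %10d B"
              % (psutil.cpu_percent(), read, written))
        before = after
```

Calling `vmstat()` prints five lines, one per second.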
  • 24. Data Gathering: subprocess
    # The check_output function returns the command's stdout
    from subprocess import check_output
    # It takes a list as an argument!
    out = check_output("ping -w1 -c1 www.google.com".split())
    # ...and returns a string (bytes on Python 3)
    print(out)
  • 25. Data Gathering: security
    # Be careful with the above code: str.split() ignores quotes
    out = check_output('ls "./may not work.doc"'.split())
    # You can use shlex.split(), which respects shell quoting
    from shlex import split
    out = check_output(split('ls "./will work.xlsx"'))
    # You can even tokenize strings while respecting quoted chars
    you = "can 'even' tokenize \"respecting\" quoted chars"
    from shlex import shlex
    for token in shlex(you):
        print(token)
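A short sketch contrasting the two splitting approaches of this slide, plus `shlex.quote()` (Python 3.3+; `pipes.quote` on Python 2) for the case where a command line must go through a shell. The "evil" filename is a made-up example of untrusted input.

```python
import shlex

cmd = 'ls "./will work.xlsx"'
# str.split() breaks the quoted filename into two bogus arguments...
print(cmd.split())
# ...while shlex.split() follows shell quoting rules
print(shlex.split(cmd))

# If you really need shell=True, quote untrusted values
# instead of pasting them into the command line
evil = "x; rm -rf ~"
print("ls " + shlex.quote(evil))

# The shlex class yields one token at a time; in its default
# (non-POSIX) mode the quotes stay inside the tokens
tokens = list(shlex.shlex("can 'even' tokenize \"respecting\" quoted chars"))
print(tokens)
```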
  • 26. Data Gathering: subprocess, sys
    def sh(cmd, shell=False, timeout=None):
        """Returns an iterable output of a command string,
           checking the Python version before using timeout."""
        from subprocess import check_output
        from sys import version_info as python_version
        from shlex import split
        args = cmd if shell else split(cmd)  # shell=True wants a string
        if python_version < (3, 3):  # timeout was added in Python 3.3
            if timeout:
                raise ValueError("Timeout not supported")
            output = check_output(args, shell=shell, universal_newlines=True)
        else:
            output = check_output(args, shell=shell, timeout=timeout,
                                  universal_newlines=True)
        return output.splitlines()
  • 27. Data Gathering: Exercises
    Write a simple pgrep-like function for your OS which:
    • has the following signature
        def ppgrep(program):
            """@param program - e.g. firefox, explorer.exe"""
            raise NotImplementedError
    • prints a list of processes executing `program`;
    • Hint: use subprocess, os, and list comprehensions
        items = [x for x in a_list if 'firefox' in x]
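A Linux-only sketch of the exercise. Instead of parsing `ps` output, it reads `/proc/<pid>/cmdline`, where arguments are NUL-separated. It returns `(pid, command line)` pairs so the caller can print or filter them; this is one possible solution, not the course's.

```python
import os


def ppgrep(program):
    """Return (pid, command line) pairs whose command line contains `program`."""
    found = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():  # skip non-process entries like /proc/meminfo
            continue
        try:
            with open("/proc/%s/cmdline" % entry, "rb") as fp:
                raw = fp.read()
        except (IOError, OSError):  # the process exited meanwhile, or no access
            continue
        cmdline = raw.replace(b"\0", b" ").decode("utf-8", "replace").strip()
        if program in cmdline:
            found.append((int(entry), cmdline))
    return found


if __name__ == "__main__":
    for pid, cmdline in ppgrep("python"):
        print(pid, cmdline)
```

Kernel threads have an empty cmdline, so they only match an empty search string.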
  • 28. ♦Data Gathering: Parsing /proc I♦
    def linux_threads(pid):
        """The Linux /proc filesystem is a cool place to get information."""
        from glob import glob  # expands * and ?
        path = "/proc/{}/task/*/status".format(pid)
        # Pick a set of fields to gather...
        t_info = ('Pid', 'Tgid', 'voluntary')  # a tuple
        for t_path in glob(path):
            # ...and use a comprehension to get the interesting data:
            # str.startswith() accepts tuples!
            with open(t_path) as fp:
                print([x for x in fp if x.startswith(t_info)])
  • 29. Data Gathering: Parsing /proc II
    # On Linux, /proc/diskstats is the source of I/O information
    disk_l = grep("sda", "/proc/diskstats")
    # To gather that data we put the headers in a multi-line string
    from course import diskstats_headers as headers
    disk_info = disk_l[0].split()  # take the 1st entry, split the data...
    zip(headers, disk_info)        # ...and tie it to the headers
    list(_)  # on Python 3, zip() returns an iterator: consume it!
  • 30. Data Gathering: Parsing /proc III
    # Or create a reusable commodity class with
    from collections import namedtuple
    # using headers as attributes,
    # like the ones returned by psutil...
    DiskStats = namedtuple('DiskStats', headers)
    # ...and disk_info as values
    dstat = DiskStats(*disk_info)
    dstat.device, dstat.writes_ms
    # Homework: check further features with help(collections)
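A self-contained sketch of the same idea, since the course's `diskstats_headers` is not shown here. The field names below are our own (hypothetical) labels for the first 14 columns of `/proc/diskstats`. Newer kernels append extra columns (discard and flush statistics), so we slice before unpacking, otherwise `DiskStats(*values)` would fail with too many arguments. The sample line is made up.

```python
from collections import namedtuple

# Hypothetical field names for the first 14 /proc/diskstats columns
DISKSTATS_FIELDS = (
    "major minor device "
    "reads reads_merged sectors_read read_ms "
    "writes writes_merged sectors_written write_ms "
    "io_in_progress io_ms weighted_io_ms"
)
DiskStats = namedtuple("DiskStats", DISKSTATS_FIELDS)


def parse_diskstats_line(line):
    """Turn one /proc/diskstats line into a DiskStats tuple of ints.

    Extra trailing columns from newer kernels are ignored.
    """
    values = line.split()[:len(DiskStats._fields)]
    # every column is numeric except the device name (3rd column)
    return DiskStats(*[v if i == 2 else int(v) for i, v in enumerate(values)])


# Made-up sample line, for illustration only
sample = "   8       0 sda 1200 30 45000 900 800 40 64000 1500 0 2100 2400"
dstat = parse_diskstats_line(sample)
print(dstat.device, dstat.writes, dstat.write_ms)
```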
  • 31. Parsing: Goal
    • Plan a parsing strategy
    • Use basic regular expressions: match, search, sub
    • Benchmark a parser
    • Run nosetests
    • Write a simple parser
    modules: re, nose, %timeit
  • 32. Parsing is hard...
    "System administrators spend 24.3% of their work-life parsing files."*
    * Independent analysis by The GASP¹ Society ;)
    ¹ Grep Awk Sed Perl
  • 33. ...use a strategy!
    1. Collect parsing samples
    2. Play in IPython and collect your %history
    3. Write tests, then the parser
    4. Benchmark, if needed
  • 34. Parsing postfix logs
    # Before writing the parser, collect samples of
    # the interesting lines. For now just
    from course import mail_sent, mail_delivered
    # and %edit a simple test
    def test_sent():
        hour, host, to = parse_line(mail_sent)
        assert hour == '08:00:00'
        assert to == 'jon@doe.it'
  • 35. Parsing lines: split, zip
    May 31 08:00:00 test-1 postfix/smtp[169]: 7CD8E730020: to=<joe@foo.it>, relay=mx2.foo.it[10.0.4.5]:25, ...
    mail_sent.split()  # start using basic strings in IPython...
    # ...then tie each field to its index with zip()
    fields, counting = _, list(zip(range(20), _))
    fields = fields[:7]  # we just care about the first 7 values
    # ...and pick the fields one by one
    hour, host, dest = fields[2], fields[3], fields[6]
  • 36. Parse: Exercise I
    In another window:
    • edit 03_parsing_test.py
    • complete the parse_line(line) function
        def parse_line(line):
            """Write your function and test it with test_sent()"""
            raise NotImplementedError
    %paste your solution's code in IPython and run the test functions manually
  • 37. Python Regexp
    # Python supports regular expressions via
    import re
    # We start by showing a grep-reloaded function
    def grep(expr, fpath):
        one = re.compile(expr)
        # ...a compiled regexp has two lookup methods...
        assert (one.match      # one which searches from the ^ beginning
                and one.search)  # and one that searches anywhere
        with open(fpath) as fp:
            return [x for x in fp if one.search(x)]
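A quick illustration of the match/search difference on a postfix-like line (the line is a sample in the style of these slides). `match()` only succeeds at the start of the string, while `search()` scans for the first hit anywhere.

```python
import re

line = "May 31 08:00:00 test-1 postfix/smtp[169]: 7CD8E730020: to=<joe@foo.it>"
hour = re.compile(r"\d{2}:\d{2}:\d{2}")

# match() only tries the very beginning: the line starts with "May"...
print(hour.match(line))
# ...while search() finds the first occurrence anywhere
print(hour.search(line).group())
# match() is the right tool when the pattern describes the line's start
print(re.match(r"\w+ \d+", line).group())
```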
  • 38. Splitting with re.split
    from re import split  # is a very nice function
    import sys
    # Let's gather some ping stats
    if sys.platform.startswith('win'):
        cmd = "ping -n 10 www.google.it"
    else:
        cmd = "ping -c10 -w10 www.google.it"
    # Split on both space and =
    ping_output = [split("[ =]", x) for x in sh(cmd)]
  • 39. Splitting with re.findall
    from re import findall  # can be misused too ;)
    # e.g. for adding the ":" to a
    mac = "00" "24" "e8" "b4" "33" "20"
    # ...using this
    re_hex = '[0-9A-Fa-f]{2}'
    mac_address = ':'.join(findall(re_hex, mac))
    print("The mac address is ", mac_address)
    This also does a bit of validation: only pairs of hex digits (0-9, a-f) end up in the result, although invalid characters are silently skipped rather than rejected.
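A sketch of why `findall()` is only a weak validator, and of a stricter variant. `is_mac` and `to_mac` are illustrative names of ours: they require exactly 12 hex digits before inserting the colons.

```python
import re

re_hex = "[0-9A-Fa-f]{2}"

# findall() just skips what doesn't match: "zz" silently disappears
print(":".join(re.findall(re_hex, "0024zzb43320")))


def is_mac(raw):
    """True if raw is exactly 12 hex digits."""
    return re.match(r"[0-9A-Fa-f]{12}\Z", raw) is not None


def to_mac(raw):
    """Insert ':' every two hex digits, rejecting malformed input."""
    if not is_mac(raw):
        raise ValueError("not a 12-digit hex MAC: %r" % raw)
    return ":".join(re.findall(re_hex, raw))


print(to_mac("0024e8b43320"))
```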
  • 40. Benchmarking in IPython I
    • Parsing big files needs benchmarks. IPython's %timeit magic is a good starting point.
        test_regexps = ("..", "[a-fA-F0-9]{2}")
        for re_s in test_regexps:
            %timeit ':'.join(findall(re_s, mac))
    • We can even compare compiled and inline regexps
        import re
        for re_s in test_regexps:
            re_c = re.compile(re_s)
            %timeit ':'.join(re_c.findall(mac))
  • 41. Benchmarking in IPython II
    Or find other methods:
    • complex...
        from re import sub as sed
        %timeit sed(r'(..)', r'\1:', mac)  # note: leaves a trailing ':'
    • ...or simple
        %timeit ':'.join([mac[i:i+2] for i in range(0, 12, 2)])
    • Outside IPython, check the timeit module
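A sketch of the same comparison with the `timeit` module, for use outside IPython. We first check that all candidates return the same string, since a benchmark of functions with different outputs is meaningless. The `re.sub` candidate uses a `(?!$)` lookahead (our tweak) to avoid the trailing colon of the slide's version.

```python
import re
import timeit

mac = "0024e8b43320"
hex_pair = re.compile("[0-9A-Fa-f]{2}")

candidates = {
    "findall": lambda: ":".join(hex_pair.findall(mac)),
    # (?!$): don't add ":" after the last pair
    "sub": lambda: re.sub(r"(..)(?!$)", r"\1:", mac),
    "slicing": lambda: ":".join([mac[i:i + 2] for i in range(0, 12, 2)]),
}

# All candidates must agree before we compare their speed
results = set(f() for f in candidates.values())
print(results)

for name, func in sorted(candidates.items()):
    # timeit.timeit() accepts a callable; number= sets the repetitions
    seconds = timeit.timeit(func, number=10000)
    print("%-8s %.4f s per 10000 runs" % (name, seconds))
```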
  • 42. ♦Parsing: a real-world example♦
    # You don't need to type this VSAN configuration script,
    # which uses Linux FC information from the /sys filesystem
    fc_id_path = "/sys/class/fc_host/host*/port_name"
    for x in glob(fc_id_path):
        # ...we boldly skip an explicit close()
        pwwn = open(x).read().strip()  # 0x500143802427e66c
        pwwn = pwwn[2:]                # drop the 0x prefix
        # ...and even use the slower but readable
        pwwn = re.findall(r'..', pwwn)
        print("member pwwn ", ':'.join(pwwn))
  • 43. Parsing logs: a simple solution
    def parse_line(line):
        import re
        # using _ we improve readability
        _, _, hour, host, _, _, dest = line.split()[:7]
        try:
            # and if dest isn't what we expect...
            dest = re.split(r'[<>]', dest)[1]
        except IndexError:
            # ...we set it to None
            dest = None
        return (hour, host, dest)
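For comparison, a regular-expression variant (a sketch, not the course's solution; `parse_line_re` is our name). It matches the fixed syslog prefix with one regexp and looks for `to=<...>` anywhere with another, so it does not depend on the recipient being the 7th field. The sample lines are shaped like the ones in these slides, not the course's fixtures.

```python
import re

# Fixed prefix: "May 31 08:00:00 host " (the day may be space-padded)
PREFIX = re.compile(r"^\w{3}\s+\d+ (\d\d:\d\d:\d\d) (\S+) ")
# Optional recipient, anywhere in the line
RECIPIENT = re.compile(r"\bto=<([^>]*)>")


def parse_line_re(line):
    """Return (hour, host, recipient or None) from a postfix log line."""
    m = PREFIX.match(line)
    if m is None:
        raise ValueError("not a syslog line: %r" % line)
    to = RECIPIENT.search(line)
    return m.group(1), m.group(2), to.group(1) if to else None


sent = ("May 31 08:00:00 test-1 postfix/smtp[169]: 7CD8E730020: "
        "to=<joe@foo.it>, relay=mx2.foo.it[10.0.4.5]:25")
delivered = "May 31 08:00:00 test-1 postfix/qmgr[122]: 4DC3DA: removed"
print(parse_line_re(sent))
print(parse_line_re(delivered))
```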
  • 44. Parsing logs: II
    # Now another test for the delivered messages
    # %edit 03_parsing_test.py
    def test_delivered():
        hour, host, destination = parse_line(mail_delivered)
        assert hour == '08:00:00'
        # Delivery logs should have destination == None
        assert destination is None
    # Exercise: fix parse_line to work with both tests,
    # and save the test file
  • 45. Running nosetests
    • Now run the following command from a shell
        # nosetests -vs 03_parsing_test.py
        03_parsing_test.test_sent ... ok
        03_parsing_test.test_delivered ... ok
        Ran 2 tests in 0.001s
    • Nose is a test framework.
    • Nose runs every file and every function whose name starts with test, or contains test right after a _ . or - (e.g. test_sent, 03_parsing_test.py)
  • 46. Simple Test Script
    • Open the 02_nosetests_simple.py file
        def setup():
            print("is run before the test suite, while")
        def teardown():
            print("is run after all tests")
        def test_one():
            # name a function like test_* to run it!
            assert 1 == 1
        def test_two():
            # and use assert to test for success
            assert 1 == 0, "I was expecting 0"
  • 47. ♦Complete Test Script: I♦
    • A more flexible script is 02_nosetests_full.py, which uses a Test class
        import os
        class Test(object):
            @classmethod
            def setup_class(cls):
                # is run once at startup,
                # ...e.g. to create a database structure
                print("setup testsuite environment")
                with open("/tmp/test2.out", "w") as fp:
                    fp.write("0")
            @classmethod
            def teardown_class(cls):
                # is run once after all tests to...
                print("cleanup testsuite environment")
                os.unlink("/tmp/test2.out")
  • 48. ♦Complete Test Script: II♦
    • allowing pre/post test-suite and pre/post test fixtures
        class Test(object):
            ...
            # Using a Test class...
            def setup(self):
                print("is run before every test")
            # ...and...
            def teardown(self):
                print("after every test")  # e.g. truncate a table
            # each test can use the prepared environment
            def test_a(self):
                assert os.path.isfile("/tmp/test2.out")
  • 49. Simple processing: Goal
    • Handle gathered data with dict() and zip()
    • Find data relations with scipy
    • Get essential information like standard deviation σ and distributions δ
    • Linear correlation: what it is, and when it can help
    • Plotting
    modules: numpy, scipy, scipy.stats, collections, random, time
  • 50. The Chicken Paradox
    "According to the latest statistics, it appears that you eat one chicken per year: and, if that doesn't fit your budget, you still fit into the statistics, because someone else eats two."
    C. A. Salustri
  • 51. Simple processing: Exercise
    How to dismantle the chicken paradox? Gather data!
    • Write the following function using our parsing strategy
        def ping_rtt(seconds=10):
            """@return: a list of ping RTTs"""
            from course import sh
            # get sample output
            # find a solution in IPython
            # test and paste the code
            raise NotImplementedError
    • Gather 10 seconds of ping output
    • Hint: reuse the sh() function
    • Hint: slice and filter lists using comprehensions
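A sketch of one way to solve the exercise, split into a pure parser (easy to test with sample lines) and a thin wrapper that runs ping. The regexp assumes the usual `time=12.3 ms` (Linux/macOS) and `time=14ms` / `time<1ms` (Windows) formats; `parse_rtt` and the `host` parameter are our additions.

```python
import re
import subprocess
import sys

# "time=12.3 ms" (Linux, macOS) or "time=14ms" / "time<1ms" (Windows)
RTT = re.compile(r"time[=<]([\d.]+) ?ms")


def parse_rtt(lines):
    """Extract RTTs in milliseconds, as floats, from ping output lines."""
    return [float(m.group(1)) for m in map(RTT.search, lines) if m]


def ping_rtt(seconds=10, host="www.google.it"):
    """Ping `host` about once per second for `seconds` seconds."""
    if sys.platform.startswith("win"):
        cmd = ["ping", "-n", str(seconds), host]
    else:
        cmd = ["ping", "-c", str(seconds), "-w", str(seconds), host]
    # check_output raises CalledProcessError if ping exits non-zero
    out = subprocess.check_output(cmd, universal_newlines=True)
    return parse_rtt(out.splitlines())
```

The summary line printed by ping (`rtt min/avg/max/mdev = ...`) contains no `time=`, so it is skipped.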
  • 52. Distributions: Counter, defaultdict
    A distribution, or δ, shows the frequency of events, like how many people ate x chickens ;)
    # Create a simple δ with Counter
    from collections import Counter
    d = Counter(rtt)
    # We can even use a more flexible
    from collections import defaultdict
    d = defaultdict(int)
    for x in rtt:
        d[x] += 1
    Distributions and mean are both important!
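Real RTTs are floats that rarely repeat exactly, so counting them as-is gives a flat, useless δ. A sketch that rounds them to whole milliseconds first and checks that `Counter` and `defaultdict` build the same distribution; the RTT values are made up.

```python
from collections import Counter, defaultdict

# Made-up RTT samples in ms
rtt = [12.3, 12.4, 13.9, 12.1, 14.2, 13.8, 12.6]
# Round to whole milliseconds to get meaningful buckets
buckets = [int(round(x)) for x in rtt]

with_counter = Counter(buckets)

with_defaultdict = defaultdict(int)
for b in buckets:
    with_defaultdict[b] += 1

print(sorted(with_counter.items()))
print(with_counter.most_common(1))  # the most frequent bucket(s)
```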
  • 53. Standard Deviation: numpy
    • The standard deviation σ is defined by $\sigma^2(X) := \frac{\sum (x - \bar{x})^2}{n}$
    • σ tells whether δ is fair or not, i.e. how representative the mean $\bar{x}$ is
    • A normal curve with the same mean and σ (scipy.stats.norm.pdf; older matplotlib had matplotlib.mlab.normpdf) is a smooth function approximating the histogram
    from numpy import std, mean  # older SciPy re-exported them as scipy.std, scipy.mean
    fair = [1, 1]    # chickens
    unfair = [0, 2]  # chickens
    assert mean(fair) == mean(unfair)
    # Use the standard deviation!
    std(fair)    # 0.0
    std(unfair)  # 1.0
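The formula on this slide divides by n: it is the population standard deviation, which is also what `numpy.std` computes by default. A dependency-free sketch of it, compared with the standard library's `statistics` module (Python 3.4+), where `pstdev` uses the same 1/n definition and `stdev` divides by n - 1 (sample standard deviation).

```python
import math
import statistics


def population_std(xs):
    """sigma = sqrt(sum((x - mean)^2) / n), as on the slide."""
    n = len(xs)
    m = sum(xs) / float(n)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / n)


fair, unfair = [1, 1], [0, 2]
print(population_std(fair), population_std(unfair))
# Same 1/n definition...
print(statistics.pstdev(unfair))
# ...versus the n - 1 (sample) definition: sqrt(2) here
print(statistics.stdev(unfair))
```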
  • 54. Simple processing: numpy
    Check your computed values against the σ returned by ping (didn't you notice that ping prints it?)
    """goal: remember to convert to numeric / float
       goal: use numpy
       goal: check stdev"""
    from numpy import std, mean  # max and min are builtins
    rtt = ping_rtt()
    print(max(rtt), min(rtt), mean(rtt), std(rtt))
  • 55. Time Distributions: Exercise
    • Parse the provided maillog in IPython using its ! magic and get an hourly email δ
    • Expected output:
        time_d = {  # mails delivered (removed) between
            0: xxx,  # 00:00 - 00:59
            1: xxx,  # 01:00 - 01:59
            ...
        }
  • 56. Time Distributions: Exercise Solution
    # Delivered emails look like the following
    # May 14 16:00:04 rpolli postfix/qmgr[122]: 4DC3DA: removed
    ret = !grep removed maillog  # get the interesting lines
    ts = ret.fields(2)  # find the timestamp (3rd column)
    hours = [int(x.split(':')[0]) for x in ts]
    time_d = {x: hours.count(x) for x in set(hours)}
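The same solution without IPython magics, as a sketch for plain scripts. It selects lines ending in "removed" (slightly stricter than `grep removed`, which matches anywhere) and counts hours with `Counter` in a single pass instead of calling `hours.count()` once per hour. The sample lines are made up in the format shown above.

```python
from collections import Counter


def hourly_distribution(lines):
    """Count delivered mails ("removed" qmgr lines) per hour of the day."""
    hours = (int(line.split()[2].split(":")[0])
             for line in lines if line.rstrip().endswith("removed"))
    return dict(Counter(hours))


# In a script you would pass the open log file instead, e.g.:
#   with open("maillog") as fp:
#       time_d = hourly_distribution(fp)
sample = [
    "May 14 16:00:04 rpolli postfix/qmgr[122]: 4DC3DA: removed",
    "May 14 16:42:10 rpolli postfix/qmgr[122]: 5EA1B2: removed",
    "May 14 23:05:00 rpolli postfix/smtp[169]: 5EA1B2: to=<joe@foo.it>",
    "May 15 03:15:31 rpolli postfix/qmgr[122]: 7F00C1: removed",
]
time_d = hourly_distribution(sample)
print(time_d)
```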
  • 57. Plotting distributions
    # To plot data...
    from matplotlib import pyplot as plt
    # ...set the interactive mode
    plt.ion()
    # Plotting a histogram (one bin per hour)...
    frequency, bins, _ = plt.hist(hours, bins=range(25))
    # ...returns a distribution
    distribution = dict(zip(bins, frequency))
    This server works mostly at night...
  • 58. Size Distributions: Exercise
    • Create a size δ using hist(..., bins=...)
    • Hint: help(hist)
        size_d = {  # mails with size between
            0: xxx,  # 0 - 10k
            1: xxx,  # 10k - 20k
            ...
        }
    • Homework: use the size δ to estimate the size mean and σ, and compare them with the mean and σ computed from the original data series
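A pure-Python sketch of the size δ and of the homework's first step, assuming 10k means 10 × 1024 bytes and using made-up mail sizes. Integer division maps each size to its bucket index; the mean is then estimated from the δ using each bucket's midpoint, which only approximates the mean of the raw data (about 13833 bytes for this sample).

```python
from collections import Counter

BUCKET = 10 * 1024  # 10k buckets, assuming k = 1024 bytes


def size_distribution(sizes, bucket=BUCKET):
    """Map bucket index to mail count: 0 is [0, 10k), 1 is [10k, 20k), ..."""
    return dict(Counter(size // bucket for size in sizes))


def mean_from_distribution(dist, bucket=BUCKET):
    """Estimate the mean size from the δ, using each bucket's midpoint."""
    total = sum(dist.values())
    return sum((i + 0.5) * bucket * n for i, n in dist.items()) / float(total)


# Made-up mail sizes in bytes
sizes = [2000, 5000, 12000, 15000, 18000, 31000]
size_d = size_distribution(sizes)
print(size_d)
print(mean_from_distribution(size_d), sum(sizes) / float(len(sizes)))
```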
  • 59. ♦Simulating data with σ and x̄♦
    Mean and standard deviation are a useful starting point to simulate data using the Gaussian distribution.
    # A mail load generator creating attachments of a given size...
    from random import gauss
    mail_size = gauss(mean_s, sigma_s)  # a random number
    # ...and using time_d to simulate the load during the day
    from time import localtime
    hour = localtime().tm_hour
    mail_per_minute = time_d.get(hour, 0) / 60.0  # minutes in an hour
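A sketch of the size generator, assuming a mean of 15000 bytes and σ of 5000 bytes (made-up values). A Gaussian can return negative numbers, which make no sense for sizes, so we clamp them to 0; a private `random.Random` instance with a seed makes the simulation reproducible. `simulate_sizes` is our name.

```python
import random


def simulate_sizes(n, mean_s, sigma_s, seed=None):
    """Draw n mail sizes in bytes from a Gaussian, clamping negatives to 0."""
    rng = random.Random(seed)  # a private, optionally seeded generator
    return [max(0, int(rng.gauss(mean_s, sigma_s))) for _ in range(n)]


sizes = simulate_sizes(10000, mean_s=15000, sigma_s=5000, seed=42)
avg = sum(sizes) / float(len(sizes))
print(min(sizes), avg)  # the average lands close to 15000
```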
  • 60. Linear Correlation
    # Let's plot the following datasets
    # taken from a 4-hour distribution
    mail_sent = [1, 5, 500, 250, 100, 7]
    kB_s = [70, 300, 29000, 12500, 450, 500]
    # A scatter plot can suggest relations
    # between data
    plt.scatter(mail_sent, kB_s)
    [Figure: "Correlating mail and thruput", scatter plot of kMail sent (x axis) vs Thruput kB/s (y axis)]
  • 61. Linear Correlation
    The Pearson coefficient ρ is a linear relation indicator:
    • 0: no linear relation
    • 1: direct relation (both datasets increase together)
    • -1: inverse relation (one increases as the other decreases)
    $$\rho(X, Y) = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}} \qquad (1)$$
    from scipy.stats import pearsonr
    ret = pearsonr(mail_sent, kB_s)
    print(ret)
    > (0.9823, 0.0004)
    correlation, p_value = ret
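A dependency-free sketch of equation (1), useful to see what `pearsonr` computes for ρ (it additionally returns a p-value, which this sketch does not). `pearson` is our name; the two tiny datasets check the extreme values 1 and -1.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient, straight from equation (1)."""
    n = len(xs)
    mx = sum(xs) / float(n)
    my = sum(ys) / float(n)
    codev = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    dev_x = sum((x - mx) ** 2 for x in xs)
    dev_y = sum((y - my) ** 2 for y in ys)
    return codev / math.sqrt(dev_x * dev_y)


print(pearson([1, 2, 3], [2, 4, 6]))  # direct relation
print(pearson([1, 2, 3], [6, 4, 2]))  # inverse relation
# The mail/thruput datasets of the previous slide
print(pearson([1, 5, 500, 250, 100, 7], [70, 300, 29000, 12500, 450, 500]))
```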
  • 62. You must (scatter) plot!
    ρ does not detect non-linear correlation
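A tiny worked example of the warning: here y is fully determined by x (y = x²), yet the numerator of Pearson's ρ, the co-deviation sum, is exactly 0, so ρ = 0. Only a scatter plot would reveal the parabola.

```python
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]  # a perfect, but non-linear, relation
mx = sum(xs) / float(len(xs))
my = sum(ys) / float(len(ys))
# Positive and negative deviations cancel out symmetrically
codev = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
print(codev)
```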
  • 63. Combinations
    # Given a table with many data series
    from course import table
    # table looks like {..., 'cpu_usr': [10, 23, 55, ...], 'byte_in': [2132, 3212, 3942, ...]}
    # We can combine all their names with
    from itertools import combinations
    list(combinations(table, 2))
    > [('swap_in', 'cpu_sys'), ('swap_in', 'csw'), ('cpu_sys', 'csw'), ...]
    Combining 4 suits, 2 at a time: ♥♠ ♥♣ ♥♦ ♠♣ ♠♦ ♣♦
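The card-suit example in code: `combinations(items, 2)` yields each unordered pair once, n(n-1)/2 pairs in total. That count matters for the next slides, since 20 data series already mean 190 correlations (and plots).

```python
from itertools import combinations

suits = ["hearts", "spades", "clubs", "diamonds"]
pairs = list(combinations(suits, 2))
print(len(pairs))  # 4 * 3 / 2
print(pairs[:3])

# With 20 data series we'd compute 20 * 19 / 2 correlations
n_pairs = len(list(combinations(range(20), 2)))
print(n_pairs)
```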
  • 64. Netfishing correlation
    We can try every combination of data series and check whether there's some ρ.
    from itertools import combinations
    from scipy.stats import pearsonr
    for k1, k2 in combinations(table, 2):
        corr, p_value = pearsonr(table[k1], table[k2])
        if abs(corr) < 0.5:
            # I'm *still* not interested in data under this threshold
            continue
        print("linear correlation between {} and {} is {}".format(k1, k2, corr))
  • 65. Correlating I/O and Context Switches
    Now we'll generate some correlation plots from the table data, like this one.
  • 66. Netfishing correlation II
    # Create all the combined plots
    for k1, k2 in combinations(table, 2):
        corr, p_value = pearsonr(table[k1], table[k2])
        plt.scatter(table[k1], table[k2])
        # 3-digit precision in the title
        plt.title("R={:0.3f}".format(corr))
        plt.xlabel(k1); plt.ylabel(k2)
        # save and close the plot
        plt.savefig("{}_{}.png".format(k1, k2)); plt.close()
  • 67. Mark time with colors
    # Get the combined data directly via items(),
    # using 3 buckets
    buckets = 3
    for (k1, v1), (k2, v2) in combinations(table.items(), 2):
        corr, p_value = pearsonr(v1, v2)
        length = len(v1)
        # Get an array of colors
        # e.g. [0, 0, ..., 1, 1, ..., 2, 2, ...]
        colors = [i * buckets // length for i in range(length)]
        # map the colors through a colormap, with a nice colorbar
        plt.scatter(v1, v2, c=colors)
        plt.colorbar()
        plt.savefig("{}_{}_time.png".format(k1, k2)); plt.close()
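The color labels are just "which time slice does sample i fall in". A sketch isolating that computation (`time_buckets` is our name): integer division `//` keeps the labels integral on both Python 2 and 3, whereas `/` on Python 3 would produce fractional values like 0.33 instead of a few discrete buckets.

```python
def time_buckets(length, buckets=3):
    """Label each of `length` samples with its time slice: 0 .. buckets - 1."""
    return [i * buckets // length for i in range(length)]


print(time_buckets(9))
print(time_buckets(7))  # uneven lengths still give contiguous slices
```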
  • 68. That's all folks!
    Thank you for your attention!