Statistics 101 for System
Administrators
EuroPython 2014, 22nd July - Berlin
Roberto Polli - roberto.polli@babel.it
Babel Srl P.zza S. Benedetto da Norcia, 33
00040, Pomezia (RM) - www.babel.it
Who? What? Why?
• Using (and learning) elements of statistics with python.
• Roberto Polli - Community Manager @ Babel.it. Loves writing in C, Java
and Python. Red Hat Certified Engineer and Virtualization Administrator.
• Babel – Proud sponsor of this talk ;) Delivers large mail infrastructures
based on Open Source software for Italian ISPs and PA. Contributes to
various FLOSS projects.
Agenda
• A latency issue: what happened?
• Correlation in 30”
• Combining data
• Plotting time
• modules: scipy, matplotlib
A Latency Issue
• Episodic network latency issues
• Log traces: message size, number of peers, retransmissions
• Do we need to scale? Was it a peak problem?
Find a rapid answer with python!
Basic statistics
Python (here via NumPy) provides basic statistics, like
from numpy import mean  # x̄, the average
from numpy import std   # σX, the standard deviation
# '..' marks values elided on the slide
T = { 'ts': (1, 2, 3, .., ),
      'late': (0.12, 6.31, 0.43, .. ),
      'peers': (2313, 2313, 2312, ..),...}
# one [name, max, min, mean, std] row per series
print([[k, max(X), min(X), mean(X), std(X)]
       for k, X in T.items()])
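As a rough, self-contained sketch of the same summary, the standard library's statistics module can stand in for NumPy. The sample values below are hypothetical placeholders for the elided log data, and the helper name summarize is an assumption made for illustration.

```python
from statistics import mean, pstdev

# Hypothetical sample data standing in for the elided log values.
T = {
    "ts": (1, 2, 3, 4, 5),
    "late": (0.12, 6.31, 0.43, 0.20, 0.15),
    "peers": (2313, 2313, 2312, 2310, 2311),
}

def summarize(series):
    """Return (max, min, mean, population standard deviation) of a series."""
    return max(series), min(series), mean(series), pstdev(series)

summary = {name: summarize(values) for name, values in T.items()}
for name, (hi, lo, avg, sd) in summary.items():
    print(f"{name:>6}: max={hi} min={lo} mean={avg:.3f} std={sd:.3f}")
```

pstdev is the population standard deviation, which matches the default behaviour of numpy.std; statistics.stdev would give the sample standard deviation instead.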
Distributions
Data distribution - aka δX - shows event frequency.
# The fastest way to get a
# distribution is
from matplotlib import pyplot as plt
freq, bins, _ = plt.hist(T['late'])
# plt.hist returns the counts, the bin edges and the drawn patches
distribution = zip(bins, freq)
Figure: a ping RTT distribution (x axis: rtt in ms)
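When no plot is needed, the same idea (count how many events fall in each bin) can be sketched with the standard library only. The function name distribution, the bin width, and the RTT values below are hypothetical and chosen just for illustration.

```python
from collections import Counter

def distribution(values, bin_width):
    """Count how many values fall in each bin of width bin_width.

    Returns a sorted list of (bin_start, count) pairs.
    """
    counts = Counter(int(v // bin_width) for v in values)
    return [(index * bin_width, counts[index]) for index in sorted(counts)]

# Hypothetical ping round-trip times in milliseconds.
rtt = [158.2, 158.9, 159.1, 159.4, 159.6, 160.3, 160.4, 161.8]
dist = distribution(rtt, bin_width=1.0)
for start, count in dist:
    # One '#' per event gives a quick text histogram.
    print(f"{start:6.1f} ms  {'#' * count}")
```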
Correlation I
Are two data series X, Y related?
Given $\Delta x_i = x_i - \bar{x}$, Mr. Pearson answered with this formula:
$$\rho(X, Y) = \frac{\sum_i \Delta x_i \, \Delta y_i}{\sqrt{\sum_i \Delta^2 x_i \, \sum_i \Delta^2 y_i}} \in [-1, +1] \tag{1}$$
ρ identifies if the values of X and Y ‘move’ together on the same line.
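To make formula (1) concrete, here is a minimal pure-Python sketch that computes ρ term by term. The function name pearson and the sample series are assumptions for illustration; in practice the library function shown later in the talk does the same job.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient, following formula (1)."""
    if len(xs) != len(ys):
        raise ValueError("series must have the same length")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]  # the deltas of X
    dy = [y - mean_y for y in ys]  # the deltas of Y
    numerator = sum(a * b for a, b in zip(dx, dy))
    # A constant series makes this zero, so rho is undefined for it.
    denominator = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return numerator / denominator

x = [1, 2, 3, 4, 5]
rising = pearson(x, [2, 4, 6, 8, 10])   # points on a rising line
falling = pearson(x, [10, 8, 6, 4, 2])  # points on a falling line
print(rising, falling)
```

Points that lie exactly on a rising line give ρ = +1, and points on a falling line give ρ = −1.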
You must (scatter) plot
ρ doesn’t find non-linear correlation!
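A small sketch of why the scatter plot matters: below, y is completely determined by x, yet ρ comes out as zero because the relation is a parabola and not a line. The pearson helper is a hypothetical hand-written version of the formula, kept here so the example runs on its own.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den

x = list(range(-5, 6))
parabola = [v * v for v in x]  # y depends entirely on x, but not linearly
rho = pearson(x, parabola)
print(rho)
```

The positive and negative halves of the parabola cancel out in the numerator, so a low ρ alone says nothing about whether the two series are related.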
Probability Indicator
SciPy provides a correlation function, returning two values:
• the ρ correlation coefficient ∈ [−1, +1]
• the p-value: the probability that uncorrelated systems would produce datasets correlated at least this strongly
from random import randint
from scipy.stats import pearsonr  # our beloved ρ
a, b = range(0, 100), range(0, 400, 4)
c, d = [randint(0, 100) for x in a], [randint(0, 100) for x in a]
correlation, probability = pearsonr(a, b)  # ρ = 1.000, p = 0.000
correlation, probability = pearsonr(c, d)  # e.g. ρ = −0.041, p = 0.683 (random data, varies per run)
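One way to build intuition for the second value is a permutation test: shuffle one series many times and count how often the shuffled, hence unrelated, data correlates at least as strongly as the original. This is a different technique from SciPy's analytic computation, so the numbers will not match pearsonr exactly. The function names, the number of rounds, and the seeds below are assumptions for illustration.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den

def permutation_p_value(xs, ys, rounds=1000, seed=0):
    """Fraction of shuffles of ys whose |rho| reaches the observed |rho|."""
    rng = random.Random(seed)  # fixed seed, so runs are repeatable
    observed = abs(pearson(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(rounds):
        rng.shuffle(shuffled)
        if abs(pearson(xs, shuffled)) >= observed:
            hits += 1
    return hits / rounds

a, b = range(0, 100), range(0, 400, 4)
rng = random.Random(1)
c = [rng.randint(0, 100) for _ in a]
d = [rng.randint(0, 100) for _ in a]

p_line = permutation_p_value(a, b)   # perfectly aligned series
p_noise = permutation_p_value(c, d)  # two independent random series
print("line: ", p_line)
print("noise:", p_noise)
```

A value close to zero for the aligned series means that shuffled data essentially never reproduces such a strong correlation.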
Combinations
itertools is a pot of gold full of useful tools.
from itertools import combinations
# returns all possible combinations of
# items grouped N at a time
items = "hearts spades clubs diamonds".split()
combinations(items, 2)
# And now all possible combinations between
# dataset fields!
combinations(T, 2)
Combining 4 suits,
2 at a time.
♥♠
♥♣
♥♦
♠♣
♠♦
♣♦
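A short runnable sketch of the same call, showing that n items taken 2 at a time yield n(n − 1)/2 pairs, and that passing a dict iterates over its keys. The tiny dataset T below is a hypothetical stand-in for the real one.

```python
from itertools import combinations
from math import comb

suits = "hearts spades clubs diamonds".split()
pairs = list(combinations(suits, 2))
for first, second in pairs:
    print(first, second)

# 4 items taken 2 at a time: 4 * 3 / 2 = 6 pairs.
print(len(pairs), comb(len(suits), 2))

# Hypothetical dataset: iterating a dict yields its keys,
# so this pairs up the field names.
T = {"ts": (1, 2, 3), "late": (0.1, 0.2, 0.3), "peers": (5, 5, 6)}
field_pairs = list(combinations(T, 2))
print(field_pairs)
```

combinations never repeats a pair in reverse order, so each couple of series is examined only once.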
Netfishing correlation I
# Now we have all the ingredients for
# net-fishing relations between our data!
for (k1,v1), (k2,v2) in combinations(T.items(), 2):
    # Look for correlations between every dataset!
    corr, prob = pearsonr(v1, v2)
    if corr > .6:
        print("Series", k1, k2, "can be correlated", corr)
    elif prob < 0.05:
        print("Series", k1, k2, "probability lower than 5%", prob)
Netfishing correlation II
Now plot all combinations: there's more than meets the eye!
# Plot everything, and insert data in plots!
for (k1,v1), (k2,v2) in combinations(T.items(), 2):
    corr, prob = pearsonr(v1, v2)
    plt.scatter(v1, v2)
    # 3 digit precision on title
    plt.title("R={:0.3f} P={:0.3f}".format(corr, prob))
    plt.xlabel(k1); plt.ylabel(k2)
    # save and close the plot
    plt.savefig("{}_{}.png".format(k1, k2)); plt.close()
Plotting Correlation
Color is the 3rd dimension
from itertools import cycle
colors = cycle("rgb") # use more than 3 colors!
labels = cycle("morning afternoon night".split())
datalen = len(T['ts']) # all series are assumed to have the same length
size = datalen // 3 # 3 colors, right?
for (k1,v1), (k2,v2) in combinations(T.items(), 2):
    # one scatter call per slice of the day, each with its own color
    for i in range(0, datalen, size):
        plt.scatter(v1[i:i+size], v2[i:i+size],
                    color=next(colors),
                    label=next(labels))
    # set title, save plot & co
Example Correlation
Latency Solution
• Latency wasn’t related to packet size or system throughput
• Errors were not related to packet size
• Discovered system throughput
Wrap Up
• Use statistics: it’s easy
• Don’t use ρ to exclude relations
• Plot, Plot, Plot
• Continue collecting results
That’s all folks!
Thank you for your attention!
Roberto Polli - roberto.polli@babel.it
