Detecting Anomalies
in Nginx Log Data
#nginx #nginxconf
Mauricio Roman
Sept. 24, 2015
Who am I?
• Work as a Data Scientist
• Abundant log data
• Using algorithms to better serve our customers
— and their end users
Exploration of our own Nginx
logs
• We collect all sorts of log messages from our
customers, with very high throughput
• Use both syslog and http protocols
• Nginx receives all our http log messages and
forwards them to our back-end
What do we mean by
“anomalies”?
• By “anomalies” we mean events that are in
some way unexpected or undesirable
• 4xx and 5xx responses are, in this broad sense, “anomalous”
• A first look, using tail and grep, revealed that we
needed to focus on 4xx codes from certain
requests
Should we worry about 4xx
errors?
• Our 4xx error rates are low and fairly stable, and
we have no 5xx errors
• There is a common belief that 4xx require no further analysis
• For us, however, 4xx are significant:
• Most of our http ingestion is via POST requests
• Yet we also obtain log data via GET requests with
tracking pixels
• Used by some customers for their mobile end-users
Extracted “Features” or
dimensions from http log data
• Payload size in bytes
• Country of origin
• OS
• Browser
• IP
• Referer host
• Date and time
One can get more than
100 features from Nginx log
data alone (including
headers)
In my sandbox, filter logs in real time
and send them to a Kafka queue
tail -F /var/log/nginx/access.log | fgrep -v '" 200 ' \
  | fgrep -v OPTIONS | fgrep gif | awk 'length($0) > 65 {print}' \
  | ~/kafka_2.10-0.8.2.1/bin/kafka-console-producer.sh \
    --broker-list localhost:9092 --topic nginx_filtered_logs
Kafka works like a shock
absorber to avoid
propagating bursts
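The conditions in the shell filter above can also be sketched in Python, which makes each one explicit. This is only an illustration, not the pipeline used in the talk; the function name `keep_line` and its `min_length` parameter are hypothetical.

```python
def keep_line(line: str, min_length: int = 65) -> bool:
    """Mirror the shell filter: drop 200s and OPTIONS, keep long gif requests."""
    line = line.rstrip("\n")       # awk's length($0) does not count the newline
    if '" 200 ' in line:           # fgrep -v '" 200 '
        return False
    if "OPTIONS" in line:          # fgrep -v OPTIONS
        return False
    if "gif" not in line:          # fgrep gif
        return False
    return len(line) > min_length  # awk 'length($0) > 65 {print}'
```

A filter like this could be applied to each line read from the access log before it is handed to a Kafka producer.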
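A Python script reads from Kafka and
parses logs, using standard libraries
import re
import time
from urllib.parse import urlparse

import woothee
from geoip import geolite2  # assumes the python-geoip package, whose lookup() API matches the call below
from kafka import KafkaConsumer  # kafka-python

# Nginx access-log line: IP, timestamp, request, status, bytes sent, referer, user agent
msg_regex = re.compile(
    r'(?P<ipaddress>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) - - '
    r'\[(?P<dateandtime>\d{2}/[a-zA-Z]{3}/\d{4}:\d{2}:\d{2}:\d{2} (\+|-)\d{4})\] '
    r'(("(?P<method>GET|POST|HEAD) )(?P<url>.+)(HTTP/1\.\d")*) '
    r'(?P<statuscode>\d{3}) (?P<bytessent>\d+) '
    r'(["](?P<referer>(-)|(.*))["]) (["](?P<useragent>.*)["])'
)

consumer = KafkaConsumer("nginx_filtered_logs", group_id="my_group",
                         bootstrap_servers=["localhost:9092"])

for message in consumer:
    time.sleep(0.2)  # throttle the sandbox consumer
    msg = message.value.decode("utf-8", errors="replace")

    m = msg_regex.match(msg)
    if m is None:  # skip lines that do not match the expected format
        continue

    ip_match = geolite2.lookup(m.group("ipaddress"))
    useragent_match = woothee.parse(m.group("useragent"))
    referer_match = urlparse(m.group("referer"))

    nginx_status_code = m.group("statuscode")

    payload = m.group("url")
    size = len(payload)

    os = useragent_match["os"]
    ipaddress = m.group("ipaddress")
    country = ip_match.country if ip_match else None  # lookup() returns None for unknown IPs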
Using freely available open source parsing libraries for this example
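Two of the features listed earlier, referer host and date and time, need only the Python standard library. The sketch below is an assumption about how they might be derived from the `referer` and `dateandtime` fields; the helper names `referer_host` and `parse_nginx_time` are hypothetical and do not come from the talk.

```python
from datetime import datetime
from urllib.parse import urlparse


def referer_host(referer: str):
    """Return the host part of a Referer value, or None for '-' or an empty value."""
    if not referer or referer == "-":
        return None
    return urlparse(referer).hostname


def parse_nginx_time(value: str) -> datetime:
    """Parse a timestamp such as 24/Sep/2015:10:15:00 +0000.

    %b matches English month abbreviations under the default C locale.
    """
    return datetime.strptime(value, "%d/%b/%Y:%H:%M:%S %z")
```

Keeping the timestamp as a timezone-aware `datetime` makes it straightforward to bucket messages into time windows later.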
For visualization, we sent the parsed
features as messages to Loggly
Finding #1:
“408” http errors correlated with
large GET payload size
Plot of payload size split by status code counts over time
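A relationship like the one in Finding #1 can also be checked numerically by grouping payload sizes by status code. This is a minimal sketch, not the analysis behind the plot; the function name and the `(status_code, payload_size)` record layout are assumptions.

```python
from collections import defaultdict
from statistics import median


def median_size_by_status(records):
    """Group payload sizes by status code and return the median size per code.

    records: iterable of (status_code, payload_size) pairs.
    """
    sizes = defaultdict(list)
    for status, size in records:
        sizes[status].append(size)
    return {status: median(values) for status, values in sizes.items()}
```

A status code whose median payload size sits well above the others is a candidate for the kind of correlation described in this finding.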
Finding #2: Most of our 4xx
errors come from the Opera browser
Count of 4xx messages by browser type for given time interval
Finding #3: Opera browser users with 4xx
errors are mostly from Indonesia, South Africa,
and South Asia (Bangladesh, India, Pakistan)
Countries of origin and status codes for Opera browser users
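Counts like those behind Findings #2 and #3 amount to a group-by over the parsed features. The sketch below assumes each parsed message is a dict with `status`, `browser`, and `country` keys; those key names and the function name are hypothetical.

```python
from collections import Counter


def count_4xx_by(records, field):
    """Count records with a 4xx status, grouped by the given field."""
    return Counter(r[field] for r in records if r["status"].startswith("4"))


def count_4xx_by_country_for_browser(records, browser):
    """Restrict to one browser, then count its 4xx records by country."""
    subset = [r for r in records if r["browser"] == browser]
    return count_4xx_by(subset, "country")
```

Calling `count_4xx_by(records, "browser")` gives the per-browser view, and `count_4xx_by_country_for_browser(records, "Opera")` drills down into a single browser.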
What if we could automate
this exploration of anomalies?
• Start with the basics: 4xx and 5xx are anomalies
• These anomalies appear in clusters along with
other dimensions available from HTTP data
• Some use cases call for correlating HTTP
anomalies with application log message
anomalies (exceptions and errors)…
• …in real time
A vision for growing log data
analytical capabilities
• Monitoring and alerting of rates of HTTP errors
• Multi-dimensional analysis of HTTP log data
• Advanced parsing and automated clustering
• Correlation of HTTP with other application data (exceptions, errors) in real time
• Tail | grep | cut | awk | sort | uniq of log data
Diagram labels: Example presented today; Customer prototype (TAFLUC)
The customer, a Content Management
Platform, needed to identify unusual rates
of 5xx and PHP errors in real time
• Discarded all 200s
• Only looking for anomalies “within the anomalies”
• Real-time analysis by customer
• Our customer wants to notify their end-users as they deploy plug-ins
• Reduce baseline rate of 5xx errors across platform
Plots: rate of 5xx codes over all non-200 codes; rate of PHP fatal messages over all PHP-level messages (annotation: “True Anomalies”)
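The first rate on this slide, 5xx codes over all non-200 codes, can be computed per time window and compared against recent history. The sketch below is not the customer prototype's method. It swaps in a plain mean-plus-k-standard-deviations threshold as an assumed stand-in, and the function names, `history`, and `k` are all hypothetical.

```python
from statistics import mean, pstdev


def rate_5xx(status_codes):
    """Share of 5xx codes among the non-200 codes seen in one time window."""
    non_200 = [s for s in status_codes if s != 200]
    if not non_200:
        return 0.0
    return sum(1 for s in non_200 if 500 <= s <= 599) / len(non_200)


def flag_anomalies(rates, history=5, k=3.0):
    """Return indexes of windows whose rate exceeds mean + k * std
    of the `history` windows immediately before them."""
    flagged = []
    for i in range(history, len(rates)):
        window = rates[i - history:i]
        threshold = mean(window) + k * pstdev(window)
        if rates[i] > threshold:
            flagged.append(i)
    return flagged
```

Because all 200s are discarded first, the rate measures how the mix of errors shifts rather than how much traffic there is, which matches the idea of looking for anomalies "within the anomalies".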
Conclusion
• Everyone is familiar with simple, one-dimensional
analytics: 4xx and 5xx are bad
• This does not always show the full picture
• By expanding the data features available from Nginx logs
it is possible to identify more interesting patterns
• Good visualization is important because humans are
better than computers at spotting patterns (today)
• Algorithms are also important to automate the
process and save time
Thank You!
Twitter: @RomanRojasM
SlideShare: http://www.slideshare.net/mauriciocostarica
