Log Files
by Heinrich Hartmann
twitter: @HeinrichHartman
web: heinrich-hartmann.net
Where do log files come from?
Use Case: Distributed Web Applications
● Web Servers
● Application Servers
● Databases
● Network infrastructure:
Routers / Load balancers / Switches
What do Log Files look like?
Example Web Server Log:
● Timestamp
● Source hostname / IP
● Session ID (if available)
● Request URL
● Return code
● Reply size
Example Log File
NASA Dataset (http://ita.ee.lbl.gov/html/contrib/NASA-HTTP.html)
Two months' worth of HTTP requests to the Kennedy Space Center (22 MB zipped).
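The fields listed above can be pulled out of a log line with a small parser. The sketch below assumes the Common Log Format written by many web servers; the sample line, the pattern name `CLF_PATTERN`, and the function `parse_clf_line` are made up for illustration. The Common Log Format carries no session ID, so that field is left out.

```python
import re
from datetime import datetime

# Common Log Format: host ident authuser [timestamp] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_clf_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Timestamps look like 01/Jul/1995:00:00:01 -0400
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    fields["status"] = int(fields["status"])
    # A reply size of "-" means that no body was sent.
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields

# Made-up sample line, for illustration only.
sample = 'host.example.org - - [01/Jul/1995:00:00:01 -0400] "GET /index.html HTTP/1.0" 200 6245'
record = parse_clf_line(sample)
```

Lines that do not match the pattern return `None`, so a caller can count or skip malformed entries instead of failing on them.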
What insights do Log Files provide?
● Monitoring
Are my servers up and running as expected?
How many resources are being used?
Are my business metrics ok? (earnings/h)
● Troubleshooting / Debugging
Why is my system slow?
Where are messages dropped?
● Reporting / Mining
How do my users behave?
Click stream analysis. KPI calculations (click through, bounce rate)
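As one example of a KPI calculation, the bounce rate can be computed from page views grouped by session. This is a minimal sketch that assumes the usual definition (the share of sessions with exactly one page view); the function name `bounce_rate` and the sample data are hypothetical.

```python
from collections import Counter

def bounce_rate(page_views):
    """page_views: iterable of (session_id, url) pairs.

    Returns the fraction of sessions that consist of a single page view.
    """
    views_per_session = Counter(session_id for session_id, _ in page_views)
    if not views_per_session:
        return 0.0
    bounced = sum(1 for count in views_per_session.values() if count == 1)
    return bounced / len(views_per_session)

# Made-up click stream: three sessions, one of which views a single page.
views = [("s1", "/"), ("s1", "/about"), ("s2", "/"), ("s3", "/"), ("s3", "/contact")]
rate = bounce_rate(views)
```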
Log Volume
Example Calculation for Wikipedia
● 200 application servers
● 20 database servers
● 70 cache servers
● 10k HTTP requests per second (rps); peak: 50k rps
● 80k SQL queries per second peak load
Apache Access Log rate:
10k req/s * 100 B per log message = 1 MB/s (peak: 5 MB/s)
= 86 GB/day (i.e. BIG)
Source:
http://www.datacenterknowledge.com/archives/2008/06/24/a-look-inside-wikipedias-infrastructure/
http://reportcard.wmflabs.org/
https://ganglia.wikimedia.org/
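The estimate above can be reproduced in a few lines. The request rates and the 100-byte message size are taken from the slide; the constant and function names are only illustrative.

```python
REQUESTS_PER_SECOND = 10_000       # from the slide
PEAK_REQUESTS_PER_SECOND = 50_000  # from the slide
BYTES_PER_LOG_LINE = 100           # assumed average size of one access-log entry
SECONDS_PER_DAY = 24 * 60 * 60

def log_rate_bytes_per_second(rps, bytes_per_line=BYTES_PER_LOG_LINE):
    """Log volume in bytes per second for a given request rate."""
    return rps * bytes_per_line

typical_rate = log_rate_bytes_per_second(REQUESTS_PER_SECOND)    # 1 MB/s
peak_rate = log_rate_bytes_per_second(PEAK_REQUESTS_PER_SECOND)  # 5 MB/s
daily_volume_gb = typical_rate * SECONDS_PER_DAY / 1e9           # about 86.4 GB/day
```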
Log File Processing
[Diagram: Web Servers, Log Aggregator, Log Monitor, Log Analytics]
● Web Servers: business logic; generate log files.
● Log Aggregator: gathers log files from individual servers and stores them in a central location.
● Log Monitor: real-time reports, dashboards, plots, alerts.
● Log Analytics: batch processing, data mining.
Source: Theo Schlossnagle - Scalable Internet Architectures
Classical Solution
● Local storage of log files on web servers
● Periodic “pull” aggregation of log files, via ftp or scp
Drawbacks:
● No real-time access to logs.
● No log files from crashed servers.
[Diagram: Web Servers, Log Aggregator, Log Monitor, Log Analytics]
Real-Time Unicast
● Real-time aggregation of log files (“push”)
● Needs a reliable transport (classic syslog only uses UDP)
● Configuration management is complicated:
- every web server needs to know about the log aggregator
- problematic if redundancy is to be added
[Diagram: Web Servers, Log Aggregator, Log Monitor, Log Analytics]
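A real-time push over a reliable transport can be sketched with plain TCP sockets. This is an illustration using only the Python standard library, not the setup from the talk: `AggregatorHandler`, `push_logs`, and the sample messages are hypothetical, and the aggregator runs on localhost with a port chosen by the operating system.

```python
import socket
import socketserver
import threading

received_lines = []  # stands in for the aggregator's central storage

class AggregatorHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # One log message per line; read until the sender closes the connection.
        for raw in self.rfile:
            received_lines.append(raw.decode("utf-8").rstrip("\n"))

def push_logs(address, lines):
    """Send log lines to the aggregator over a single TCP connection."""
    with socket.create_connection(address) as conn:
        for line in lines:
            conn.sendall((line + "\n").encode("utf-8"))

# Local demo: the aggregator handles exactly one connection from one "web server".
server = socketserver.TCPServer(("127.0.0.1", 0), AggregatorHandler)
worker = threading.Thread(target=server.handle_request)
worker.start()
push_logs(server.server_address, ["GET /index.html 200", "GET /missing 404"])
worker.join()
server.server_close()
```

Note that `push_logs` needs the aggregator's address, which is the configuration burden named in the bullets: every web server has to be told where the aggregator lives.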
Passive “Sniffing” Log Aggregation
● Log files are produced by sniffing network packets
● Very accurate log files
● No interaction with or configuration of the servers required
● Needs a single egress point
● Security flaw (man-in-the-middle attack). Not compatible with SSL.
[Diagram: Internet, Web Servers, Log Aggregator]
Best practice: Group communication
● Real-time log distribution using group messaging (ZeroMQ/Spread/Thrift)
● Flexible communication patterns (allow multiple subscribers)
● Use reliable IP multicast to reduce network load
● Less configuration overhead (group subscriptions)
[Diagram: Web Servers, Log Aggregator, Log Analytics, Log Monitor(s)]
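The publish/subscribe pattern behind group communication can be illustrated with a toy in-process channel. This sketch does not use ZeroMQ, Spread, or IP multicast; `LogChannel` and `drain` are hypothetical names, and standard-library queues stand in for the messaging group.

```python
import queue

class LogChannel:
    """Toy in-process publish/subscribe channel (a stand-in for a messaging group)."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self):
        """Register a new subscriber and return its inbox."""
        inbox = queue.Queue()
        self._subscribers.append(inbox)
        return inbox

    def publish(self, message):
        # Publishers do not know who is listening; every subscriber gets a copy.
        for inbox in self._subscribers:
            inbox.put(message)

def drain(inbox):
    """Collect all messages currently waiting in an inbox."""
    items = []
    while not inbox.empty():
        items.append(inbox.get())
    return items

channel = LogChannel()
aggregator_inbox = channel.subscribe()
monitor_inbox = channel.subscribe()
channel.publish("web1 GET /index.html 200")
channel.publish("web2 GET /missing 404")
aggregator_log = drain(aggregator_inbox)
monitor_log = drain(monitor_inbox)
```

Adding another monitor only takes one more `subscribe()` call; the web servers that publish are left untouched, which is the reduced configuration overhead the bullets refer to.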
Further Topics
Real-Time Monitoring
● Splunk
● Circonus
● Kibana
● Storm
Log batch analysis
● Map-Reduce/Hadoop
● Hive
● Drill/Dremel
Further Reading
* http://hortonworks.com/use-cases/server-logs-hadoop-example/
* http://www.slideshare.net/mapredit/apache-flume-ng
* Logstash
* http://www.elasticsearch.org/overview/kibana/
Further Steps
● Chefkoch dataset (log files from several months)
1. Inspect data
2. Gather interesting questions about the data
3. Try to answer questions using big data processing
● Stream processing vs. batch processing (Thomas)
- Which queries/operators can be answered on the stream?
- knowledge discovery / feature selection -> Indexing
Other log file sources
● Sensor data analysis
● Profiling of software projects (SOAMIG)
