Log Files
by Heinrich Hartmann
twitter: @HeinrichHartman
web: heinrich-hartmann.net
Where do log files come from?
Use Case: Distributed Web Applications
● Web Servers
● Application Servers
● Databases
● Network infrastructure:
Routers / Load balancers / Switches
What do Log Files look like?
Example Web Server Log:
● Timestamp
● Source hostname / IP
● Session ID (if available)
● Request URL
● Return code
● Reply size
Example Log File
NASA Dataset (http://ita.ee.lbl.gov/html/contrib/NASA-HTTP.html)
Two months' worth of HTTP requests to the Kennedy Space Center web server. (22 MB zipped)
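A minimal parsing sketch for one such log line, assuming the file follows the Common Log Format (host, timestamp, request, return code, reply size). The sample line and the names `CLF_PATTERN` and `parse_line` are illustrative assumptions, not taken from the dataset. This format carries no session ID, so that field is omitted.

```python
import re
from datetime import datetime

# Common Log Format: host ident user [timestamp] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    # The request field looks like: METHOD URL PROTOCOL
    parts = match.group("request").split()
    size = match.group("size")
    return {
        "host": match.group("host"),
        "time": datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z"),
        "url": parts[1] if len(parts) > 1 else None,
        "status": int(match.group("status")),
        # A "-" size means no reply body was sent.
        "size": 0 if size == "-" else int(size),
    }


# Illustrative sample line (hypothetical, not copied from the dataset).
sample = 'host.example.com - - [01/Jul/1995:00:00:01 -0400] "GET /index.html HTTP/1.0" 200 6245'
record = parse_line(sample)
```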
What insights do Log Files provide?
● Monitoring
Are my servers up and running as expected?
How many resources are being used?
Are my business metrics ok? (earnings/h)
● Troubleshooting / Debugging
Why is my system slow?
Where are messages dropped?
● Reporting / Mining
How do my users behave?
Click stream analysis. KPI calculations (click through, bounce rate)
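Two of these questions can be sketched as small functions over parsed log records. This is an illustration only: the definitions used here (a bounce is a session with exactly one request, an error is a return code of 500 or higher) and the function names are assumptions.

```python
from collections import Counter


def bounce_rate(requests):
    """Fraction of sessions with exactly one request.

    `requests` is an iterable of (session_id, url) pairs.
    """
    requests_per_session = Counter(session_id for session_id, _ in requests)
    if not requests_per_session:
        return 0.0
    bounces = sum(1 for n in requests_per_session.values() if n == 1)
    return bounces / len(requests_per_session)


def error_rate(status_codes):
    """Fraction of requests answered with a 5xx return code."""
    codes = list(status_codes)
    if not codes:
        return 0.0
    return sum(1 for code in codes if code >= 500) / len(codes)
```

The bounce rate is a reporting KPI, while the error rate is the kind of value a monitor would track and alert on.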
Log Volume
Example Calculation for Wikipedia
● 200 application servers
● 20 database servers
● 70 cache servers
● 10k HTTP requests per second (rps) peak load (peak: 50k rps)
● 80k SQL queries per second peak load
Apache Access Log rate:
10k req/s * 100 B per log message = 1 MB/s (peak: 5 MB/s)
= 86 GB/day (i.e. BIG)
Source:
http://www.datacenterknowledge.com/archives/2008/06/24/a-look-inside-wikipedias-infrastructure/
http://reportcard.wmflabs.org/
https://ganglia.wikimedia.org/
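The log volume arithmetic above can be reproduced in a few lines. The request rates and the 100-byte message size are the figures from this slide; the function names are illustrative, and decimal units (1 MB = 10^6 bytes) are assumed.

```python
REQUESTS_PER_SECOND = 10_000       # typical load used in the calculation
PEAK_REQUESTS_PER_SECOND = 50_000  # peak load
BYTES_PER_LOG_MESSAGE = 100        # assumed size of one access log line
SECONDS_PER_DAY = 24 * 60 * 60


def log_rate_mb_per_s(rps, bytes_per_message=BYTES_PER_LOG_MESSAGE):
    """Log data rate in MB/s."""
    return rps * bytes_per_message / 1e6


def log_volume_gb_per_day(rps, bytes_per_message=BYTES_PER_LOG_MESSAGE):
    """Log volume in GB/day at a constant request rate."""
    return rps * bytes_per_message * SECONDS_PER_DAY / 1e9


rate = log_rate_mb_per_s(REQUESTS_PER_SECOND)            # 1 MB/s
peak_rate = log_rate_mb_per_s(PEAK_REQUESTS_PER_SECOND)  # 5 MB/s
daily = log_volume_gb_per_day(REQUESTS_PER_SECOND)       # about 86.4 GB/day
```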
Log File Processing
● Web Servers: business logic, generate log files
● Log Aggregator: gathers log files from individual servers and stores them in a central location
● Log Monitor: real-time reports, dashboards, plots, alerts
● Log Analytics: batch processing, data mining
Source: Theo Schlossnagle - Scalable Internet Architectures
Classical Solution
● Local storage of log files on web servers
● Periodic “pull” aggregation of log files, via ftp or scp
Drawbacks:
● No real-time access to logs.
● No log files from crashed servers.
(Diagram: Web Servers, Log Aggregator, Log Monitor, Log Analytics)
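A sketch of one periodic pull round. Copying between local files stands in for the ftp or scp transfer, and the function names and the offset bookkeeping are assumptions made for illustration. Each round fetches only the bytes appended since the previous round.

```python
from pathlib import Path


def pull_new_bytes(source: Path, target: Path, offset: int) -> int:
    """Append the bytes of `source` beyond `offset` to `target`.

    Returns the new offset, i.e. the number of bytes copied so far.
    """
    with source.open("rb") as src:
        src.seek(offset)
        chunk = src.read()
    with target.open("ab") as dst:
        dst.write(chunk)
    return offset + len(chunk)


def pull_all(sources, target, offsets):
    """Run one aggregation round over all server log files.

    `offsets` maps each source path to the bytes already copied.
    """
    for source in sources:
        offsets[source] = pull_new_bytes(source, target, offsets.get(source, 0))
    return offsets
```

Lines written between two rounds stay invisible to the aggregator until the next round, and a server that crashes before a round never delivers them. These are the two drawbacks listed above.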
Real-Time Unicast
● Real-time aggregation of log files (“push”)
● Need to use reliable transfer (classic syslog only provides UDP)
● Configuration management is complicated:
- every web server needs to know about the log aggregator
- problematic if redundancy should be added
(Diagram: Web Servers, Log Aggregator, Log Monitor, Log Analytics)
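A minimal sketch of the push model over TCP, which provides the reliable transfer that UDP lacks. It uses only the Python standard library; the newline-delimited framing and the names `AggregatorHandler`, `start_aggregator` and `push_lines` are assumptions, not part of syslog or any other tool.

```python
import socket
import socketserver
import threading

received = []  # log lines stored by the aggregator


class AggregatorHandler(socketserver.StreamRequestHandler):
    """Reads newline-delimited log lines from one web server connection."""

    def handle(self):
        for raw in self.rfile:
            received.append(raw.decode("utf-8").rstrip("\n"))


def start_aggregator(host="127.0.0.1", port=0):
    """Start the aggregator in a background thread; port 0 picks a free port."""
    server = socketserver.ThreadingTCPServer((host, port), AggregatorHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def push_lines(address, lines):
    """Web server side: push log lines to the aggregator as they are produced."""
    with socket.create_connection(address) as conn:
        for line in lines:
            conn.sendall((line + "\n").encode("utf-8"))
```

Every sender has to be given the aggregator `address`, which is the configuration burden noted above.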
Passive “Sniffing” Log Aggregation
● Log files are produced by sniffing network packets
● Very accurate log files
● No interaction with or configuration of the servers required
● Needs a single egress point
● Security risk: the sniffer sits in a man-in-the-middle position. Not compatible with SSL, because encrypted traffic cannot be inspected.
(Diagram: Internet, Web Servers, Log Aggregator)
Best practice: Group communication
● Real-time log distribution using group messaging (ZeroMQ/Spread/Thrift)
● Flexible communication patterns (allow multiple subscribers)
● Use reliable IP multicast to reduce network load
● Less configuration overhead (group subscriptions)
(Diagram: Web Servers, Log Aggregator, Log Monitor(s), Log Analytics)
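The group subscription pattern can be illustrated with an in-process stand-in. This is not ZeroMQ or Spread, and `LogBus` is a hypothetical class that only shows the idea: web servers publish to a named group and need no knowledge of who is listening.

```python
class LogBus:
    """In-memory publish/subscribe bus keyed by group name."""

    def __init__(self):
        self._subscribers = {}

    def subscribe(self, group, callback):
        """Register `callback` to receive every line published to `group`."""
        self._subscribers.setdefault(group, []).append(callback)

    def publish(self, group, line):
        """Deliver `line` to all current subscribers of `group`."""
        for callback in self._subscribers.get(group, []):
            callback(line)


bus = LogBus()
archive = []          # plays the role of the log aggregator
alerts = []           # plays the role of a log monitor

bus.subscribe("weblogs", archive.append)
bus.subscribe("weblogs", lambda line: alerts.append(line) if " 500" in line else None)

# A web server publishes without knowing how many consumers exist.
bus.publish("weblogs", "GET /index.html 200")
bus.publish("weblogs", "GET /broken 500")
```

Adding a second monitor is one more `subscribe` call; no web server configuration changes.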
Further Topics
Real-Time Monitoring
● Splunk
● Circonus
● Kibana
● Storm
Log batch analysis
● Map-Reduce/Hadoop
● Hive
● Drill/Dremel
Further Reading
* http://hortonworks.com/use-cases/server-logs-hadoop-example/
* http://www.slideshare.net/mapredit/apache-flume-ng
* Logstash
* http://www.elasticsearch.org/overview/kibana/
Further Steps
● Chefkoch dataset (log files from several months)
1. Inspect the data
2. Collect interesting questions about the data
3. Try to answer the questions using big data processing
● Stream processing vs. batch processing (Thomas)
- Which queries/operators can be answered on the stream?
- knowledge discovery / feature selection -> Indexing
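As one example of a query that can be answered on the stream, requests per minute can be computed in a single pass with constant memory. The sketch assumes records arrive ordered by time as (timestamp in seconds, URL) pairs; the function name is illustrative.

```python
def requests_per_minute(records):
    """Yield (minute, count) pairs from a time-ordered stream of records.

    `records` is an iterable of (timestamp_seconds, url) pairs.
    Only the current minute and its counter are kept in memory.
    """
    current_minute = None
    count = 0
    for timestamp, _url in records:
        minute = timestamp // 60
        if current_minute is None:
            current_minute = minute
        if minute != current_minute:
            # The previous minute is complete, so its count can be emitted.
            yield current_minute, count
            current_minute, count = minute, 0
        count += 1
    if current_minute is not None:
        yield current_minute, count
```

Minutes without any request produce no output. Queries that need all data at once, such as an exact median of reply sizes, do not fit this pattern and are candidates for batch processing.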
Other log file sources
● Sensor data analysis
● Profiling of software projects (SOAMIG)
