Transcript

  • 1. Night Owl Log Monitoring using Elasticsearch and Hadoop Boyd Meier (bmeier@pros.com) Hadoop Meetup – October 16, 2013 © COPYRIGHT PROS, INC. 2013 | CONFIDENTIAL AND PROPRIETARY
  • 2. Problem
  • 3. Application Performance Monitoring
      ● Many servers
      ● Many applications
      ● Many log formats
      ● Many places to go look for information
      ● What if we could just look in one place and see everything?
  • 4. Advanced Analysis
      ● The logs are too low-level
      ● The servers need the existing capacity
      ● The amount of data to be analyzed is huge
      ● Some analysis needs to be across multiple servers
      ● What if we want to change the analysis algorithms?
      ● How can we do analysis in the most flexible way possible?
  • 5. Proactive Support
      ● See problems coming before they become crises
      ● Watch for errors and exceptions
      ● Track performance of the application
      ● Track usage of the application
      ● Enable checks we haven’t thought of yet
  • 6. Some Analysis Questions
      ● What errors happen, and how often?
      ● Who did what, when?
      ● How long did it take to do a task?
      ● What else was happening on the server?
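As an illustration of the first and third questions, here is a minimal Python sketch that counts errors by type and measures task durations over a handful of already-parsed events. The event fields (`ts`, `user`, `level`, `error`, `task`, `phase`) and the sample data are assumptions made for this example; the deck does not show the actual event schema or the Pig scripts used in production.

```python
from collections import Counter
from datetime import datetime

# Hypothetical, already-parsed log events. Field names are assumptions.
EVENTS = [
    {"ts": "2013-09-01T10:00:00", "user": "alice", "level": "INFO",
     "task": "price-calc", "phase": "start"},
    {"ts": "2013-09-01T10:00:07", "user": "alice", "level": "INFO",
     "task": "price-calc", "phase": "end"},
    {"ts": "2013-09-01T10:00:09", "user": "bob", "level": "ERROR",
     "error": "TimeoutException"},
    {"ts": "2013-09-01T10:01:30", "user": "bob", "level": "ERROR",
     "error": "TimeoutException"},
    {"ts": "2013-09-01T10:02:00", "user": "alice", "level": "ERROR",
     "error": "NullPointerException"},
]


def error_counts(events):
    """What errors happen, and how often? Count ERROR events by error name."""
    return Counter(e["error"] for e in events if e["level"] == "ERROR")


def task_durations(events):
    """How long did a task take? Pair each start event with its end event."""
    started = {}
    durations = []
    # ISO-8601 timestamps sort correctly as plain strings.
    for e in sorted(events, key=lambda ev: ev["ts"]):
        if "task" not in e:
            continue
        key = (e["user"], e["task"])
        when = datetime.strptime(e["ts"], "%Y-%m-%dT%H:%M:%S")
        if e["phase"] == "start":
            started[key] = when
        elif e["phase"] == "end" and key in started:
            seconds = (when - started.pop(key)).total_seconds()
            durations.append((e["user"], e["task"], seconds))
    return durations


if __name__ == "__main__":
    print(error_counts(EVENTS).most_common())
    print(task_durations(EVENTS))
```

At the data volumes described in the deck this logic would run as a distributed job rather than in a single process; the sketch only shows the shape of the questions.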
  • 7. Constraints
      ● Very little budget – as much free stuff as possible
      ● Can’t use client machines
      ● Communications need to be secure
      ● Large amounts of data (GB/day/client)
      ● Minimize support’s dependence on client IT
  • 8. Approach
  • 9. Hadoop
      ● We have a lot of data (~2 GB/day with 3 clients)
      ● We need to process it in reasonable time
      ● We can’t afford a big machine for this
      ● We have lots of old machines lying around
      ● Sounds like a job for the elephant! But what about querying?
  • 10. Elasticsearch
      ● Query performance on base Hadoop is painful
      ● Ad-hoc queries are required
      ● Hadoop integration
      ● Cluster deployment
      ● Looks promising! How do we get the data into the server?
  • 11. Logstash
      ● Handle many sources, not just logs
      ● Fan-in architecture to server
      ● Compressed, SSL encrypted data
      ● Can offload some logic on the client if desired
      ● Massively configurable
      ● Output to Elasticsearch
      ● Great! Now how about visualization?
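To make the "many log formats, one place to look" idea concrete, the sketch below normalizes lines from two different formats into a single event shape in plain Python. It is not Logstash configuration and does not use any Logstash API; the two regular expressions, the sample formats, and the `to_event` helper are all hypothetical and stand in for the parsing that the shipper would do before sending events on to Elasticsearch.

```python
import re

# Two hypothetical log formats; the patterns are assumptions for illustration.
ACCESS = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<message>[^"]*)" (?P<status>\d{3})'
)
APP = re.compile(
    r'(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) '
    r'(?P<level>[A-Z]+) (?P<message>.*)'
)


def to_event(line, source):
    """Turn one raw log line into a dict with a common set of keys."""
    for kind, pattern in (("access", ACCESS), ("app", APP)):
        match = pattern.match(line)
        if match:
            event = match.groupdict()
            event["type"] = kind
            event["source"] = source
            return event
    # Keep lines that match no known format so nothing is silently dropped.
    return {"type": "unparsed", "source": source, "message": line}


if __name__ == "__main__":
    samples = [
        '10.0.0.1 - - [01/Sep/2013:10:00:00 -0500] "GET /index HTTP/1.1" 200',
        "2013-09-01 10:00:09 ERROR Timeout talking to db",
        "free-form line that matches nothing",
    ]
    for raw in samples:
        print(to_event(raw, source="client-a"))
```

Keeping unparsed lines as events of their own type is a design choice for the sketch: it leaves room for checks nobody has thought of yet.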
  • 12. Kibana
      ● Backed by Elasticsearch
      ● Supports dynamic queries
      ● View information over time
      ● Built-in support for Logstash
      ● Configurable, shareable dashboards
  • 13.
  • 14. Hadoop Processing
      ● Pig scripts process the data
      ● Wonderdog from InfoChimps integrates Pig and Elasticsearch
          – There are issues:
              • Cluster stability using Wonderdog
              • Wonderdog Pig interface has not been updated in a while
              • Currently evaluating the elasticsearch-hadoop project from Elasticsearch.org
      ● Analysis results are stored in Elasticsearch for ease of access
  • 15. Demo
  • 16. Configuration Details
  • 17.
  • 18. Software
      ● Ubuntu 12.04.2 LTS (Precise)
      ● Cloudera CDH 4.3.1
          – Hadoop 2.0.0
          – HBase 0.94
          – Hive 0.10
          – Pig 0.11
      ● Elasticsearch 0.90.3
      ● Logstash 1.1.12
      ● Kibana 3 M3
  • 19. Hardware Architecture
      ● 27 node cluster of commodity machines
      ● 42 TB of disk space
      ● Connected via 10 gigabit switch
      ● Each machine has:
          – 8 GB RAM
          – 2 TB SATA HDD
          – Gigabit Ethernet
  • 20. Performance
      ● Over the month of September:
          – 188 million events ingested from 3 clients
          – 57.5 GB storage used (1.92 GB / day)
      ● At that rate, 42 TB is enough space for:
          – 142 billion events
          – 60 years of data from these clients
          – 1 year of data from 180 clients at the same volume per client
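The capacity projection on this slide can be reproduced with a few lines of arithmetic. The Python sketch below uses only the figures quoted on the slide; the assumptions are that September has 30 days, that 1 TB is counted as 1000 GB, and that all 42 TB are available for event data at the measured storage rate (no allowance for replication or growth in event size). The variable names are made up for this example.

```python
# Figures taken from the slide.
EVENTS_PER_MONTH = 188_000_000
GB_PER_MONTH = 57.5
CLIENTS = 3
CLUSTER_TB = 42

# Assumptions made for this sketch.
DAYS_IN_SEPTEMBER = 30
GB_PER_TB = 1000  # decimal terabytes; 1024 would shift the results slightly

cluster_gb = CLUSTER_TB * GB_PER_TB

# Daily ingest volume across all three clients.
gb_per_day = GB_PER_MONTH / DAYS_IN_SEPTEMBER

# Storage cost of events, and how many events fit in the cluster.
events_per_gb = EVENTS_PER_MONTH / GB_PER_MONTH
total_events = events_per_gb * cluster_gb

# How long the current three clients could be retained.
years_for_current_clients = cluster_gb / gb_per_day / 365

# The same capacity expressed as client-years, i.e. how many clients
# could be retained for one year at the same volume per client.
client_years = years_for_current_clients * CLIENTS

if __name__ == "__main__":
    print(f"GB per day:          {gb_per_day:.2f}")
    print(f"Events that fit:     {total_events / 1e9:.1f} billion")
    print(f"Years for 3 clients: {years_for_current_clients:.1f}")
    print(f"Client-years:        {client_years:.0f}")
```

Under these assumptions the daily rate, the 60 years, and the 180 client-years match the slide. The event total comes out a little lower than the slide's 142 billion, which presumably reflects a different terabyte convention or rounding in the original calculation.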
  • 21. Resources
      ● Elasticsearch - http://www.elasticsearch.org/overview/
          • http://github.com/elasticsearch/elasticsearch
      ● Logstash - http://www.elasticsearch.org/overview/logstash/
          • https://github.com/logstash/logstash
      ● Kibana - http://www.elasticsearch.org/overview/kibana/
          • https://github.com/elasticsearch/kibana
      ● ES – Hadoop - http://www.elasticsearch.org/overview/hadoop/
          • http://github.com/elasticsearch/elasticsearch-hadoop
      ● Cloudera - http://www.cloudera.com/
  • 22.
      ● World Headquarters: 3100 Main Street, Suite #900, Houston, TX 77002. Phone: +1 713-335-5151. Sales: +1 855-846-0641. Fax: +1 713-335-8144
      ● PROS Germany GmbH: Feringastrasse 6, 85774 Unterfoehring, Munich. Tel.: +49 89 99216 270. Fax: +49 89 99216 200
      ● European Headquarters - United Kingdom: Lakeside House, 1 Furzeground Way, Stockley Park, Heathrow UB11 1BD. Phone: +44 (0) 208 622 3555. Fax: +44 208 622 3230
      ● Regional Office - Austin, TX: 3600 Parmer Lane, Suite 205, Austin, Texas 78727
      ● Regional Office - Cary, North Carolina: 1000 Centre Green Way, #200, Cary, NC 27513. Phone: +1 919-228-6334