
DevTeach 2017: Store 2 Million Audit Logs a Day into Elasticsearch

My DevTeach 2017 conference talk: an introduction to Elasticsearch, Logstash, and Kibana, and how we use those components together as a solution, plus some performance benchmarks on Elasticsearch when storing data.

Published in: Software

  1. 1. STORE 2 MILLION AUDIT LOGS A DAY INTO ELASTICSEARCH - Taswar Bhatti (Microsoft MVP), GEMALTO - @taswarbhatti - http://taswar.zeytinsoft.com - taswar@gmail.com
  2. 2. WHO AM I? - Microsoft MVP for 4 years - 17 years in the software industry - Currently working as a System Architect in the enterprise security space (Gemalto) - You may not have heard of Gemalto, but 1/3 of the world's population uses Gemalto; they just don't know it - Gemalto has stacks built in many environments: .NET, Java, Node, Lua, Python, mobile (Android, iOS), e-banking, etc. 9/22/2017
  3. 3. AGENDA - The problem we had and wanted to solve with the Elastic Stack - Intro to the Elastic Stack (ecosystem) - Logstash - Kibana - Beats - Elasticsearch flow designs that we considered - Future plans for using Elasticsearch
  4. 4. QUESTION & POLL - How many of you are using Elastic or some other logging solution? - How do you normally log? Where do you log? - Do you log to a relational database?
  5. 5. HOW DO YOU TROUBLESHOOT OR FIND YOUR BUGS? - Typically in a distributed environment one has to go through the logs to find out where the issue is - There could be multiple systems to go through to find which machine/server generated the log, or multiple logs to monitor - You may even monitor firewall logs to find which data center the traffic is routed through - Chuck Norris never troubleshoots; the troubles kill themselves when they see him coming
  7. 7. OUR PROBLEM - We had distributed systems (microservices) that would generate many different types of logs, in different data centers - We also had authentication audit logs that had to be secure and stored for 1 year - We generate around 2 million audit log records a day, 4TB with replication - We need to generate reports out of our data for customers - We were still using a monolithic solution in some core parts of the application - Growing pains of a successful application - We want to use a centralized, scalable logging system for all our
  8. 8. FINDING BUGS THROUGH LOGS
  9. 9. A LITTLE HISTORY OF ELASTICSEARCH - Shay Banon created Compass in 2004 - Released the first version of Elasticsearch in 2010 - Elasticsearch the company was formed in 2012 - Shay's wife is still waiting for her recipe app
  11. 11. ELASTIC STACK
  12. 12. ELASTICSEARCH - Written in Java, backed by Lucene - Schema-free, REST & JSON based document store - Search engine - Distributed, horizontally scalable - No database storage; storage is Lucene - Apache 2.0 License
  13. 13. COMPANIES USING ELASTIC STACK
  14. 14. ELASTICSEARCH INDICES - Elasticsearch organizes documents in indices - Lucene writes and maintains the index files - Elasticsearch writes and maintains metadata on top of Lucene - Examples: field mappings, index settings, and other cluster metadata
  15. 15. DATABASE VS ELASTIC
  16. 16. ELASTIC CONCEPTS - Cluster: a collection of one or more nodes (servers) - Node: a single server that is part of your cluster, stores your data, and participates in the cluster's indexing and search capabilities - Index: a collection of documents that have somewhat similar characteristics (e.g. Product, Customer, etc.) - Type: within an index, you can define one or more types. A type is a logical category/partition of your index - Document: a basic unit of information that can be indexed - Shard/Replica: an index is divided into multiple pieces called shards; replicas are copies of your shards
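
To make the shard/replica terms from slide 16 concrete, here is a minimal Python sketch of the settings body one might send when creating an index. The counts (5 primary shards, 1 replica) are illustrative assumptions, not values from the talk; `number_of_shards` and `number_of_replicas` are the standard Elasticsearch index settings. The sketch only builds the JSON and does not contact a cluster.

```python
import json


def index_settings(primary_shards: int, replicas: int) -> dict:
    """Build the JSON body for creating an index with a given shard layout."""
    return {
        "settings": {
            "number_of_shards": primary_shards,   # pieces the index is split into
            "number_of_replicas": replicas,       # copies kept of each primary shard
        }
    }


def total_shard_copies(primary_shards: int, replicas: int) -> int:
    """Each primary shard exists once, plus once per replica."""
    return primary_shards * (1 + replicas)


# Assumed example values, not taken from the slides.
body = index_settings(primary_shards=5, replicas=1)
print(json.dumps(body, indent=2))
print("shard copies in the cluster:", total_shard_copies(5, 1))
```

With one replica, every shard exists twice, which is one reason stored volume grows once replication is taken into account.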
  17. 17. ELASTIC NODES - Master node: controls the cluster - Data node: holds data and performs data-related operations such as CRUD, search, and aggregations - Ingest node: applies an ingest pipeline to a document in order to transform and enrich it before indexing - Coordinating node: only routes requests, handles the search reduce phase, and distributes bulk indexing
  18. 18. SAMPLE JSON DOCUMENT - HTTP call - JSON document
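
The image for slide 18 did not survive in this transcript, so the following is a hedged stand-in rather than the slide's actual content. It sketches, in Python, an HTTP call that indexes one JSON document. The host `http://localhost:9200`, the index `audit`, the type `log`, the id `1`, and every field in the document are assumptions made for illustration. The `index/type/id` URL shape matches the Elasticsearch versions current at the time of the talk; later versions dropped custom types.

```python
import json
import urllib.request


def build_index_request(base_url, index, doc_type, doc_id, document):
    """Prepare (but do not send) a PUT request that indexes one JSON document."""
    url = f"{base_url}/{index}/{doc_type}/{doc_id}"
    payload = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical audit event; the field names are not from the talk.
audit_event = {
    "user": "alice",
    "action": "login",
    "success": True,
    "timestamp": "2017-09-22T10:15:00Z",
}

request = build_index_request("http://localhost:9200", "audit", "log", "1", audit_event)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To send it to a running node: urllib.request.urlopen(request)
```

The request is only constructed here, so the sketch runs without a cluster; sending it is left as the commented last line.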
  19. 19. ELASTICSEARCH CLUSTER
  20. 20. TYPICAL CLUSTER SHARD & REPLICA
  21. 21. SHARD SEARCH AND INDEX
  22. 22. DEMO OF ELASTICSEARCH
  23. 23. LOGSTASH - Ruby application that runs under JRuby on the JVM - Collects, parses, and enriches data - Horizontally scalable - Apache 2.0 License - Large number of public plugins written by the community: https://github.com/logstash-plugins
  24. 24. TYPICAL USAGE OF LOGSTASH
  26. 26. LOGSTASH INPUT
  27. 27. LOGSTASH FILTER
  28. 28. LOGSTASH OUTPUT
  29. 29. DEMO LOGSTASH
  30. 30. BEATS
  31. 31. BEATS - Lightweight shippers written in Go (non-JVM shops can use them) - They follow the Unix philosophy: do one specific thing, and do it well - Filebeat: log files (think of it as tail -f on steroids) - Metricbeat: CPU, memory (like top), Redis, MongoDB usage - Packetbeat: like Wireshark, uses libpcap to monitor packets (HTTP, etc.) - Winlogbeat: Windows event logs to Elastic - Dockbeat: monitoring Docker - Large community; lots of other Beats offered as open source
  33. 33. FILEBEAT
  34. 34. X-PACK - Elastic's commercial offering (this is one of the ways they make money) - X-Pack is an Elastic Stack extension that bundles: - Security (HTTPS to Elastic, password to access Kibana) - Alerting - Monitoring - Reporting - Graph capabilities - Machine learning
  36. 36. KIBANA - Visual application for Elasticsearch (JS, Angular, D3) - Powerful dashboard frontend for visualizing index information from Elasticsearch - Uses historical data to form charts, graphs, etc. - Real-time search of index information
  38. 38. DEMO KIBANA
  39. 39. DESIGNS WE WENT THROUGH - We started with a simple design to measure throughput - One instance of Logstash and one instance of Elasticsearch, with Filebeat
  40. 40. DOTNET CORE APP - We used a .NET Core application to generate logs - Serilog wrote the logs in JSON format to a file - Filebeat was installed on the Linux machine to ship the logs to Logstash
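
The test setup on slide 40 relies on the application writing one JSON object per line to a file that a shipper can tail. The talk used Serilog in .NET Core; the Python sketch below is only an analogue of that idea, not the author's code. The field names (`timestamp`, `level`, `message`) and the helper names are assumptions chosen for illustration.

```python
import json
from datetime import datetime, timezone


def format_log_line(level, message, **properties):
    """Render one log event as a single-line JSON string."""
    event = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "level": level,
        "message": message,
    }
    event.update(properties)  # extra structured fields, e.g. user="alice"
    return json.dumps(event)


def append_log(path, line):
    """Append one event to the log file that a shipper would tail."""
    with open(path, "a", encoding="utf-8") as log_file:
        log_file.write(line + "\n")


line = format_log_line("Information", "User logged in", user="alice")
print(line)
# append_log("app.log", line)  # hypothetical path; uncomment to write to disk
```

Keeping each event on a single line means a line-oriented shipper can forward events without any multi-line parsing.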
  41. 41. PERFORMANCE ELASTIC - 250 log items per second for 30 minutes
  42. 42. OVERVIEW
  43. 43. LOGSTASH
  44. 44. ELASTICSEARCH RUN TWO - 1000 logs per second, run for 30 minutes
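
For scale, the two benchmark runs can be compared with the stated production volume of around 2 million audit records a day (slide 7). The short Python calculation below uses only the figures from the slides. The daily average assumes traffic spread evenly over 24 hours, which real traffic will not be, so treat it as a lower bound on the peak rate rather than a sizing target.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def events_in_run(rate_per_second, minutes):
    """Total events generated by a run at a constant rate."""
    return rate_per_second * minutes * 60


def average_rate(events_per_day):
    """Average events per second if the daily volume were spread evenly."""
    return events_per_day / SECONDS_PER_DAY


print(events_in_run(250, 30))             # run one: 450000 events
print(events_in_run(1000, 30))            # run two: 1800000 events
print(round(average_rate(2_000_000), 1))  # about 23.1 events per second
```

The second run therefore pushes close to a full day's stated volume through the pipeline in 30 minutes.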
  45. 45. PERFORMANCE
  46. 46. OTHER DESIGNS
  47. 47. WHAT WE ARE GOING WITH FOR NOW, UNTIL…
  48. 48. CONSIDERATIONS OF DATA - Indexing by day makes sense in some cases - In others you may want to index by size instead (Black Friday has more traffic than other days); Elasticsearch doesn't like it when shards are not balanced - Don't index everything: if you are not going to search on specific fields, disable indexing for them in the mapping ("index": false)
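
Indexing by day, as mentioned on slide 48, usually means writing each day's events to a date-suffixed index so that old data can be dropped one whole index at a time. The Python sketch below shows one way to derive such names and to check them against the one-year retention requirement from slide 7. The `audit` prefix, the `YYYY.MM.DD` suffix format, and both helper names are assumptions for illustration, not the naming scheme used in the talk.

```python
from datetime import date


def daily_index_name(prefix, day):
    """Name of the index that holds events for a given day."""
    return f"{prefix}-{day:%Y.%m.%d}"


def is_expired(day, today, retention_days=365):
    """True when a daily index falls outside the retention window."""
    return (today - day).days > retention_days


today = date(2017, 9, 22)
print(daily_index_name("audit", today))
print(is_expired(date(2016, 9, 21), today))  # 366 days old
print(is_expired(date(2017, 9, 21), today))  # 1 day old
```

A size-based scheme would need a different trigger (rolling to a new index once the current one reaches a target size), which this date-only sketch does not cover.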
  49. 49. FUTURE CONSIDERATIONS - Investigate Elasticsearch machine learning - Elasticsearch with Kafka for cross-data-center replication
  50. 50. THANK YOU & OPEN TO QUESTIONS - Questions? - Contact: Taswar@gmail.com - Blog: http://Taswar.zeytinsoft.com - Twitter: @taswarbhatti - LinkedIn (find me and add me)
