
Fluentd at Bay Area Kubernetes Meetup

Logging for Production Systems in The Container Era
https://www.meetup.com/Bay-Area-Kubernetes-Meetup/events/235765474/

  1. Logging for Production Systems in The Container Era
     Sadayuki Furuhashi, Founder & Software Architect
     Bay Area Kubernetes Meetup
  2. A little about me… Sadayuki Furuhashi (GitHub: @frsyuki). A founder of Treasure Data, Inc., located in Mountain View. An open-source hacker. OSS projects I founded:
  3. A little about me… MessagePack: "It's like JSON. but fast and small."
  4. The Container Era

                             Server Era         Container Era
     Service Architecture    Monolithic         Microservices
     System Image            Mutable            Immutable
     Managed By              Ops Team           DevOps Team
     Local Data              Persistent         Ephemeral
     Log Collection          syslogd / rsync    ?
     Metrics Collection      Nagios / Zabbix    ?
  5. The Container Era (same comparison table as slide 4)
     How should log & metrics collection be done in The Container Era?
  6. Problems
  7. The traditional logrotate + rsync on containers
     [Diagram: application containers A, B, and C each write rotated log files, which are copied to a log server.]
     • Difficult to use!! Complex text parsers are needed
     • High latency!! You must wait for a day
     • Ephemeral!! Container-local files could be lost at any time
  8. [Diagram: application containers A to D on Servers 1 and 2, connected to Kafka, Elasticsearch, and HDFS.]
     • Many small containers overload the storages
     • Too many connections from micro containers!
  9. [Same diagram as slide 8.]
     • System images are immutable
     • Too many connections from micro containers!
     • Having M*N configurations (every app configured for every storage) makes management hard!
  10. Combinatorial explosion with microservices requires too many scripts for data integration
      [Diagram: LOG data passing through ad-hoc pieces: script to parse data, cron job for loading, filtering script, syslog script, tweet-fetching script, aggregation scripts, rsync server.]
  11. Fluentd: the centralized log collection service. We released it! (Apache License)
  12. What's Fluentd? AN EXTENSIBLE & RELIABLE DATA COLLECTION TOOL
      • Like syslogd
      • Simple core + variety of plugins
      • Buffering, HA (failover), secondary output, etc.
  13. Real-World Use Cases
  14. Text logging with --log-driver=fluentd
      [Diagram: the app's STDOUT / STDERR inside a container goes to Fluentd on the same server.]
          docker run --log-driver=fluentd --log-opt fluentd-address=localhost:24224
      Example record: { "container_id": "ad6d5d32576a", "container_name": "myapp", "source": "stdout" }
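  15. Metrics collection with fluent-logger
      [Diagram: the app inside a container uses the fluent-logger library to send events to Fluentd on the same server.]
          from fluent import sender
          from fluent import event

          # Send events to the local Fluentd, prefixing every tag with 'app.events'.
          sender.setup('app.events', host='localhost', port=24224)
          # Emits tag 'app.events.purchase' with the dict below as the record.
          event.Event('purchase', {'user_id': 21, 'item_id': 321, 'value': 1})

      Resulting event: tag = app.events.purchase, record = { "user_id": 21, "item_id": 321, "value": 1 }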
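On slide 15 the tag is the `sender.setup()` prefix joined to the `event.Event()` label with a dot. Below is a minimal, dependency-free sketch of the event that results, a (tag, time, record) triple. `make_event` is a hypothetical helper written for illustration; it is not part of fluent-logger, which additionally handles sending the event to Fluentd.

```python
import time
from typing import Any, Dict, Optional, Tuple

Event = Tuple[str, int, Dict[str, Any]]


def make_event(prefix: str, label: str, record: Dict[str, Any],
               timestamp: Optional[int] = None) -> Event:
    """Build a (tag, time, record) triple: tag = "<prefix>.<label>",
    time = UNIX seconds, record = the data dict."""
    # Hypothetical helper for illustration only; not a fluent-logger API.
    tag = f"{prefix}.{label}"
    event_time = int(time.time()) if timestamp is None else timestamp
    # Shallow copy: later top-level changes to the caller's dict don't affect the event.
    return tag, event_time, dict(record)


if __name__ == "__main__":
    tag, _, record = make_event("app.events", "purchase",
                                {"user_id": 21, "item_id": 321, "value": 1})
    print(tag)
    print(record)
```

Running it prints the same tag and record that slide 15 shows.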
  16. Logging methods for each purpose
      • Collecting log messages
        > --log-driver=fluentd
      • Application metrics
        > fluent-logger
      • Access logs, logs from middleware
        > Shared data volume
      • System metrics (CPU usage, disk capacity, etc.)
        > Fluentd's input plugins (Fluentd pulls this data periodically)
  17. Microsoft Operations Management Suite uses Fluentd: "The core of the agent uses an existing open source data aggregator called Fluentd. Fluentd has hundreds of existing plugins, which will make it really easy for you to add new data sources."
      [Diagram: OMS agent on a Linux computer: omsagent, omsconfig (DSC), PS DSC Providers, OMI Server (CIM Server); data sources Syslog, Operating System, Apache, MySQL, Containers; uploads data to OMSService over HTTPS and pulls configuration over HTTPS through a firewall/proxy.]
  18. Atlassian: "At Atlassian, we've been impressed by Fluentd and have chosen to use it in Atlassian Cloud's logging and analytics pipeline."
      [Diagram: Kinesis, ingestion service, Elasticsearch cluster.]
  19. Deployment Patterns
  20. Primitive deployment…
      [Diagram: application containers A to D on Servers 1 and 2, each connected directly to Kafka, Elasticsearch, and HDFS.]
      • Too many connections from many containers!
      • Having M*N configurations makes management hard!
  21. Source aggregation
      [Diagram: one Fluentd per server; application containers A to D send to the Fluentd on their own server, which connects to Kafka, Elasticsearch, and HDFS.]
      • The destination is always localhost from the app's point of view
      • Source aggregation decouples config from apps
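The connection problem behind source aggregation can be put in numbers. The sketch below assumes that, without aggregation, every application container connects to every storage, and that with source aggregation each server runs one Fluentd that apps reach on localhost. The function names and the larger cluster size are hypothetical, chosen only for illustration.

```python
from typing import Dict


def direct_connections(containers: int, storages: int) -> int:
    """Primitive deployment: every container connects to every storage (M*N)."""
    return containers * storages


def source_aggregated_connections(containers: int, servers: int,
                                  storages: int) -> Dict[str, int]:
    """One Fluentd per server: each container opens one localhost connection,
    and only the per-server Fluentd instances connect to the storages."""
    return {
        "local": containers,           # app -> Fluentd on the same server
        "remote": servers * storages,  # per-server Fluentd -> each storage
    }


if __name__ == "__main__":
    # The slides' diagram: containers A to D on 2 servers, 3 storages
    # (Kafka, Elasticsearch, HDFS).
    print(direct_connections(4, 3))
    print(source_aggregated_connections(4, 2, 3))
    # Hypothetical larger cluster: 200 containers on 10 servers.
    print(direct_connections(200, 3))
    print(source_aggregated_connections(200, 10, 3))
```

With aggregation the number of remote connections grows with the number of servers rather than the number of containers, and storage endpoints are configured once per server instead of in every app image.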
  22. Destination aggregation
      [Diagram: the Fluentd instances on Servers 1 and 2 forward to aggregation server(s) with active / standby / load balancing.]
      • Destination aggregation makes storages scalable for high traffic
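Slide 22 names active, standby, and load-balanced forwarding to aggregation servers. A minimal sketch of how these can combine, assuming a caller-supplied `send` function that raises `ConnectionError` when an aggregator is unreachable. `Forwarder`, `send`, and the server names are hypothetical; Fluentd's own forward output provides this behavior through configuration, not through code like this.

```python
from typing import Callable, List, Sequence

SendFn = Callable[[str, bytes], None]


class AllDestinationsFailed(Exception):
    """No active or standby aggregator accepted the chunk."""


class Forwarder:
    """Hypothetical sketch: spread chunks round-robin over active
    aggregators and fall back to standby aggregators on failure."""

    def __init__(self, active: Sequence[str], standby: Sequence[str],
                 send: SendFn) -> None:
        self.active: List[str] = list(active)
        self.standby: List[str] = list(standby)
        self.send = send
        self._next = 0  # which active aggregator to try first

    def forward(self, chunk: bytes) -> str:
        """Send one chunk; return the name of the aggregator that took it."""
        n = len(self.active)
        # Load balancing: rotate the starting active aggregator per chunk.
        order = [self.active[(self._next + i) % n] for i in range(n)]
        if n:
            self._next = (self._next + 1) % n
        errors: List[str] = []
        # Failover: try the remaining active aggregators, then the standbys.
        for dest in order + self.standby:
            try:
                self.send(dest, chunk)
                return dest
            except ConnectionError as exc:
                errors.append(f"{dest}: {exc}")
        raise AllDestinationsFailed("; ".join(errors))


if __name__ == "__main__":
    def fake_send(dest: str, chunk: bytes) -> None:
        if dest == "agg-1":  # simulate one aggregator being down
            raise ConnectionError("down")

    fwd = Forwarder(["agg-1", "agg-2"], ["agg-standby"], fake_send)
    print(fwd.forward(b"chunk-1"))
```

Standbys are only tried after every active aggregator has failed, so they receive no traffic while the active set is healthy.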
  23. Aggregation servers
      • Logging directly from microservices overloads log storages
        > Too many RX connections
        > Too frequent import API calls
      • Aggregation servers make the logging infrastructure more reliable and scalable
        > Connection aggregation
        > Buffering for less frequent import API calls
        > Data persistence during downtime
        > Automatic retry at recovery from downtime
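"Buffering for less frequent import API calls" can be sketched as a buffer that collects records and calls the storage's import API once per batch. `BatchingBuffer` and `import_api` are hypothetical names; this version keeps records in memory only, whereas the persistence during downtime mentioned on the slide needs an on-disk buffer.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


class BatchingBuffer:
    """Hypothetical sketch: collect records in memory and call the storage's
    import API once per batch instead of once per record."""

    def __init__(self, flush_size: int,
                 import_api: Callable[[List[Record]], None]) -> None:
        if flush_size < 1:
            raise ValueError("flush_size must be at least 1")
        self.flush_size = flush_size
        self.import_api = import_api
        self.pending: List[Record] = []
        self.api_calls = 0

    def emit(self, record: Record) -> None:
        """Queue one record; flush automatically when the batch is full."""
        self.pending.append(record)
        if len(self.pending) >= self.flush_size:
            self.flush()

    def flush(self) -> None:
        """Send all pending records in one call. If import_api raises, the
        records stay pending so a later flush can retry them."""
        if not self.pending:
            return
        batch = self.pending
        self.import_api(batch)
        self.pending = []
        self.api_calls += 1


if __name__ == "__main__":
    batches: List[List[Record]] = []
    buf = BatchingBuffer(flush_size=100, import_api=batches.append)
    for i in range(1000):
        buf.emit({"n": i})
    print(buf.api_calls)  # 10 import calls instead of 1000
```

The batch size trades latency for fewer API calls: a larger `flush_size` means fewer calls, but each record waits longer before it reaches the storage.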
  24. Example Use Cases
  25. Streaming from Apache/Nginx to Elasticsearch
      [Diagram: in_tail reads /var/log/access.log; buf_file buffers chunks in /var/log/fluentd/buffer before output to Elasticsearch.]
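Streaming access logs to Elasticsearch means turning each text line into a structured record first. The sketch below parses one line of the Apache combined log format (Nginx's default `combined` format has the same layout) with a regular expression. The pattern and `parse_access_line` were written for this example and are not claimed to match the pattern behind Fluentd's built-in `apache2` format exactly.

```python
import re
from typing import Any, Dict, Optional

# Apache "combined" log format:
#   host ident user [time] "request" status size "referer" "user-agent"
# The referer/user-agent pair is optional, so "common" format lines parse too.
COMBINED = re.compile(
    r'^(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+)(?: +(?P<path>\S+) +\S*)?" '
    r'(?P<code>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?$'
)


def parse_access_line(line: str) -> Optional[Dict[str, Any]]:
    """Turn one access-log line into a record, or None if it doesn't match."""
    m = COMBINED.match(line.rstrip("\n"))
    if m is None:
        return None
    record: Dict[str, Any] = m.groupdict()
    record["code"] = int(record["code"])
    # Apache writes "-" instead of 0 when no response body was sent.
    record["size"] = None if record["size"] == "-" else int(record["size"])
    return record


if __name__ == "__main__":
    line = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
            '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
            '"http://www.example.com/start.html" "Mozilla/4.08"')
    print(parse_access_line(line))
```

Lines that don't match return None rather than raising, so a caller can decide whether to drop them or route them elsewhere.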
  26. Error Handling and Recovery
      [Diagram: in_tail reads /var/log/access.log; buf_file buffers chunks in /var/log/fluentd/buffer.]
      • Buffering for any outputs
      • Retrying automatically, with exponential wait and persistence on disk
      • Secondary output
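The recovery behavior on slide 26, retrying with an exponentially growing wait and then falling back to a secondary output, can be sketched as below. It assumes the chunk is already safely buffered (in Fluentd's case, possibly on disk via buf_file) and that a failed write raises `ConnectionError`. `deliver`, `primary`, and `secondary` are hypothetical names, and the retry count and base wait are illustrative defaults, not Fluentd's.

```python
import time
from typing import Callable, Optional

Output = Callable[[bytes], None]


def deliver(chunk: bytes, primary: Output,
            secondary: Optional[Output] = None,
            max_retries: int = 5, base_wait: float = 1.0,
            sleep: Callable[[float], None] = time.sleep) -> str:
    """Write a buffered chunk to the primary output, retrying with waits of
    base_wait * 2**attempt; once retries are exhausted, hand the chunk to the
    secondary output. Returns which output accepted the chunk."""
    for attempt in range(max_retries + 1):
        try:
            primary(chunk)
            return "primary"
        except ConnectionError:
            if attempt == max_retries:
                break
            sleep(base_wait * 2 ** attempt)  # 1s, 2s, 4s, ... with the defaults
    if secondary is None:
        raise RuntimeError("primary output failed and no secondary output is set")
    secondary(chunk)
    return "secondary"


if __name__ == "__main__":
    attempts = {"n": 0}

    def flaky(chunk: bytes) -> None:
        attempts["n"] += 1
        if attempts["n"] < 3:  # fail twice, then succeed
            raise ConnectionError("storage temporarily unavailable")

    print(deliver(b"chunk", flaky, sleep=lambda s: print("waiting", s)))
```

Injecting `sleep` keeps the sketch testable without real waiting; a production pipeline would also cap the maximum wait.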
  27. Tailing & parsing files
      • Read a log file, tracking the read position in a pos file
      • Supported built-in formats: apache, apache_error, apache2, nginx, json, csv, tsv, ltsv, syslog, multiline, none
      • Custom regexp
      • Custom parser in Ruby
      [Diagram labels: events.log, pos file, ? (your app).]
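The pos file is what lets a tailer resume where it stopped after a restart. A minimal sketch of the idea, assuming the byte offset is stored as JSON; that is not Fluentd's pos file format, and `read_new_lines` is a hypothetical helper. It detects truncation only by the file shrinking, while Fluentd's in_tail handles rotation more carefully.

```python
import json
import os
from typing import List


def read_new_lines(log_path: str, pos_path: str) -> List[str]:
    """Return complete lines appended to log_path since the previous call,
    persisting the byte offset in pos_path so a restart resumes in place."""
    offset = 0
    if os.path.exists(pos_path):
        with open(pos_path, encoding="utf-8") as f:
            offset = json.load(f)["offset"]
    if os.path.getsize(log_path) < offset:
        offset = 0  # file shrank (truncated or replaced): read from the start
    with open(log_path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Consume only up to the last newline; a partial line waits for the next call.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8").split("\n")[:-1]
    with open(pos_path, "w", encoding="utf-8") as f:
        json.dump({"offset": offset + end}, f)
    return lines
```

Calling it periodically on the same file and pos file returns each complete line exactly once, even across restarts of the reading process.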
  28. Out to Multiple Locations
      • Routing based on tags
      • Copy to multiple storages
      [Diagram: in_tail reads access.log into a buffer, and events are routed to multiple storages.]
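Tag-based routing with copies to several storages can be sketched as follows. The pattern rules follow Fluentd's documented `*` (exactly one tag part) and `**` (zero or more tag parts) wildcards, and, as in Fluentd, the first matching route wins. Only whole-part wildcards are handled (no `{a,b}` alternation), and `Router`, `add_route`, and `emit` are hypothetical names for this example.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
Output = Callable[[str, Record], None]


def tag_matches(pattern: str, tag: str) -> bool:
    """'*' matches exactly one dot-separated tag part, '**' zero or more."""
    return _match(pattern.split("."), tag.split("."))


def _match(pat: List[str], parts: List[str]) -> bool:
    if not pat:
        return not parts
    head, rest = pat[0], pat[1:]
    if head == "**":
        # Let '**' absorb 0, 1, 2, ... of the remaining tag parts.
        return any(_match(rest, parts[i:]) for i in range(len(parts) + 1))
    if not parts:
        return False
    return (head == "*" or head == parts[0]) and _match(rest, parts[1:])


class Router:
    """Hypothetical sketch: send each event to the first route whose pattern
    matches its tag; one route may copy the event to several outputs."""

    def __init__(self) -> None:
        self.routes: List[Tuple[str, List[Output]]] = []

    def add_route(self, pattern: str, *outputs: Output) -> None:
        self.routes.append((pattern, list(outputs)))

    def emit(self, tag: str, record: Record) -> bool:
        """Return True if some route accepted the event, False if none matched."""
        for pattern, outputs in self.routes:
            if tag_matches(pattern, tag):
                for out in outputs:  # copy: every output of the route gets it
                    out(tag, record)
                return True
        return False


if __name__ == "__main__":
    elasticsearch: List[Record] = []
    s3: List[Record] = []
    router = Router()
    router.add_route("access.**",
                     lambda t, r: elasticsearch.append(r),
                     lambda t, r: s3.append(r))
    router.emit("access.nginx", {"path": "/"})
    print(len(elasticsearch), len(s3))
```

Because the first match wins, a catch-all `**` route belongs at the end of the list.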
  29. Example configuration for a real-time + batch combination
  30. Data partitioning by time on HDFS / S3
      [Diagram: in_tail reads access.log into a buffer; a custom file formatter slices files based on time:]
          2016-01-01/01/access.log.gz
          2016-01-01/02/access.log.gz
          2016-01-01/03/access.log.gz
          …
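Slicing files by time comes down to deriving each file's path from the event timestamp. A minimal sketch, assuming hourly slices in UTC and the `YYYY-MM-DD/HH/access.log.gz` layout from slide 30. `slice_path` and `partition` are hypothetical helpers; compression and the actual upload to HDFS or S3 are left out.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, Iterable, List, Tuple


def slice_path(ts: float, basename: str = "access.log.gz") -> str:
    """Map a UNIX timestamp to an hourly slice path (UTC)."""
    t = datetime.fromtimestamp(ts, tz=timezone.utc)
    return f"{t:%Y-%m-%d}/{t:%H}/{basename}"


def partition(events: Iterable[Tuple[float, str]]) -> Dict[str, List[str]]:
    """Group (timestamp, line) pairs by the hourly slice they fall into."""
    slices: Dict[str, List[str]] = defaultdict(list)
    for ts, line in events:
        slices[slice_path(ts)].append(line)
    return dict(slices)


if __name__ == "__main__":
    # 1451606400 is 2016-01-01 00:00:00 UTC.
    events = [(1451606400 + 5400, "a"),   # 01:30
              (1451606400 + 7500, "b"),   # 02:05
              (1451606400 + 5600, "c")]   # 01:33
    for path, lines in sorted(partition(events).items()):
        print(path, lines)
```

Partitioning by event time rather than arrival time keeps late-arriving events in the hour they describe, at the cost of possibly reopening an older slice.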