Efficient logging in multithreaded C++ server


  1. Efficient Logging in Multithreaded C++ Server
     Shuo Chen, 2012/06, www.chenshuo.com
  2. Show me the code
     - C++ logging library in muduo 0.5.0
       - http://code.google.com/p/muduo (release)
       - github.com/chenshuo/muduo (latest)
       - github.com/chenshuo/recipes/tree/master/logging
     - Performance on i5-2500
       - 1,000,000+ log messages per second
       - Max throughput 100+ MiB/s
       - 1us~1.6us latency per message for async logging
  3. Two meanings of log
     - Diagnostic log
       - log4j, logback, slf4j
       - log4cxx, log4cpp, log4cplus, glog, g2log, Pantheios, ezlogger
     - Transaction log
       - Write-ahead log
       - Binlog, redo log
       - Journaling
       - Log-structured FS
     - Here we mean diagnostic log/logging
       - Textual, human readable
       - grep/sed/awk friendly
  4. Features of common logging libraries
     - Multiple log levels, enable/disable at runtime
       - TRACE, DEBUG, INFO, WARN, ERROR, FATAL
     - Flexibilities are not all necessary!
       - Appenders, layouts, filters
     - Many possible destinations? i.e. Appender
       - There is one true destination, really
     - Configure log line format at runtime? i.e. Layout
       - You won't change the format during the lifetime of a project
     - Muduo logging is configured at compile time
       - No XML config file, but a few lines of code in main()
  5. What to log in a distributed system
     - Everything! All the time!
       - http://highscalability.com/log-everything-all-time
     - Log must be fast
       - Tens of thousands of messages per second is normal
       - without noticeable performance penalty
       - Should never block normal execution flow
     - Log data could be big
       - up to 1GB per minute, for one process on one host
     - An efficient logging library is a prerequisite for any non-trivial server-side program
  6. Frontend and backend of log lib
     - Frontend formats log messages
     - Backend sends log messages to destination (file)
     - The interface between them could be as simple as
         void output_log(const char* msg, int len)
     - However, in a multithreaded program, the most difficult part is neither frontend nor backend, but transferring log data from frontend to backend
       - Multiple producers (frontend), one consumer
         - Low latency / low CPU overhead for frontend
         - High throughput for backend
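
The frontend/backend split on this slide can be sketched as a single replaceable function pointer. This is only an illustration: OutputFunc, set_output, log_line, and capture_output are hypothetical names, not muduo's API, and the frontend here does no real formatting.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical backend signature, modeled on the slide's output_log().
typedef void (*OutputFunc)(const char* msg, int len);

// Default backend: write the already formatted line to stdout.
void default_output(const char* msg, int len) {
  std::fwrite(msg, 1, static_cast<std::size_t>(len), stdout);
}

OutputFunc g_output = default_output;

// Swap the backend; a null pointer restores the default.
void set_output(OutputFunc f) { g_output = f ? f : default_output; }

// Frontend: build one complete line, then hand it to the backend in one call.
void log_line(const std::string& text) {
  const std::string line = text + "\n";
  g_output(line.data(), static_cast<int>(line.size()));
}

// Example backend that captures output in memory (handy in tests).
std::string g_captured;
void capture_output(const char* msg, int len) {
  assert(len >= 0);
  g_captured.append(msg, static_cast<std::size_t>(len));
}
```

Calling set_output(capture_output) before log_line("hello") leaves the line in g_captured; an asynchronous backend would install a function that copies the line into a shared buffer instead.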
  7. Frontend should be easy to use
     - Two styles in C++: C/Java function vs. C++ stream
       - printlog("Received %d bytes from %s", len, client);
       - LOG << "Received " << len << " bytes from " << client;
     - print*() can be made type safe, but cumbersome
       - You can't pass non-POD objects as (...) arguments
       - Pantheios uses overloaded function templates
     - LOG is easier to use IMO, no placeholder in fmt str
       - When the logging level is disabled, the whole statement can be made a nop, with almost no runtime overhead at all
       - http://www.drdobbs.com/cpp/201804215
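
One common way to make a disabled stream-style statement a near nop is to hide an if test inside the macro, so the operands are never evaluated. The sketch below assumes hypothetical names (Level, g_level, LineLogger, LOG_AT, g_sink) and uses std::ostringstream only to stay short; the deck argues for a faster custom stream.

```cpp
#include <cassert>
#include <sstream>
#include <string>

enum class Level { kTrace, kDebug, kInfo, kWarn, kError, kFatal };

Level g_level = Level::kInfo;  // current threshold
std::string g_sink;            // stand-in for the real log destination

// Collects one message; the destructor emits it as a single line.
class LineLogger {
 public:
  ~LineLogger() {
    g_sink += stream_.str();
    g_sink += '\n';
  }
  std::ostringstream& stream() { return stream_; }

 private:
  std::ostringstream stream_;
};

// If the level is disabled, nothing to the right of the macro is evaluated:
// no temporary is constructed and no operand expression runs.
// Caveat of this simple form: avoid using it as the body of an unbraced
// if/else, because the hidden `if` would capture the `else`.
#define LOG_AT(level) \
  if (g_level <= (level)) LineLogger().stream()
```

With g_level at kInfo, `LOG_AT(Level::kDebug) << expensive();` never calls expensive(), while `LOG_AT(Level::kInfo) << "x";` appends one line to g_sink.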
  8. Why muduo::LogStream?
     - std::ostream is too slow and not thread-safe
       - One ostringstream object per log msg is expensive
     - LogStream is fast because
       - No formatting, no manipulators, no i18n or l10n
         - Output integers, doubles, pointers, strings
         - Fmt class is provided, though
       - Fixed-size buffer allocated on stack, no malloc call
         - Also limit the max log msg to 4000 bytes, same as glog
     - See benchmark result at
       - www.cnblogs.com/Solstice/archive/2011/07/17/2108715.html
       - muduo/base/test/LogStream_bench.cc
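
To make the "fixed-size buffer, no malloc" idea concrete, here is a minimal sketch of a stream that only appends into an in-object array. TinyLogStream is a hypothetical name, not muduo::LogStream, and it uses snprintf for integers purely for brevity; a tuned implementation would likely use a hand-written conversion.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <string>

class TinyLogStream {
 public:
  static constexpr int kCapacity = 4000;  // max message size, as on the slide

  TinyLogStream() : len_(0) {}

  TinyLogStream& operator<<(const char* s) {
    append(s, std::strlen(s));
    return *this;
  }
  TinyLogStream& operator<<(const std::string& s) {
    append(s.data(), s.size());
    return *this;
  }
  TinyLogStream& operator<<(int v) {
    char tmp[16];  // enough for any 32-bit int plus the terminator
    const int n = std::snprintf(tmp, sizeof tmp, "%d", v);
    append(tmp, static_cast<std::size_t>(n));
    return *this;
  }

  const char* data() const { return buf_; }
  int length() const { return len_; }

 private:
  // Copies n bytes if they fit; anything that does not fit is dropped whole.
  void append(const char* p, std::size_t n) {
    if (n <= static_cast<std::size_t>(kCapacity - len_)) {
      std::memcpy(buf_ + len_, p, n);
      len_ += static_cast<int>(n);
    }
  }

  char buf_[kCapacity];  // lives inside the object, so on the stack for locals
  int len_;
};
```

A local TinyLogStream costs no heap allocation: `s << "Received " << 42 << " bytes"` only copies bytes into buf_.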
  9. Log line format is also fixed
     - You don't want to change output format at runtime, do you?
     - One log message per line, easy for grep
     - date time(UTC) thread-id level message source
         20120603 08:02:46.125770Z 23261 INFO Hello - test.cc:51
         20120603 08:02:46.126926Z 23261 WARN World - test.cc:52
         20120603 08:02:46.126997Z 23261 ERROR Error - test.cc:53
     - No overhead of parsing a format string all the time
       - Furthermore, the date time string is cached and reformatted at most once per second
     - TRACE and DEBUG levels are disabled by default
       - Turn them on with an environment variable
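
The per-second caching of the date/time string might look like the sketch below: the expensive calendar conversion runs only when the second changes, and only the microsecond suffix is formatted per message. TimeCache, t_cache, and format_timestamp are hypothetical names, and gmtime_r assumes a POSIX system.

```cpp
#include <cassert>
#include <cstdio>
#include <ctime>
#include <string>

// Per-thread cache of the "YYYYmmdd HH:MM:SS" part of the timestamp.
struct TimeCache {
  std::time_t second = -1;  // second the cached text belongs to
  char text[32] = {0};
  int reformats = 0;        // how often the slow path ran (for illustration)
};

thread_local TimeCache t_cache;

std::string format_timestamp(std::time_t seconds, int micros) {
  if (seconds != t_cache.second) {
    // Slow path: at most once per second per thread.
    std::tm tm_utc;
    gmtime_r(&seconds, &tm_utc);  // POSIX, thread-safe
    std::strftime(t_cache.text, sizeof t_cache.text, "%Y%m%d %H:%M:%S", &tm_utc);
    t_cache.second = seconds;
    ++t_cache.reformats;
  }
  // Fast path: append only the microseconds and the UTC marker.
  char out[48];
  std::snprintf(out, sizeof out, "%s.%06dZ", t_cache.text, micros);
  return out;
}
```

Two messages logged within the same second share one strftime call; making the cache thread_local avoids any locking around it.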
  10. The one true destination: LogFile
     - Local file, with rolling & timestamp in log filename
     - It is a joke to write large amounts of log messages to
       - SMTP, Database, Network (FTP, "log server")
     - The purpose of logging is to investigate what has happened in case of system failure or malfunction
       - Network could fail, how to log that event?
     - Logging to network may also double bandwidth usage
       - Beware of logging to a network-mapped file system
     - What if disk fails? The host is not usable anyway
       - Check dmesg and other kernel logs
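
A rolling, timestamped filename could be built along these lines. This is a sketch under assumptions: log_file_name and should_roll are hypothetical helpers rather than muduo's LogFile API, the timestamp is taken as UTC, and the roll policy (size limit or new day) is only one plausible choice.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <ctime>
#include <string>

// Builds "<basename>.<YYYYmmdd-HHMMSS>.<hostname>.<pid>.log".
std::string log_file_name(const std::string& basename, std::time_t now,
                          const std::string& hostname, int pid) {
  std::tm tm_utc;
  gmtime_r(&now, &tm_utc);  // POSIX, thread-safe
  char stamp[32];
  std::strftime(stamp, sizeof stamp, "%Y%m%d-%H%M%S", &tm_utc);
  return basename + "." + stamp + "." + hostname + "." +
         std::to_string(pid) + ".log";
}

// Assumed policy: start a new file when the current one reaches rollSize
// bytes, or when the UTC day has changed since the file was opened.
bool should_roll(std::size_t bytesWritten, std::size_t rollSize,
                 std::time_t fileStart, std::time_t now) {
  const std::time_t kSecondsPerDay = 24 * 60 * 60;
  return bytesWritten >= rollSize ||
         now / kSecondsPerDay != fileStart / kSecondsPerDay;
}
```

Including hostname and pid in the name keeps files from different processes and machines distinct when they are later collected in one place.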
  11. Performance requirement
     - Suppose PC server with SATA disks, no RAID
       - 110MB/s 7200rpm single disk
       - Logging to local disk, 110 bytes per log message
       - 1000k log messages per second with IO buffer
       - Target for a "high-performance" logging library
     - For a busy server with 100k qps, logging every request should not take too much CPU time
       - 100k log msgs per second can only serve 10k qps
     - The log library should be able to write 100MB/s
       - And sustain that for seconds (at peak time, of course)
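
The arithmetic behind these targets can be checked at compile time. The figure of roughly 10 log messages per request is an assumption inferred from the slide's "100k msgs/s serves 10k qps"; the constant and function names are illustrative only.

```cpp
#include <cassert>

// Disk-bound ceiling: 110 MB/s divided by 110 bytes per message.
constexpr long long kDiskBytesPerSec = 110LL * 1000 * 1000;
constexpr long long kBytesPerMessage = 110;
constexpr long long kMaxMessagesPerSec = kDiskBytesPerSec / kBytesPerMessage;
static_assert(kMaxMessagesPerSec == 1000000, "1000k messages per second");

// Assumption: each request produces about 10 log messages.
constexpr long long kMessagesPerRequest = 10;

constexpr long long requests_per_sec(long long messagesPerSec) {
  return messagesPerSec / kMessagesPerRequest;
}

// A library limited to 100k msgs/s caps the server at 10k qps ...
static_assert(requests_per_sec(100000) == 10000, "10k qps");
// ... while 1000k msgs/s is what a 100k qps server needs.
static_assert(requests_per_sec(kMaxMessagesPerSec) == 100000, "100k qps");
```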
  12. Performance of muduo logging
     - ~100-byte msgs, single thread, synchronous log (msgs/s, bytes/s)

         Destination   E5320                  i5-2500
         nop           971k/s, 107.6MiB/s     2422k/s, 256.2MiB/s
         /dev/null     912k/s, 101.1MiB/s     2342k/s, 247.7MiB/s
         /tmp/log      810k/s, 89.8MiB/s      2130k/s, 225.3MiB/s

     - 1,000,000+ msgs per sec, saturate disk bandwidth
  13. Other trick for postmortem
     - Log output must be buffered
       - can't afford fflush(3)ing every time
     - What if the program crashes? There must be some unwritten log messages in the core file; how to find them?
     - Put a cookie (a sentry value) at the beginning of each message/buffer
       - Cookie can be the address of a function, to be unique
       - Also set the cookie to some other function in the dtor
       - Identify messages with the gdb find command
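
The cookie trick can be sketched as a function pointer stored as the first member of the buffer. CookieBuffer and its members are hypothetical names used for illustration; the two marker functions exist only so that their addresses serve as recognizable values in a core dump.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

class CookieBuffer {
 public:
  // Marker functions: only their addresses matter.
  static void cookieStart() {}
  static void cookieEnd() {}

  CookieBuffer() : cookie_(cookieStart), len_(0) {}

  // Mark the buffer as dead so stale buffers are distinguishable from live
  // ones. Note: an optimizer may remove this store as dead; a real
  // implementation may need to guard against that.
  ~CookieBuffer() { cookie_ = cookieEnd; }

  void append(const char* p, std::size_t n) {
    if (n <= sizeof data_ - len_) {
      std::memcpy(data_ + len_, p, n);
      len_ += n;
    }
  }

  void (*cookie() const)() { return cookie_; }
  std::size_t length() const { return len_; }

 private:
  void (*cookie_)();  // first member: sits right before the log data
  char data_[4000];
  std::size_t len_;
};
```

In a postmortem session, searching the core's memory for the address of cookieStart (for example with gdb's find command) locates live buffers, and the unwritten log text follows shortly after each hit.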
  14. Logging in a multithreaded program
     - Logging must be thread-safe (no interleaving)
       - And efficient
     - Better to log to one file per process, not per thread
       - Easier for reading the log, no jumping around files
       - The OS kernel has to serialize writing anyway
     - Thread-safe is easy, efficient is not that easy
       - Global lock and blocking writes are bad ideas
     - One background thread gathers log messages and writes them to disk, a.k.a. asynchronous logging
  15. Asynchronous logging is a must
     - A.k.a. non-blocking logging
     - Disk IO can block for seconds, occasionally
       - Causes timeouts in a distributed system
       - and cascading effects, e.g. false alarms of deadlock, etc.
     - Absolutely no disk IO in normal execution flow
       - Very important for non-blocking network programming, check my other slides
     - We need a "queue" to pass log data efficiently
       - Not necessarily a traditional blocking queue: no need to notify the consumer every time there is something to write
  16. What if messages queue up?
     - Program writes log faster than disk bandwidth?
       - It queues first in the OS cache
       - Then in the process, and memory usage increases rapidly
     - In case of overload, the logging library should not crash or OOM; drop messages instead
       - Send alerts via network if necessary
     - Not a problem in synchronous logging
       - Blocking IO is easy/good for bandwidth throttling
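
A drop-on-overload policy for the backend can be very small. In this sketch, drop_excess is a hypothetical helper and the thresholds passed to it (for instance 25 pending buffers, keep 2) are illustrative values, not numbers taken from the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// If more than maxPending buffers are waiting to be written, keep only the
// oldest `keep` buffers and discard the rest. Returns the number dropped so
// the caller can log one line about it or raise an alert.
std::size_t drop_excess(std::vector<std::string>& pending,
                        std::size_t maxPending, std::size_t keep) {
  assert(keep <= maxPending);
  if (pending.size() <= maxPending) {
    return 0;  // normal case: the disk keeps up
  }
  const std::size_t dropped = pending.size() - keep;
  pending.erase(pending.begin() + static_cast<std::ptrdiff_t>(keep),
                pending.end());
  return dropped;
}
```

Bounding the backlog this way caps memory use at roughly maxPending buffers, trading lost log lines for a process that stays alive under overload.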
  17. Double buffering for the queue
     - Basic idea: two buffers, swap them when one is full
       - All business threads write to one buffer, memcpy only
       - Log thread writes the other buffer to disk
     - Improvement: four buffers, no waiting in most cases
     - Critical code in critical sections, next two pages

         typedef boost::ptr_vector<LargeBuffer> BufferVector;
         typedef BufferVector::auto_type BufferPtr;
         muduo::MutexLock mutex_;
         muduo::Condition cond_;
         BufferPtr currentBuffer_;
         BufferPtr nextBuffer_;
         BufferVector buffers_;
  18. AsyncLogging::append(), called by the business threads

         void AsyncLogging::append(const char* logline, int len)
         {
           muduo::MutexLockGuard lock(mutex_);
           if (currentBuffer_->avail() > len)
           { // most common case: buffer is not full, copy data here
             currentBuffer_->append(logline, len);
           }
           else // buffer is full, push it, and find next spare buffer
           {
             buffers_.push_back(currentBuffer_.release());
             if (nextBuffer_) // if there is one already, use it
             {
               currentBuffer_ = boost::ptr_container::move(nextBuffer_);
             }
             else // allocate a new one
             {
               currentBuffer_.reset(new Buffer); // Rarely happens
             }
             currentBuffer_->append(logline, len);
             cond_.notify();
           }
         }
  19. The log thread

         // in log thread
         BufferPtr newBuffer1(new Buffer);
         BufferPtr newBuffer2(new Buffer);
         boost::ptr_vector<Buffer> buffersToWrite(16);
         while (running_)
         {
           // swap out what needs to be written, keep the critical section short
           {
             muduo::MutexLockGuard lock(mutex_);
             cond_.waitForSeconds(flushInterval_);
             buffers_.push_back(currentBuffer_.release());
             currentBuffer_ = boost::ptr_container::move(newBuffer1);
             buffersToWrite.swap(buffers_);
             if (!nextBuffer_)
             {
               nextBuffer_ = boost::ptr_container::move(newBuffer2);
             }
           }
           // output buffersToWrite, re-fill newBuffer1/2
         }
         // final note: bzero() each buffer initially to avoid page faults
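
For readers without muduo or Boost at hand, the same double-buffering scheme can be sketched with only the standard library. This is an approximation, not the author's code: AsyncSink and its members are hypothetical names, buffers are std::string objects, the "disk" is an in-memory string, and the log thread waits only when no full buffer is pending.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

class AsyncSink {
 public:
  explicit AsyncSink(std::size_t bufferSize)
      : bufferSize_(bufferSize), running_(true),
        current_(newBuffer()), next_(newBuffer()) {
    thread_ = std::thread(&AsyncSink::threadFunc, this);
  }
  ~AsyncSink() { stop(); }

  // Frontend side: a short critical section that only copies bytes.
  void append(const char* line, std::size_t len) {
    std::lock_guard<std::mutex> lock(mutex_);
    if (current_->size() + len <= bufferSize_) {
      current_->append(line, len);  // common case: room left
      return;
    }
    full_.push_back(std::move(current_));
    if (next_) current_ = std::move(next_);  // use the spare buffer
    else current_ = newBuffer();             // rare: allocate
    current_->append(line, len);
    cond_.notify_one();  // wake the log thread only when a buffer fills up
  }

  // Flushes what is buffered and joins the log thread. Call once producers
  // are done; later appends are not written.
  void stop() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      if (!running_) return;
      running_ = false;
    }
    cond_.notify_one();
    thread_.join();
  }

  // Stand-in for the log file; only read it after stop().
  const std::string& written() const { return written_; }

 private:
  typedef std::unique_ptr<std::string> Buffer;

  Buffer newBuffer() const {
    Buffer b(new std::string);
    b->reserve(bufferSize_);
    return b;
  }

  void threadFunc() {
    Buffer spare1 = newBuffer();
    Buffer spare2 = newBuffer();
    std::vector<Buffer> toWrite;
    bool keepGoing = true;
    while (keepGoing) {
      {  // swap out what needs writing; keep the critical section short
        std::unique_lock<std::mutex> lock(mutex_);
        if (full_.empty() && running_) {
          cond_.wait_for(lock, std::chrono::milliseconds(100));
        }
        full_.push_back(std::move(current_));
        current_ = std::move(spare1);
        toWrite.swap(full_);
        if (!next_) next_ = std::move(spare2);
        keepGoing = running_;
      }
      for (const Buffer& b : toWrite) written_ += *b;  // "disk" write, no lock
      assert(!toWrite.empty());
      spare1 = std::move(toWrite.back());  // recycle instead of reallocating
      toWrite.pop_back();
      spare1->clear();
      if (!spare2) {
        if (!toWrite.empty()) {
          spare2 = std::move(toWrite.back());
          spare2->clear();
        } else {
          spare2 = newBuffer();
        }
      }
      toWrite.clear();
    }
  }

  const std::size_t bufferSize_;
  bool running_;
  Buffer current_;
  Buffer next_;
  std::vector<Buffer> full_;
  std::mutex mutex_;
  std::condition_variable cond_;
  std::string written_;
  std::thread thread_;
};
```

Several producer threads can call append() concurrently; after they finish, stop() flushes the remainder, and written() holds every line intact because each append copies a whole message under the lock.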
  20. Alternative solutions?
     - Use normal muduo::BlockingQueue<string> or BoundedBlockingQueue<string> as the queue
       - Allocates memory for every log message; you need a good malloc that is optimized for multithreading
       - Replace stack buffer with heap buffer in LogStream
       - Instead of copying data, passing a pointer might be faster
         - But as I tested, copying data is 3x faster for small msgs ~4k
       - That's why muduo only provides one AsyncLogging class
     - More buffers, to reduce lock contention
       - Like ConcurrentHashMap, buckets hashed by thread id
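
For comparison, the blocking-queue alternative might be sketched as follows. StringQueue is a hypothetical stand-in written with the standard library, not muduo::BlockingQueue; note that every message is its own std::string and every put() notifies the consumer, which are the costs the slide points out.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <utility>

class StringQueue {
 public:
  // Producer side: one heap-backed string per message (for long lines),
  // plus one notification per message.
  void put(std::string line) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      queue_.push_back(std::move(line));
    }
    notEmpty_.notify_one();
  }

  // Consumer side: blocks until a message is available.
  std::string take() {
    std::unique_lock<std::mutex> lock(mutex_);
    notEmpty_.wait(lock, [this] { return !queue_.empty(); });
    std::string front = std::move(queue_.front());
    queue_.pop_front();
    return front;
  }

  std::size_t size() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return queue_.size();
  }

 private:
  mutable std::mutex mutex_;
  std::condition_variable notEmpty_;
  std::deque<std::string> queue_;
};
```

The interface is simpler than swapping large buffers, but the per-message lock, allocation, and wakeup are exactly what the double-buffering design batches away.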
  21. Conclusion
     - Muduo logging library is fairly speedy
     - It provides the most fundamental features
       - No bells and whistles, no unnecessary flexibilities
     - Frontend: LogStream and Logging classes
       - LOG_INFO << "Hello";
         20120603 08:02:46.125770Z 23261 INFO Hello - test.cc:51
     - Backend: LogFile class, rolling & timestamped
       - logfile_test.20120603-144022.hostname.3605.log
     - For multithreaded programs, use the AsyncLogging class
       - Check examples in muduo/base/tests for how to use