The document discusses common multi-threaded performance issues, including a case study in which the throughput of a CORBA/C++ server decreased as more CPUs were added. The slowdown had two causes: cache consistency overhead from frequent mutex lock/unlock calls made on different CPUs, and unfair mutex wakeup semantics, which allowed some threads to wait a long time for a lock. The issues can be addressed by limiting the size of thread pools, running multiple processes that are each bound to a CPU, and minimizing mutex usage.
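
As a rough illustration of two of these remedies (minimizing mutex usage and limiting the thread pool), the following C++ sketch contrasts a counter that locks a shared mutex on every increment with one that accumulates per thread and locks only once. It is not the case study's server code: the function names `count_with_shared_mutex`, `count_with_local_totals`, and `capped_pool_size` are hypothetical, and capping the pool at `std::thread::hardware_concurrency()` is an assumed policy chosen for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Contended version: every increment locks the same mutex, so the lock
// (and the cache line holding it) is passed between CPUs constantly.
std::uint64_t count_with_shared_mutex(unsigned num_threads, std::uint64_t per_thread) {
    assert(num_threads > 0);
    std::mutex m;
    std::uint64_t total = 0;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&m, &total, per_thread] {
            for (std::uint64_t i = 0; i < per_thread; ++i) {
                std::lock_guard<std::mutex> lock(m);
                ++total;
            }
        });
    }
    for (auto& w : workers) w.join();
    return total;
}

// Lower-contention version: each thread works on a private local value
// and takes the mutex exactly once to publish its result.
std::uint64_t count_with_local_totals(unsigned num_threads, std::uint64_t per_thread) {
    assert(num_threads > 0);
    std::mutex m;
    std::uint64_t total = 0;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&m, &total, per_thread] {
            std::uint64_t local = 0;
            for (std::uint64_t i = 0; i < per_thread; ++i) {
                ++local;
            }
            std::lock_guard<std::mutex> lock(m);
            total += local;
        });
    }
    for (auto& w : workers) w.join();
    return total;
}

// Assumed policy: never run more worker threads than the hardware reports.
// hardware_concurrency() may return 0 when the value is unknown, so fall
// back to 1 in that case and always return at least 1.
unsigned capped_pool_size(unsigned requested) {
    unsigned hw = std::thread::hardware_concurrency();
    if (hw == 0) hw = 1;
    return std::max(1u, std::min(requested, hw));
}
```

Both counting functions return the same total; they differ only in how often threads contend for the mutex. Binding each process to a CPU is platform-specific and is left out of this sketch.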