The document discusses 3D stacking in microprocessor design at different granularities, including the stacking of complete modules. It proposes stacking an additional silicon layer that holds 8 MB of L2 cache, which increases cache capacity with little impact on access latency relative to a traditional 2D design. Testing shows that the added capacity benefits workloads whose working sets exceed the baseline 4 MB L2 cache. The document also discusses using 3D stacking to provide optional functionality, such as access to internal processor state or added reliability through redundant execution.