This document proposes a design method for efficient parallel processing in 3D standard-chip stacking systems (3D-SCSS) that use a standard bus. It presents a model for mapping parallel algorithms onto a 3D-SCSS and describes the corresponding design flow. As an example, it maps the scale pyramid generation step of an image recognition algorithm across multiple processor chips in the 3D-SCSS. The analysis shows that the independent resize approach reduces data transfer compared with the iterative resize approach, although it requires synchronization. The estimated power consumption for data transfer at 10 frames per second is at least 691.2 μW.
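
The data-transfer comparison between the two resize approaches can be illustrated with a minimal sketch. It rests on assumptions that are not stated in the document: in iterative resize, each chip is assumed to receive the previous pyramid level over the bus, while in independent resize the original image is assumed to be broadcast once, with every chip resizing directly from it. The image size, scale factor, level count, and all function names are hypothetical.

```python
def level_bytes(width, height, scale, level, bytes_per_pixel=1):
    """Size in bytes of pyramid level `level` (level 0 is the original image)."""
    w = int(width * scale ** level)
    h = int(height * scale ** level)
    return w * h * bytes_per_pixel


def iterative_transfer(width, height, scale, levels):
    """Bus traffic if each chip resizes the previous chip's output.

    Assumption: the chip producing level k+1 must first receive level k,
    so every level from 0 to levels-1 crosses the bus once.
    """
    return sum(level_bytes(width, height, scale, k) for k in range(levels))


def independent_transfer(width, height, scale, levels):
    """Bus traffic if every chip resizes directly from the original image.

    Assumption: the original image is broadcast once on the shared bus and
    all chips read it at the same time, so only level 0 is transferred,
    regardless of how many levels are generated.
    """
    return level_bytes(width, height, scale, 0)


if __name__ == "__main__":
    # Hypothetical parameters chosen only for illustration.
    width, height, scale, levels = 640, 480, 0.5, 3
    print("iterative  :", iterative_transfer(width, height, scale, levels), "bytes")
    print("independent:", independent_transfer(width, height, scale, levels), "bytes")
```

Under these assumptions, a 640×480 image at one byte per pixel with a scale factor of 0.5 and three levels moves 403,200 bytes with iterative resize and 307,200 bytes with independent resize. The single shared broadcast is one plausible reason the independent approach would need the chips to be synchronized.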