Each box represents a three-dimensional feature map with width W, height H, and depth D. The depth (D) is shown above each box and the x-y size (W, H) below it. Each blue box is a feature map copied from the corresponding encoder step; it is concatenated with the feature map (black box) generated by upsampling the previous layer. The input to the network is a 3-channel RGB image and the output is a 1-channel grayscale image. The arrows represent the different operations. "New methods of removing debris and high-throughput counting of cyst nematode eggs extracted from field soil" https://doi.org/10.1371/journal.pone.0223386.g002
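
The concatenation described in the caption can be illustrated with simple shape arithmetic. The following is a minimal sketch, not the authors' implementation: the function names `upsample` and `concat_skip`, the upsampling factor of 2, and all feature-map sizes are hypothetical and chosen only for illustration, and upsampling is assumed to leave the depth unchanged. Shapes are written as (W, H, D) to match the caption.

```python
from typing import Tuple

Shape = Tuple[int, int, int]  # (W, H, D), following the caption's notation


def upsample(shape: Shape, factor: int = 2) -> Shape:
    """Scale W and H by `factor`; depth is assumed unchanged (illustrative only)."""
    w, h, d = shape
    return (w * factor, h * factor, d)


def concat_skip(encoder: Shape, upsampled: Shape) -> Shape:
    """Concatenate two feature maps along the depth axis.

    The x-y sizes must match; the resulting depth is the sum of both depths.
    """
    if encoder[:2] != upsampled[:2]:
        raise ValueError(f"x-y size mismatch: {encoder[:2]} vs {upsampled[:2]}")
    return (encoder[0], encoder[1], encoder[2] + upsampled[2])


# Hypothetical sizes, not taken from the figure.
previous_layer = (32, 32, 128)      # decoder feature map before upsampling
encoder_copy = (64, 64, 64)         # "blue box": copied from the encoder step
black_box = upsample(previous_layer)            # "black box" after upsampling
merged = concat_skip(encoder_copy, black_box)   # depth-wise concatenation

print("upsampled:", black_box)
print("merged:   ", merged)
```

Concatenation changes only the depth: the merged map keeps the shared x-y size, and its depth is the sum of the two input depths.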