This document proposes two methods for determining the relative positions of objects detected in a scene. Method 1 simply calculates the Euclidean distance between objects and the image baseline. Method 2 computes distances using depth maps, which involves iterative depth map construction and is more computationally expensive. Both methods are used to generate a hierarchical description of objects in a scene as a tree structure. Object weights are also computed based on their position in the hierarchy, with nearer objects assigned higher weights. The first method is shown to be simpler and faster than the second method involving depth maps.