Two cameras displaced horizontally from one another obtain two differing views of a scene, similar to human binocular vision. By comparing these two images and examining the relative positions of objects in them, 3D information such as depth can be extracted from the 2D images. This process is called stereo vision. The depth information is encoded in the pixel displacements between the two images, called disparities, which are inversely proportional to scene distance. Given the cameras' intrinsics and their relative pose, 3D point positions can be reconstructed by triangulating corresponding image points. Reconstruction accuracy depends on factors such as the disparity, the baseline distance between the cameras, and the focal length.
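For a rectified stereo pair, the inverse relationship between disparity and distance reduces to the standard formula Z = f · B / d, where f is the focal length in pixels, B is the baseline, and d is the disparity. The sketch below illustrates this; the function name and the numeric values are illustrative assumptions, not taken from any particular system.

```python
import numpy as np

def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Depth Z = f * B / d for a rectified stereo pair.

    disparity_px    : array of disparities in pixels
    focal_length_px : focal length in pixels
    baseline_m      : horizontal distance between camera centers, in meters
    Returns depth in meters; zero/negative disparities map to infinity.
    """
    d = np.asarray(disparity_px, dtype=float)
    depth = np.full_like(d, np.inf)        # no disparity => point at infinity
    valid = d > 0
    depth[valid] = focal_length_px * baseline_m / d[valid]
    return depth

# Illustrative numbers: f = 700 px, B = 0.12 m.
# A disparity of 42 px gives Z = 700 * 0.12 / 42 = 2.0 m;
# halving the disparity to 21 px doubles the depth to 4.0 m.
print(depth_from_disparity(np.array([42.0, 21.0]), 700.0, 0.12))
```

Note how the inverse relation makes depth resolution degrade with distance: far points produce small disparities, so a one-pixel matching error translates into a much larger depth error than it would for nearby points.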