This document summarizes a research paper on reference bias in monolingual machine translation evaluation. The paper reports experiments on a Chinese-English machine translation dataset drawn from news articles. In the first experiment, translations were rated higher when fewer reference translations were provided for comparison. In the second, translations were rated lower when the references came from a different domain than the translations. The paper concludes that both the number and the domain of reference translations can influence evaluation scores and introduce bias.