This document summarizes a research talk on human-in-the-loop speech synthesis given by Yuki Saito of the University of Tokyo. The talk had two parts; Saito presented the first, which covered human-in-the-loop deep speaker representation learning and speaker adaptation for multi-speaker text-to-speech. Saito's research group works on text-to-speech and voice conversion using deep learning. Their recent work brings human listeners into the training process so that the learned speaker representations better capture perceptual speaker similarity.