This document summarizes a study on voice conversion based on sequence-to-sequence learning. The authors propose converting context posterior probabilities from the source speaker to the target speaker with a sequence-to-sequence model, which allows the converted speech to differ in length from the input. They further propose jointly training the recognition model (which estimates the posterior probabilities) and the synthesis model, so that recognition accuracy is more directly tied to synthesis accuracy. Experiments showed that sequence-to-sequence learning enabled variable-length conversion, and that joint training improved the speaker similarity and quality of converted speech over conventional methods.
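To make the pipeline concrete, here is a minimal NumPy sketch of the two ideas the summary names: frame-wise context posterior probabilities produced by a recognition model, and a sequence-to-sequence-style conversion whose output length need not match the input length. All sizes, the random "logits", and the attention-based decoder are hypothetical toys, not the study's actual architecture or trained models.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical sizes: 8 source frames, 5 context classes (e.g., phonemes).
T_src, n_classes = 8, 5

# Stand-in for a recognition model's output: per-frame class logits,
# turned into context posterior probabilities by a softmax.
logits = rng.normal(size=(T_src, n_classes))
src_posteriors = softmax(logits)              # shape (T_src, n_classes)

# Toy sequence-to-sequence conversion: a decoder of a *different* length
# (T_tgt) attends over the source posteriorgram, so the converted sequence
# can be shorter or longer than the input -- the variable-length property.
T_tgt = 6
queries = rng.normal(size=(T_tgt, n_classes))  # untrained toy decoder states
scores = queries @ src_posteriors.T            # (T_tgt, T_src) attention scores
attn = softmax(scores, axis=-1)                # rows are attention weights
tgt_posteriors = attn @ src_posteriors         # (T_tgt, n_classes)

print(tgt_posteriors.shape)                    # output length differs from input
```

Because each output frame is a convex combination of valid posterior vectors, every row of `tgt_posteriors` still sums to one; in the study this conversion would be a learned model, and the joint-training idea would backpropagate the synthesis loss into the recognition model as well.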