This paper extends the skip-gram model for learning word and phrase embeddings. It proposes treating frequent phrases (e.g., "New York Times") as single tokens, so that idiomatic phrases whose meaning is not a simple composition of their words get their own vectors. It also introduces subsampling of frequent words, which speeds up training and improves the quality of the representations of less frequent words. Further, it compares methods for reducing the computational complexity of the output layer, namely hierarchical softmax, noise contrastive estimation, and negative sampling; negative sampling performs best on the word analogy task, while hierarchical softmax combined with subsampling gives the strongest results once phrases are included. Empirical tests on analogical reasoning tasks show that the best phrase model uses hierarchical softmax with subsampling and is trained on tens of billions of words. The paper also demonstrates additive compositionality of word vectors, e.g., vec("Russia") + vec("river") lies close to vec("Volga River").
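Since the summary touches several mechanisms without spelling them out, below is a minimal NumPy sketch of two of them: the keep probability used when subsampling frequent words and the negative-sampling objective for a single (center, context) pair. Only the two formulas come from the paper; the toy dimensions, random vectors, and helper names (`subsample_keep_prob`, `neg_sampling_loss`) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def subsample_keep_prob(freq, t=1e-5):
    """Probability of KEEPING a word whose corpus frequency is `freq`
    (fraction of all tokens). The paper discards each occurrence of word w
    with probability 1 - sqrt(t / f(w)), thinning out very frequent words."""
    return np.minimum(1.0, np.sqrt(t / freq))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neg_sampling_loss(v_in, v_out, v_neg):
    """Skip-gram negative-sampling objective for one (center, context) pair:
    log sigma(v_out . v_in) + sum_i log sigma(-v_neg_i . v_in),
    returned negated so it can be minimized."""
    pos = np.log(sigmoid(v_out @ v_in))
    neg = np.sum(np.log(sigmoid(-v_neg @ v_in)))
    return -(pos + neg)

# Toy example with made-up numbers: a 50-dimensional embedding,
# one positive context vector, and k = 5 negative samples.
dim, k = 50, 5
v_in = rng.normal(scale=0.1, size=dim)
v_out = rng.normal(scale=0.1, size=dim)
v_neg = rng.normal(scale=0.1, size=(k, dim))

print("keep prob for a word covering 1% of tokens:", subsample_keep_prob(0.01))
print("negative-sampling loss for one pair:", neg_sampling_loss(v_in, v_out, v_neg))
```

With the paper's default threshold t = 1e-5, a word that makes up 1% of the corpus is kept only about 3% of the time, which is why subsampling yields such a large speedup on common words while leaving rare words untouched.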