Slideshare uses cookies to improve functionality and performance, and to provide you with relevant advertising. If you continue browsing the site, you agree to the use of cookies on this website. See our User Agreement and Privacy Policy.
Slideshare uses cookies to improve functionality and performance, and to provide you with relevant advertising. If you continue browsing the site, you agree to the use of cookies on this website. See our Privacy Policy and User Agreement for details.
Published on
Making Natural Language Processing Robust to Sociolinguistic Variation:
Natural language processing on social media text has the potential to aggregate facts and opinions from millions of people all over the world. However, language in social media is highly variable, making it more difficult to analyze that conventional news texts. Fortunately, this variation is not random; it is often linked to social properties of the author. I will describe two machine learning methods for exploiting social network structures to make natural language processing more robust to socially-linked variation. The key idea behind both methods is linguistic homophily: the tendency of socially linked individuals to use language in similar ways. This idea is captured using embeddings of node positions in social networks. By integrating node embeddings into neural networks for language analysis, we obtained customized language processing systems for individual writers — even for individuals for whom we have no labeled data. The first application shows how to apply this idea to the problem of tweet-level sentiment analysis. The second application targets the problem of linking spans of text to known entities in a knowledge base.
Be the first to comment