The document discusses data sources for linguistic analysis, including corpora, dictionaries, social media, and linked open data. It details how to access data from Facebook and Twitter using their APIs and R packages. It also covers preprocessing text data through tokenization, lemmatization, and stemming, followed by the construction of term-document matrices. Sentiment analysis is demonstrated on data from sources such as Experience Project by exploring word-category correlations.
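
To make the preprocessing steps concrete, the following is a minimal sketch of tokenization and term-document matrix construction. It is written in plain Python rather than with the R packages the document refers to, and it is not the document's own method. The function names `tokenize` and `term_document_matrix` and the toy sentences are hypothetical, chosen only for illustration. The tokenizer is deliberately naive (lowercase alphabetic runs only), and lemmatization and stemming are left out.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_document_matrix(docs):
    """Return (vocab, matrix), where matrix[i][j] counts term i in document j."""
    # One token-frequency table per document.
    counts = [Counter(tokenize(doc)) for doc in docs]
    # The vocabulary is the sorted set of all terms seen in any document.
    vocab = sorted(set().union(*counts))
    # Rows are terms, columns are documents; missing terms count as 0.
    matrix = [[c[term] for c in counts] for term in vocab]
    return vocab, matrix


# Hypothetical toy documents, not taken from any corpus in the source.
docs = ["I feel happy today", "I feel sad and tired", "Happy happy day"]
vocab, tdm = term_document_matrix(docs)

for term, row in zip(vocab, tdm):
    print(f"{term:>6} {row}")
```

Each row of the result holds one term's counts across the documents, which is the layout a term-document matrix conventionally uses; transposing it gives the document-term form.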