Python NLTK

  1. NLTK, Alberts Pumpurs
  2. 90% of the world's data was generated over the last two years
  3. What a common Internet user creates: visual content (Instagram, Flickr, VSCO Cam, Facebook) and textual content (Tumblr, Blogger, Twitter, Facebook, emails, customer reviews)
  4. Detecting hidden signals
  5. The world is full of unstructured, text-rich data, from emails to customer tweets. The information buried in all that text has the potential to deliver valuable business insights.
  6. Text analytics is the practice of using technology to gather, store, and mine textual information for hidden signals that can inform smarter business decisions.
  7. An explosion of unstructured data
  8. Many types of organizations are experiencing explosive growth in their unstructured enterprise data, at the same time as they gain access to external data sources such as social media, blogs, and mobile data.
  9. Until now, much of this information passed through organizations virtually unanalyzed. Today, new tools for handling large amounts of complex data make it easier to extract value from such unlikely sources.
  10. Text processing use cases
  11. Sentiment analysis, spam filtering, text categorization, topic detection, keyword frequency, plagiarism detection, document similarity, phrase extraction
  12. Natural Language Toolkit (NLTK): a leading platform for building Python programs to work with human language data
  13. NLTK features
  14. Sentence and word tokenization, text classification, corpora, parsing, clustering, part-of-speech tagging, text stemming, and much more
  15. Sentence tokenization
  16. Word tokenization
  17. Part-of-speech tagging
  18. Part-of-speech tag explanations (Penn Treebank tag set):
      - CC: coordinating conjunction
      - CD: cardinal number
      - DT: determiner
      - EX: existential "there"
      - FW: foreign word
      - IN: preposition or subordinating conjunction
      - JJ: adjective
      - JJR: adjective, comparative
      - JJS: adjective, superlative
      - LS: list item marker
      - MD: modal
      - NN: noun, singular or mass
      - NNS: noun, plural
      - NNP: proper noun, singular
      - nltk.help.upenn_tagset() prints descriptions of all tags
  19. Chunking and named entity recognition (NER)
  20. Text classification algorithms in NLTK: Naive Bayes, Maximum Entropy, Decision Tree
  21. Text classification
  22. Sentiment analysis: https://github.com/pumpurs/SentimentWordsLV/
  23. Document similarity detection. Tf-idf stands for term frequency-inverse document frequency. The tf-idf weight, often used in information retrieval and text mining, is a statistical measure of how important a word is to a document in a collection or corpus.
  24. Similarity and concordance
  25. Dispersion plot
  26. "Market and product research", "Social CMS", "Customer profiling / analytics"; 1.97 billion social network users; 70% of marketers used Facebook to gain …; 6.7 million people blog on blogging sites
  27. pumpurs.alberts@gmail.com | Big Data, Startups, Text Analysis, Internet of Things, Web Development
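
Slide 11 lists keyword frequency among the use cases. A minimal sketch with NLTK's `FreqDist`, assuming a hypothetical sample sentence that is already lowercase and whitespace-separated, plus a tiny hand-picked stop-word set (NLTK also ships a stop-word corpus, but it has to be downloaded first):

```python
from nltk import FreqDist

# Hypothetical sample text, split on whitespace for simplicity.
text = "the cat sat on the mat and the dog sat on the rug"
tokens = text.split()

# FreqDist counts how often each token occurs (it subclasses collections.Counter).
fdist = FreqDist(tokens)

# A tiny, hypothetical stop list removes function words before ranking keywords.
stop_words = {"the", "on", "and"}
keywords = FreqDist(t for t in tokens if t not in stop_words)

print(fdist["the"])             # 4
print(keywords.most_common(1))  # [('sat', 2)]
```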
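
Slide 14 mentions text stemming. NLTK's `PorterStemmer` needs no downloaded data, so a short sketch is self-contained; the word list is only illustrative:

```python
from nltk.stem import PorterStemmer

# The Porter algorithm strips suffixes by rule; it uses no dictionary.
stemmer = PorterStemmer()
words = ["running", "runs", "cats"]
stems = [stemmer.stem(w) for w in words]
print(stems)  # ['run', 'run', 'cat']
```

Stems are not always dictionary words, which is acceptable when they are only used as matching keys.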
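
For slide 15, `nltk.sent_tokenize()` is the usual entry point, but it loads a pretrained Punkt model that must first be downloaded with `nltk.download()`. To keep this sketch self-contained, it builds an untrained `PunktSentenceTokenizer` instead. That tokenizer splits on sentence-final punctuation but has learned no abbreviations, so text such as "Dr. Smith" may be split in the wrong place:

```python
from nltk.tokenize.punkt import PunktSentenceTokenizer

# Untrained Punkt: no learned abbreviations or collocations, so it relies on
# sentence-final punctuation (. ? !) followed by whitespace.
tokenizer = PunktSentenceTokenizer()

text = "NLTK is a Python library. It works with human language data. Try it today!"
sentences = tokenizer.tokenize(text)
for sentence in sentences:
    print(sentence)
```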
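
For slide 16, `nltk.word_tokenize()` also depends on the downloadable Punkt model, because it splits text into sentences first. `TreebankWordTokenizer`, which follows Penn Treebank conventions (for example, separating contractions and punctuation), works on a single sentence without any downloads:

```python
from nltk.tokenize import TreebankWordTokenizer

# Treebank rules split "Don't" into "Do" + "n't" and detach commas and the final period.
tokenizer = TreebankWordTokenizer()
tokens = tokenizer.tokenize("Don't split this, please.")
print(tokens)
```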
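
Slides 17 and 18 cover part-of-speech tagging with Penn Treebank tags. `nltk.pos_tag()` uses a pretrained tagger that has to be downloaded, as do the tag descriptions printed by `nltk.help.upenn_tagset()`. The sketch below instead trains a toy `UnigramTagger` on one hypothetical hand-tagged sentence, with a `DefaultTagger` fallback, to show how tags attach to tokens:

```python
from nltk.tag import DefaultTagger, UnigramTagger

# Hypothetical hand-tagged training data using tags from slide 18.
train_sents = [[
    ("the", "DT"), ("two", "CD"), ("big", "JJ"), ("dogs", "NNS"),
    ("and", "CC"), ("a", "DT"), ("bigger", "JJR"), ("cat", "NN"),
]]

# Words never seen in training fall back to NN (noun, singular or mass).
tagger = UnigramTagger(train_sents, backoff=DefaultTagger("NN"))

tagged = tagger.tag(["a", "big", "cat", "and", "two", "birds"])
print(tagged)
```

Because a unigram tagger remembers only one tag per known word, the unseen plural "birds" falls back to NN rather than NNS; a pretrained tagger uses context and far more training data.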
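
For slide 19, chunking groups tagged tokens into phrases. Here is a minimal sketch with `RegexpParser` and a simple noun-phrase grammar, assuming the input is already POS-tagged (the tags are written by hand here). Named entity recognition with `nltk.ne_chunk()` also returns a tree, but it needs a downloaded model, so it is not shown:

```python
from nltk import RegexpParser

# Hand-tagged input; in practice this would come from a POS tagger.
tagged = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"),
          ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN")]

# NP chunk: optional determiner, any number of adjectives, then a noun.
parser = RegexpParser("NP: {<DT>?<JJ>*<NN>}")
tree = parser.parse(tagged)

# Collect the words of every NP subtree.
noun_phrases = [
    " ".join(word for word, tag in subtree.leaves())
    for subtree in tree.subtrees(filter=lambda t: t.label() == "NP")
]
print(noun_phrases)  # ['the little yellow dog', 'the cat']
```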
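
Slides 20 and 21 name the classifiers NLTK provides. They share a `train()`/`classify()` interface over feature dictionaries. The sketch below uses `NaiveBayesClassifier` with a hypothetical bag-of-words feature extractor and a four-message toy training set, so its output illustrates the API rather than real-world accuracy:

```python
from nltk.classify import NaiveBayesClassifier


def features(text):
    """Hypothetical extractor: mark each lowercase word as present."""
    return {f"contains({w})": True for w in text.lower().split()}


# Toy labeled data: (featureset, label) pairs.
train = [
    (features("cheap pills buy now"), "spam"),
    (features("win cash now"), "spam"),
    (features("meeting agenda for monday"), "ham"),
    (features("lunch on monday"), "ham"),
]

classifier = NaiveBayesClassifier.train(train)
prediction = classifier.classify(features("buy cheap cash"))
print(prediction)
```

Swapping in `MaxentClassifier.train(train)` or `DecisionTreeClassifier.train(train)` (both in `nltk.classify`) leaves the feature extraction and the `classify()` call unchanged.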
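
Slide 22 links a sentiment word list. The slides do not describe its file format, so the sketch below does not read it. Instead, it assumes two small hypothetical English lexicons and scores text by counting positive hits minus negative hits, which is the simplest lexicon-based approach:

```python
import re

# Hypothetical toy lexicons; a real system would load a curated word list.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "terrible", "hate", "poor"}


def sentiment(text: str) -> str:
    """Label text by (positive hits - negative hits)."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(sentiment("I love this phone, the camera is great!"))  # positive
print(sentiment("Terrible battery and poor support."))        # negative
```

Counting ignores negation ("not good") and intensity; the classifiers from slides 20 and 21 can learn such patterns from labeled examples instead.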
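
Slide 23 describes tf-idf. One common variant weights term t in document d as tf(t, d) × log(N / df(t)). Here, tf is the term's relative frequency in d, N is the number of documents, and df(t) is the number of documents that contain t. Comparing the resulting vectors with cosine similarity gives a document similarity score. The following pure-Python sketch assumes lowercase whitespace tokenization and three hypothetical documents:

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


docs = [
    "the cat sat on the mat",
    "the cat lay on the rug",
    "stock prices fell sharply",
]
vectors = tf_idf_vectors(docs)
print(round(cosine(vectors[0], vectors[1]), 3))  # shared terms: the, cat, on
print(cosine(vectors[0], vectors[2]))            # 0.0, no shared terms
```

NLTK's `nltk.text.TextCollection` also provides `tf()`, `idf()`, and `tf_idf()` methods. The version above spells out each step explicitly.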
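
For slide 24, `nltk.Text` wraps a token list for exploration. `concordance()` prints each occurrence of a word in context, and `concordance_list()` returns the same lines as objects. `ContextIndex.similar_words()` ranks words that share left and right neighbors with a given word, which is the distributional idea behind `Text.similar()`. Here is a sketch on a hypothetical sentence:

```python
from nltk.text import ContextIndex, Text

tokens = "the cat sat on the mat and the dog sat on the rug".split()
text = Text(tokens)

# Each ConcordanceLine records the token offset of a match and a printable line.
lines = text.concordance_list("sat", width=40)
for line in lines:
    print(line.offset, line.line)

# Rank words by the (previous word, next word) contexts they share with "cat".
index = ContextIndex(tokens)
similar = index.similar_words("cat")
print(similar)
```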
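
Slide 25 shows a dispersion plot, which marks where each chosen word occurs across a text. `Text.dispersion_plot()` draws it with matplotlib. The positions it plots are token offsets, and `ConcordanceIndex.offsets()` returns these directly. Here is a sketch on a hypothetical sentence:

```python
from nltk.text import ConcordanceIndex

tokens = "the cat sat on the mat and the dog sat on the rug".split()
index = ConcordanceIndex(tokens)

# Token offsets of each word: one mark per offset on a dispersion plot.
positions = {word: index.offsets(word) for word in ["the", "sat", "dog"]}
print(positions)  # {'the': [0, 4, 7, 11], 'sat': [2, 9], 'dog': [8]}

# With matplotlib installed, nltk.Text(tokens).dispersion_plot(["the", "sat", "dog"])
# draws these offsets as a chart.
```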
