Welcome!
Why NLP? 
• We have to adapt to the way computers want data,
  and we still adapt to the way computers give back information.
• NLP is helping computers understand one of the most powerful
  interfaces for HUMANS: language.
• Apple Siri and Google Now are cutting-edge examples of how
  NLP helps computers fit humans.
• More details: http://www.slideshare.net/yourfrienddhruv/apps-with-ears-and-eyes
Google Now vs. Siri vs. Cortana 
https://www.stonetemple.com/great-knowledge-box-showdown/
Cutting-edge NLP!
http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/ 
https://news.ycombinator.com/item?id=8428007 
https://news.ycombinator.com/item?id=8426148
Cutting-edge NLP!
https://news.ycombinator.com/item?id=8428418 
AI Websites That Design Themselves
thegrid.io
NLP in today's session 
In this session we will focus more on how we 
can deal with written language in software 
products.
NLP for text analysis 
• Knowledge is a fundamental requirement for any problem solving.
• An intelligent decision-making system needs 3 major things:
  A) Lots of relevant knowledge
  B) A way to represent that knowledge in terms of the
     problem/question at hand
  C) A way to present the answer in human language.
General Architecture of NLP systems 
• Basic systems
  - Tokenization -> [lemmatization] -> tagging -> chunking -> domain mapping
  - NLP systems require pre-created, domain-specific corpora
    (dictionary + rule set handcrafted by humans)
  - Details: http://www.nltk.org/book/ch05.html
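A toy version of this basic pipeline, sketched in plain Python (no NLTK) so it runs anywhere. The lexicon and domain map (`LEXICON`, `DOMAIN`) are hypothetical stand-ins for the handcrafted, domain-specific corpora mentioned above; a real system would use a trained tagger and much larger resources.

```python
import re

# Hypothetical hand-crafted lexicon: word -> (lemma, part-of-speech tag).
# Stands in for the "dictionary + rule set handcrafted by humans".
LEXICON = {
    "book": ("book", "VB"), "books": ("book", "NN"),
    "a": ("a", "DT"), "the": ("the", "DT"),
    "flight": ("flight", "NN"), "flights": ("flight", "NN"),
    "to": ("to", "TO"), "paris": ("paris", "NNP"),
    "cheap": ("cheap", "JJ"),
}

# Hypothetical domain mapping for a travel-booking domain: lemma -> concept.
DOMAIN = {"book": "ACTION:RESERVE", "flight": "ITEM:FLIGHT", "paris": "PLACE:CITY"}


def tokenize(text):
    """Tokenization: split text into lowercase word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def lemmatize_and_tag(tokens):
    """Lemmatization + tagging via lexicon lookup; unknown words get 'UNK'."""
    return [LEXICON.get(t, (t, "UNK")) for t in tokens]


def chunk(tagged):
    """Chunking: group runs of determiners/adjectives/nouns into noun phrases."""
    chunks, current = [], []
    for lemma, tag in tagged:
        if tag in ("DT", "JJ", "NN", "NNP"):
            current.append(lemma)
        elif current:
            chunks.append(current)
            current = []
    if current:
        chunks.append(current)
    return chunks


def domain_map(tagged):
    """Domain mapping: translate known lemmas into domain concepts."""
    return [DOMAIN[lemma] for lemma, _ in tagged if lemma in DOMAIN]


tagged = lemmatize_and_tag(tokenize("Book a cheap flight to Paris"))
print(chunk(tagged))       # [['a', 'cheap', 'flight'], ['paris']]
print(domain_map(tagged))  # ['ACTION:RESERVE', 'ITEM:FLIGHT', 'PLACE:CITY']
```

Every stage here is a lookup in human-written tables, which is why the slide stresses that the domain corpora are the expensive part.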
General Architecture of NLP systems 
• Advanced systems
  - http://nlp.stanford.edu/software/patternslearning.shtml
Relationship to Machine Learning 
• NLP
  - Algorithms and tooling are targeted at converting text/data into values
• ML
  - Algorithms and tooling are targeted at consuming values and
    producing meaningful values/vectors
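A minimal illustration of this split, assuming a bag-of-words representation: the NLP step turns text into count vectors over a fixed (hypothetical) vocabulary, and the ML side only ever sees numbers. Cosine similarity stands in here for any algorithm that consumes vectors.

```python
import math
import re
from collections import Counter


def text_to_vector(text, vocabulary):
    """NLP side: turn raw text into word counts over a fixed vocabulary."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return [counts[word] for word in vocabulary]


def cosine(u, v):
    """ML side: consume numeric vectors and produce a value (0..1 for counts)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


vocabulary = ["cheap", "flight", "hotel", "paris", "review"]  # hypothetical
v1 = text_to_vector("Cheap flight to Paris", vocabulary)
v2 = text_to_vector("Find a cheap hotel in Paris", vocabulary)
print(v1, v2, round(cosine(v1, v2), 2))  # [1, 1, 0, 1, 0] [1, 0, 1, 1, 0] 0.67
```

Words outside the vocabulary ("to", "find", "in") simply disappear, which is one reason the choice of vocabulary matters so much.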
A few popular NLP toolkits
• Python
  - http://www.nltk.org
  - http://scikit-learn.org/
  - https://textblob.readthedocs.org
• Java
  - http://nlp.stanford.edu/software/index.shtml
  - https://gate.ac.uk/overview.html
  - https://opennlp.apache.org/
• R
  - http://cran.r-project.org/web/views/NaturalLanguageProcessing.html
Interesting applications 
• Covered in this session:
  1) Information summarization
  2) Information extraction
  3) Sentiment analysis
  4) Dialog-based systems
1) Information summarization 
• Creates a summary of a large text.
  - http://summly.com/
• You can create a highly personalized summary of the same
  content for each user.
  - http://automatedinsights.com/wordsmith/
• The race is on between 'plagiarism detection' and 'automatic
  paraphrasing'.
  - http://copyscape.com/
  - https://oaps.eu/project/overview/
  - http://plagcontrol.com
• Handy code:
  - Python and related: https://github.com/miso-belica/sumy
  - Java/Scala: https://github.com/MojoJolo/textteaser
• Basics:
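One common baseline for extractive summarization is to score each sentence by how frequent its words are in the whole text and keep the top scorers. The sketch below assumes that approach and a small hypothetical stop-word list; the libraries above offer more refined algorithms, so treat this only as an illustration of the idea.

```python
import re
from collections import Counter

# Small, hypothetical stop-word list; real systems use much larger ones.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in", "it", "on", "for"}


def content_words(text):
    """Lowercase word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def summarize(text, max_sentences=2):
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(content_words(text))

    def score(sentence):
        # Average document-wide frequency of the sentence's content words.
        tokens = content_words(sentence)
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])  # restore original sentence order
    return " ".join(sentences[i] for i in keep)


text = ("NLP helps computers understand language. "
        "NLP powers search and chat bots. The weather was nice.")
print(summarize(text))
# NLP helps computers understand language. NLP powers search and chat bots.
```

A per-user summary could reuse the same scoring with extra weight on words from that user's interests (a hypothetical extension, not shown).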
2) Information extraction 
• Named Entity Recognition (NER)
  - Common entity types include ORGANIZATION, PERSON, LOCATION,
    DATE, TIME, MONEY, and GPE (geo-political entity).
• Relationship extraction
  - Mainly between named entities
  - http://www.cruxbot.com/
• Handy code:
  - http://www.nltk.org/book/ch07.html
• Basics:
  - Find interesting pairs of words, and note the adjoining
    words to work out the relationship between them.
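The "Basics" idea can be sketched as: locate entities, then treat the short run of words between two neighbouring entities as their relationship. The gazetteer `ENTITIES` is a hypothetical stand-in for a real NER step (NLTK's chapter above uses a trained chunker instead), and `max_gap_words` is an assumed cut-off.

```python
import re

# Hypothetical gazetteer standing in for real NER: entity text -> entity type.
ENTITIES = {"Larry Page": "PERSON", "Google": "ORGANIZATION", "Mountain View": "GPE"}


def find_entities(sentence):
    """Return (start, end, text, type) for every known entity, sorted by position."""
    found = []
    for name, etype in ENTITIES.items():
        for m in re.finditer(re.escape(name), sentence):
            found.append((m.start(), m.end(), name, etype))
    return sorted(found)


def extract_relations(sentence, max_gap_words=4):
    """For each pair of neighbouring entities, use the words between them as the relation."""
    found = find_entities(sentence)
    relations = []
    for (_, end1, name1, _), (start2, _, name2, _) in zip(found, found[1:]):
        words = sentence[end1:start2].strip(" ,").split()
        if 0 < len(words) <= max_gap_words:
            relations.append((name1, " ".join(words), name2))
    return relations


print(extract_relations("Larry Page founded Google in Mountain View."))
# [('Larry Page', 'founded', 'Google'), ('Google', 'in', 'Mountain View')]
```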
2.1) Information Retrieval 
• Large text needs to be searched by keywords.
• Traditional RDBMS indexing doesn't work well for this.
• Full-text search toolkits are good practical examples of
  NLP implementation.
• Handy code:
  - Solr: Java
  - PostgreSQL: DB
    http://blog.lostpropertyhq.com/postgres-full-text-search-is-good-enough/
• Basics:
  - While storing large text, remove words that add no value
    (e.g. stop words such as "the", "is", "of") and index only
    the root of each word.
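A minimal inverted index following these basics: drop stop words, reduce each word to a root, and map each root to the documents that contain it. The stop-word list and the suffix-stripping `stem()` are hypothetical simplifications; Solr and PostgreSQL use proper language-specific stemmers and dictionaries.

```python
import re
from collections import defaultdict

STOP_WORDS = {"a", "an", "the", "is", "are", "was", "of", "to", "and", "in", "on", "for"}
SUFFIXES = ("ing", "es", "ed", "s")  # naive, hypothetical suffix list


def stem(word):
    """Crude stemmer: strip the first matching suffix if at least 3 letters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def terms(text):
    """Lowercase, tokenize, drop stop words, and reduce each word to its stem."""
    return [stem(w) for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def build_index(docs):
    """Inverted index: stem -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in terms(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every query term (AND semantics)."""
    query_terms = terms(query)
    if not query_terms:
        return set()
    result = set(index.get(query_terms[0], set()))
    for term in query_terms[1:]:
        result &= index.get(term, set())
    return result


docs = {1: "Searching large documents is slow.",
        2: "The searched document was indexed.",
        3: "Cats and dogs."}
index = build_index(docs)
print(search(index, "search documents"))  # {1, 2}
```

Because "searching", "searched" and "search" all reduce to the same root, one query finds both documents.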
3) Sentiment Analysis 
• To understand the overall meaning/tone of a text.
  - e.g. neutral vs. polar, positive vs. negative.
• Demo
  - http://text-processing.com/demo/sentiment/
  - http://nlp.stanford.edu:8080/sentiment/rntnDemo.html
• Use:
  - Is a Twitter trend positive or negative?
  - Is the overall review of a product positive or negative?
• Basics:
  - Pick the most interesting phrases and correlate their meaning.
  - Correlate/group things with similar meaning.
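A very small lexicon-based scorer illustrates the idea: look up each word's polarity, add the values up, and flip the sign of a word that directly follows a negation. The lexicon and negation list are hypothetical; the Stanford demo above uses a trained recursive neural model, not a word list.

```python
import re

# Hypothetical polarity lexicon: word -> score.
LEXICON = {"good": 1, "great": 2, "love": 2, "excellent": 2,
           "bad": -1, "poor": -1, "terrible": -2, "hate": -2}
NEGATIONS = {"not", "never", "no"}


def sentiment(text):
    """Return (label, score); a negation flips only the very next word."""
    score, negate = 0, False
    for word in re.findall(r"[a-z']+", text.lower()):
        if word in NEGATIONS:
            negate = True
            continue
        value = LEXICON.get(word, 0)
        score += -value if negate else value
        negate = False
    if score > 0:
        return "positive", score
    if score < 0:
        return "negative", score
    return "neutral", score


for review in ["I love this phone, the camera is great",
               "The battery is not good",
               "It arrived on Tuesday"]:
    print(review, "->", sentiment(review))
```

Scoring many tweets or reviews this way and counting the labels gives a rough overall trend for a topic or product.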
4) Dialog-based systems
• Understand input given in natural language.
  - Google Search, Siri, Google Now
• Build interactive chatbots to handle customer support.
• Details: http://www.nltk.org/book/ch10.html
• Handy code:
  - We can convert a question into an SQL query!
• Basics:
  - Map English grammar to another grammar for input parsing,
    and vice versa.
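A tiny sketch of turning a question into SQL by mapping fixed English question patterns onto SQL templates. The `employees` table, its columns, the sample rows, and the patterns are all hypothetical; the NLTK chapter linked above builds a grammar-based version instead of using regexes. Values are passed as query parameters rather than pasted into the SQL string.

```python
import re
import sqlite3

# Hypothetical patterns mapping English question shapes to parameterized SQL
# over an assumed table employees(name, city, salary).
PATTERNS = [
    (re.compile(r"who works in (\w+)\??$", re.I),
     "SELECT name FROM employees WHERE city = ? COLLATE NOCASE ORDER BY name"),
    (re.compile(r"how many people work in (\w+)\??$", re.I),
     "SELECT COUNT(*) FROM employees WHERE city = ? COLLATE NOCASE"),
]


def question_to_sql(question):
    """Return (sql, params) for the first matching pattern, or None if none match."""
    for pattern, sql in PATTERNS:
        match = pattern.search(question.strip())
        if match:
            return sql, match.groups()
    return None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, city TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                 [("Asha", "Pune", 60000), ("Ravi", "Mumbai", 45000),
                  ("Meera", "Pune", 52000)])

sql, params = question_to_sql("Who works in Pune?")
print(conn.execute(sql, params).fetchall())  # [('Asha',), ('Meera',)]
```

Questions that match no pattern return None, which a chatbot could answer with "Sorry, I didn't understand that."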
Development & Testing/Verification of NLP systems
1) Understand the gold set, training set, and test set
2) Seen vs. unseen data
3) Accuracy, precision & recall
4) Confusion matrices
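Accuracy, precision, recall, and a confusion matrix all come from comparing predictions against a gold set. The labels below are made-up illustrations; in practice the comparison should be done on a held-out test set (unseen data), not on the training set.

```python
from collections import Counter


def confusion_matrix(gold, predicted):
    """Count how often each (gold_label, predicted_label) pair occurs."""
    return Counter(zip(gold, predicted))


def precision_recall(gold, predicted, positive):
    """Precision and recall for one label treated as the 'positive' class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def accuracy(gold, predicted):
    """Fraction of items where the prediction matches the gold label."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


gold      = ["pos", "pos", "neg", "neg", "pos", "neg"]  # hypothetical gold set
predicted = ["pos", "neg", "neg", "pos", "pos", "pos"]  # hypothetical system output

p, r = precision_recall(gold, predicted, "pos")
print(f"precision={p:.2f} recall={r:.2f} accuracy={accuracy(gold, predicted):.2f}")
# precision=0.50 recall=0.67 accuracy=0.50

matrix = confusion_matrix(gold, predicted)
for g in ["pos", "neg"]:  # rows: gold label, columns: predicted pos, neg
    print(g, [matrix[(g, p)] for p in ["pos", "neg"]])
# pos [2, 1]
# neg [2, 1]
```

The matrix shows where the errors go: here two negative items were wrongly labelled positive, which is what drags precision down.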
Session Summary 
1) NLP + ML capabilities are the foundation for intelligent
   systems working with/on consumer data.
2) Domain knowledge is the key differentiator and the MAJOR
   cost factor.
3) NLP system development requires a different mindset, as it
   is not creation but evolution of a software system.
4) Lots and lots of academic/research reading is a must.
What Next? Q&A? Are you sure? 
• I have an idea which might require NLP
  - Go reach out to more people:
    @nikunjness, @yourfrienddhruv
• I want to know how to develop such systems
• I think I want to research more possibilities!
  - Read this: http://www.nltk.org/book/ch01.html
  - Yes, it's Python.
• I think it's too complex.
  - You are not alone.

Natural Language Processing - Basics / Non-Technical
