SlideShare a Scribd company logo
Building Natural Language
Processing solutions
For Davidson Machine Learning Group
By Ramu Pulipati,
@botsplash
Introduction to NLP
• Natural Language:
• General purpose communications
• Distinct difference between humans and Animals
• Much difficult to interpret from Formal Language
• Natural Language Processing (NLP) Advancements
• Earlier focus was on Linguistics and Computer Science
• Current evolution is focused on Machine Learning, specifically
Deep Learning and Neural Networks
• Varied degrees of implementation based on use case
Scope of Natural Language Processing
• Read
• Natural Language Understanding (NLU)
• Write
• Natural Language Generation (NLG)
• Speak
• Speech Recognition / Syntesis
NLP Applications
More Applications …
• Email Spam
• Siri / Alexa / Cortana
• Legal Contacts to find Action
clauses
• Health Care Records
• Energy Sector / Utilities /
Inspection Records
• Automated Agents
• Appointment Scheduling
• Auto Email Responses
• Typing Suggestions
• Spelling Check
• Predicting Crops
• Social Media Propaganda
• Press/Earnings releases
• Weather Reports
• Search Engines
• News categorization
• Chatbot
• NY Times Oped author analysis
State of NLP
Source: https://www.slideshare.net/healess/sk-t-academy-lecture-note
Botsplash AI Strategy
Machine
Learning
Natural
Language
Processing
Predictive
Analytics
Routing Intelligence
High Intent Conversion Detection
Trends and Behavior
End Chat, Spam Detection
Content and Sentiment
FAQ, Support, Transaction
Chatbot
Re-engagement
Smart Scheduling
UI Interactions
Focus on solvable/acceptable problems
I’m looking for 30yr mortgage loan in Charlotte, NC
(Named Entity Recognition)
Thanks for your help. Great chatting with you.
(classification)
Lets connect tomorrow. Anytime evening will work for me.
(classification / intent / actionable)
This rate is unacceptable. What can you do?
(sentiment)
Note on leading NLP providers
• AWS Comprehend
• Google Cloud NLP
• Microsoft Project Oxford
• IBM Watson
• Aylien
• Cennest Comparison: https://cognitiveintegratorapp.azurewebsites.net/
Note: None of them provide the results you are looking for. Open source
packages are your best options.
Text Processing Roundup
• Normalization
• Text Classification
• Text Similarity
• Text Extraction
• Topic Modeling
• Semantic Search
• Sentiment Analysis
Word Embeddings
• Paper published by Mikolov 2013
Example: Man is to Woman, then King is to _______
• Multi-dimensional space of word representations with proximity
based on similarity of the words (word vectors)
• Algebraic expressions can be applied on Word vectors
• Building Word embedding: Provide lot of data with features to look
• Word2vec is a popular word embedding implemented with Neural
network
• Other implementations such as Glove use co-occurrence matrices
Word2vec paper results
NLP Pipeline
• Classical
follows
traditional ML
strategies
• Deep Learning
requires lot of
data
Getting started
• Python Installation. Use 3+.
• Data science packages installation. Use “pip install” or Anaconda
• Always use “virtualenv” when setting up environments.
• Start with Jupyter notebooks and convert it production code.
• Use cloud hosted jupyter notebooks with access to GPU from
floydhub, paperspace, Google, Amazon or Azure
Python packages for NLP
• NLP Focus Packages
• NLTK
• Spacy
• Gensim
• Textblob
• Scikit Learn
• Stanford NLP (java)
• WordNet, SentiWordNet
• FastText / MUSE / Faiss
• Deep Learning Frameworks
• Tensorflow / Keras
• Pytorch
• Other Noteworth
• Scrapy
• Newspaper
• nlp-architect
NLTK Code Tour
• Tokenization (Dictionary and Regex)
• Stemming
• Lemma
• NLP Grammar - Chunking and Chinking
• Entity Recognition
• WikiQuiz
Spacy.io Lightning Tour
• Industrial Strength, Fast
• POS Tagging and Dependency Parsing
• Named Entities, Word embedding and Similarity
• Custom Pipelines
• Visualization
Text classification
• Use cases: Spam, Actionable events, Intents
• For Content based or Request based classification
• Steps involve Preparing -> Training -> Prediction
• Feature Extractions
• Bag of Words
• TF-IDF model
• Word Vectors: Averaged, TD-IDF, tc
• Starspace model
• FastText
• Classification alg: Multinomial Bayes or SVM
• Intent Classification
• RASA NLU
• Snips NLU
Steps to classifying your data
1. Identify tags to be applied
2. Manually add tags for the
data (possibly in the
application)
3. Build a classification
algorithm
4. Setup your application to
auto classify tags
5. Evaluate silently and then
enable the actions
Sentiment Analysis
• Use case: Reviews, Chat transcripts, etc
• Supervised techniques are effective for a domain
• Packages:
• SentiWordNet
• StanfordNLP
• Spacy Sentiment Analysis (incomplete)
Summarization
• Summarization is hard
• Uses variety of techniques including Text extraction, Feature Matrix,
TD-IDF, Co-location, SVD and other methods
• Implement LSA to under
• Review of implementations:
• Spacy
• TextRank
• Pyteaser
• Textteaser
• Sumy
Chatbots
• Rules Based
• Intent Classification
• Context and Workflow Management
• Handle Special Cases
• Generative
• Sequence to Sequence Chatbot: DeepQA demo
Code Review / Demo Apps
• Jupyter Notebooks
• NLTK Code Review
• Space Code Review
• Word2Vec Samples
• NLTK Grammar Parsing
• WikiQuiz
• Topic Modeling Code Review
• Text Similarity – Phrase Matcher API
Follow up Learning
• Websites:
• Allen AI - NLP
• Fast AI
• Malabuba
• Coursera
• Youtube
• Resources
• Sanni Oluwatoyin Yetunde
Google Slides
• Cambridge Data Science
Group presentation
• nlp.fast.ai

More Related Content

What's hot

Incorporating Chinese Radicals Into Neural Machine Translation: Deeper Than C...
Incorporating Chinese Radicals Into Neural Machine Translation: Deeper Than C...Incorporating Chinese Radicals Into Neural Machine Translation: Deeper Than C...
Incorporating Chinese Radicals Into Neural Machine Translation: Deeper Than C...
Lifeng (Aaron) Han
 
EDS for JIBS
EDS for JIBSEDS for JIBS
EDS for JIBS
CliveRWright
 
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
Lifeng (Aaron) Han
 
Chinese Character Decomposition for Neural MT with Multi-Word Expressions
Chinese Character Decomposition for  Neural MT with Multi-Word ExpressionsChinese Character Decomposition for  Neural MT with Multi-Word Expressions
Chinese Character Decomposition for Neural MT with Multi-Word Expressions
Lifeng (Aaron) Han
 
Lucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine LearningLucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine Learning
Joaquin Delgado PhD.
 
Searching with vectors
Searching with vectorsSearching with vectors
Searching with vectors
Simon Hughes
 
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
Joaquin Delgado PhD.
 
Automatically Build Solr Synonyms List using Machine Learning - Chao Han, Luc...
Automatically Build Solr Synonyms List using Machine Learning - Chao Han, Luc...Automatically Build Solr Synonyms List using Machine Learning - Chao Han, Luc...
Automatically Build Solr Synonyms List using Machine Learning - Chao Han, Luc...
Lucidworks
 
Developing A Big Data Search Engine - Where we have gone. Where we are going:...
Developing A Big Data Search Engine - Where we have gone. Where we are going:...Developing A Big Data Search Engine - Where we have gone. Where we are going:...
Developing A Big Data Search Engine - Where we have gone. Where we are going:...
Lucidworks
 
EDS for IFLA
EDS for IFLAEDS for IFLA
EDS for IFLA
CliveRWright
 
Data analysis patterns, tools and data types in genomics
Data analysis patterns, tools and data types in genomicsData analysis patterns, tools and data types in genomics
Data analysis patterns, tools and data types in genomics
Altuna Akalin
 
The Apache Solr Smart Data Ecosystem
The Apache Solr Smart Data EcosystemThe Apache Solr Smart Data Ecosystem
The Apache Solr Smart Data Ecosystem
Trey Grainger
 
Feature Engineering for NLP
Feature Engineering for NLPFeature Engineering for NLP
Feature Engineering for NLP
Bill Liu
 
Intent Algorithms: The Data Science of Smart Information Retrieval Systems
Intent Algorithms: The Data Science of Smart Information Retrieval SystemsIntent Algorithms: The Data Science of Smart Information Retrieval Systems
Intent Algorithms: The Data Science of Smart Information Retrieval Systems
Trey Grainger
 
Search summit-2018-ltr-presentation
Search summit-2018-ltr-presentationSearch summit-2018-ltr-presentation
Search summit-2018-ltr-presentation
Sujit Pal
 
Search summit-2018-content-engineering-slides
Search summit-2018-content-engineering-slidesSearch summit-2018-content-engineering-slides
Search summit-2018-content-engineering-slides
Sujit Pal
 
Transzaar - CAT Tool for Indian Languages including English Arabic
Transzaar - CAT Tool for Indian Languages including English ArabicTranszaar - CAT Tool for Indian Languages including English Arabic
Transzaar - CAT Tool for Indian Languages including English Arabic
Rashid Ahmad
 
Searching for the Best Machine Translation Combination
Searching for the Best Machine Translation CombinationSearching for the Best Machine Translation Combination
Searching for the Best Machine Translation Combination
Matīss ‎‎‎‎‎‎‎  
 
Resource Classification as the Basis for a Visualization Pipeline in LOD Scen...
Resource Classification as the Basis for a Visualization Pipeline in LOD Scen...Resource Classification as the Basis for a Visualization Pipeline in LOD Scen...
Resource Classification as the Basis for a Visualization Pipeline in LOD Scen...
Oscar Peña del Rio
 
Machine Learning for Everyone
Machine Learning for EveryoneMachine Learning for Everyone
Machine Learning for Everyone
Aly Abdelkareem
 

What's hot (20)

Incorporating Chinese Radicals Into Neural Machine Translation: Deeper Than C...
Incorporating Chinese Radicals Into Neural Machine Translation: Deeper Than C...Incorporating Chinese Radicals Into Neural Machine Translation: Deeper Than C...
Incorporating Chinese Radicals Into Neural Machine Translation: Deeper Than C...
 
EDS for JIBS
EDS for JIBSEDS for JIBS
EDS for JIBS
 
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
 
Chinese Character Decomposition for Neural MT with Multi-Word Expressions
Chinese Character Decomposition for  Neural MT with Multi-Word ExpressionsChinese Character Decomposition for  Neural MT with Multi-Word Expressions
Chinese Character Decomposition for Neural MT with Multi-Word Expressions
 
Lucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine LearningLucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine Learning
 
Searching with vectors
Searching with vectorsSearching with vectors
Searching with vectors
 
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
 
Automatically Build Solr Synonyms List using Machine Learning - Chao Han, Luc...
Automatically Build Solr Synonyms List using Machine Learning - Chao Han, Luc...Automatically Build Solr Synonyms List using Machine Learning - Chao Han, Luc...
Automatically Build Solr Synonyms List using Machine Learning - Chao Han, Luc...
 
Developing A Big Data Search Engine - Where we have gone. Where we are going:...
Developing A Big Data Search Engine - Where we have gone. Where we are going:...Developing A Big Data Search Engine - Where we have gone. Where we are going:...
Developing A Big Data Search Engine - Where we have gone. Where we are going:...
 
EDS for IFLA
EDS for IFLAEDS for IFLA
EDS for IFLA
 
Data analysis patterns, tools and data types in genomics
Data analysis patterns, tools and data types in genomicsData analysis patterns, tools and data types in genomics
Data analysis patterns, tools and data types in genomics
 
The Apache Solr Smart Data Ecosystem
The Apache Solr Smart Data EcosystemThe Apache Solr Smart Data Ecosystem
The Apache Solr Smart Data Ecosystem
 
Feature Engineering for NLP
Feature Engineering for NLPFeature Engineering for NLP
Feature Engineering for NLP
 
Intent Algorithms: The Data Science of Smart Information Retrieval Systems
Intent Algorithms: The Data Science of Smart Information Retrieval SystemsIntent Algorithms: The Data Science of Smart Information Retrieval Systems
Intent Algorithms: The Data Science of Smart Information Retrieval Systems
 
Search summit-2018-ltr-presentation
Search summit-2018-ltr-presentationSearch summit-2018-ltr-presentation
Search summit-2018-ltr-presentation
 
Search summit-2018-content-engineering-slides
Search summit-2018-content-engineering-slidesSearch summit-2018-content-engineering-slides
Search summit-2018-content-engineering-slides
 
Transzaar - CAT Tool for Indian Languages including English Arabic
Transzaar - CAT Tool for Indian Languages including English ArabicTranszaar - CAT Tool for Indian Languages including English Arabic
Transzaar - CAT Tool for Indian Languages including English Arabic
 
Searching for the Best Machine Translation Combination
Searching for the Best Machine Translation CombinationSearching for the Best Machine Translation Combination
Searching for the Best Machine Translation Combination
 
Resource Classification as the Basis for a Visualization Pipeline in LOD Scen...
Resource Classification as the Basis for a Visualization Pipeline in LOD Scen...Resource Classification as the Basis for a Visualization Pipeline in LOD Scen...
Resource Classification as the Basis for a Visualization Pipeline in LOD Scen...
 
Machine Learning for Everyone
Machine Learning for EveryoneMachine Learning for Everyone
Machine Learning for Everyone
 

Similar to Building NLP solutions for Davidson ML Group

How Oracle Uses CrowdFlower For Sentiment Analysis
How Oracle Uses CrowdFlower For Sentiment AnalysisHow Oracle Uses CrowdFlower For Sentiment Analysis
How Oracle Uses CrowdFlower For Sentiment Analysis
CrowdFlower
 
Machine Learning & Apache Mahout
Machine Learning & Apache MahoutMachine Learning & Apache Mahout
Machine Learning & Apache Mahout
Domingo Suarez Torres
 
Natural language processing and search
Natural language processing and searchNatural language processing and search
Natural language processing and search
Nathan McMinn
 
Deep learning for NLP
Deep learning for NLPDeep learning for NLP
Deep learning for NLP
Shishir Choudhary
 
Introduction to Text Mining
Introduction to Text MiningIntroduction to Text Mining
Introduction to Text Mining
Minha Hwang
 
Taming Text
Taming TextTaming Text
Taming Text
Grant Ingersoll
 
Natural Language Processing: L01 introduction
Natural Language Processing: L01 introductionNatural Language Processing: L01 introduction
Natural Language Processing: L01 introduction
ananth
 
AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101
vincent683379
 
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
 RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning... RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
S. Diana Hu
 
ICS1020 NLP 2020
ICS1020 NLP 2020ICS1020 NLP 2020
ICS1020 NLP 2020
Vanessa Camilleri
 
subrat
 subrat subrat
subrat
ABA,BALASORE
 
Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)
WingChan46
 
Workshop Exercise: Text Analysis Methods for Digital Humanities
Workshop Exercise: Text Analysis Methods for Digital HumanitiesWorkshop Exercise: Text Analysis Methods for Digital Humanities
Workshop Exercise: Text Analysis Methods for Digital Humanities
Helen Bailey
 
Natural Language Processing, Techniques, Current Trends and Applications in I...
Natural Language Processing, Techniques, Current Trends and Applications in I...Natural Language Processing, Techniques, Current Trends and Applications in I...
Natural Language Processing, Techniques, Current Trends and Applications in I...
RajkiranVeluri
 
Databases, Web Services and Tools For Systems Immunology
Databases, Web Services and Tools For Systems ImmunologyDatabases, Web Services and Tools For Systems Immunology
Databases, Web Services and Tools For Systems Immunology
Yannick Pouliot
 
Dice.com Bay Area Search - Beyond Learning to Rank Talk
Dice.com Bay Area Search - Beyond Learning to Rank TalkDice.com Bay Area Search - Beyond Learning to Rank Talk
Dice.com Bay Area Search - Beyond Learning to Rank Talk
Simon Hughes
 
Sld-Natural-Language-Processing-for-large-volumes-of-human-text-data-Sozzi-Br...
Sld-Natural-Language-Processing-for-large-volumes-of-human-text-data-Sozzi-Br...Sld-Natural-Language-Processing-for-large-volumes-of-human-text-data-Sozzi-Br...
Sld-Natural-Language-Processing-for-large-volumes-of-human-text-data-Sozzi-Br...
hajinouha0
 
NLP,expert,robotics.pptx
NLP,expert,robotics.pptxNLP,expert,robotics.pptx
NLP,expert,robotics.pptx
AmanBadesra1
 
1. OBJECT ORIENTED PROGRAMMING USING JAVA - OOps Concepts.ppt
1. OBJECT ORIENTED PROGRAMMING USING JAVA - OOps Concepts.ppt1. OBJECT ORIENTED PROGRAMMING USING JAVA - OOps Concepts.ppt
1. OBJECT ORIENTED PROGRAMMING USING JAVA - OOps Concepts.ppt
sagarjsicg
 
Enhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.com
Enhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.comEnhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.com
Enhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.com
Simon Hughes
 

Similar to Building NLP solutions for Davidson ML Group (20)

How Oracle Uses CrowdFlower For Sentiment Analysis
How Oracle Uses CrowdFlower For Sentiment AnalysisHow Oracle Uses CrowdFlower For Sentiment Analysis
How Oracle Uses CrowdFlower For Sentiment Analysis
 
Machine Learning & Apache Mahout
Machine Learning & Apache MahoutMachine Learning & Apache Mahout
Machine Learning & Apache Mahout
 
Natural language processing and search
Natural language processing and searchNatural language processing and search
Natural language processing and search
 
Deep learning for NLP
Deep learning for NLPDeep learning for NLP
Deep learning for NLP
 
Introduction to Text Mining
Introduction to Text MiningIntroduction to Text Mining
Introduction to Text Mining
 
Taming Text
Taming TextTaming Text
Taming Text
 
Natural Language Processing: L01 introduction
Natural Language Processing: L01 introductionNatural Language Processing: L01 introduction
Natural Language Processing: L01 introduction
 
AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101
 
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
 RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning... RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
 
ICS1020 NLP 2020
ICS1020 NLP 2020ICS1020 NLP 2020
ICS1020 NLP 2020
 
subrat
 subrat subrat
subrat
 
Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)
 
Workshop Exercise: Text Analysis Methods for Digital Humanities
Workshop Exercise: Text Analysis Methods for Digital HumanitiesWorkshop Exercise: Text Analysis Methods for Digital Humanities
Workshop Exercise: Text Analysis Methods for Digital Humanities
 
Natural Language Processing, Techniques, Current Trends and Applications in I...
Natural Language Processing, Techniques, Current Trends and Applications in I...Natural Language Processing, Techniques, Current Trends and Applications in I...
Natural Language Processing, Techniques, Current Trends and Applications in I...
 
Databases, Web Services and Tools For Systems Immunology
Databases, Web Services and Tools For Systems ImmunologyDatabases, Web Services and Tools For Systems Immunology
Databases, Web Services and Tools For Systems Immunology
 
Dice.com Bay Area Search - Beyond Learning to Rank Talk
Dice.com Bay Area Search - Beyond Learning to Rank TalkDice.com Bay Area Search - Beyond Learning to Rank Talk
Dice.com Bay Area Search - Beyond Learning to Rank Talk
 
Sld-Natural-Language-Processing-for-large-volumes-of-human-text-data-Sozzi-Br...
Sld-Natural-Language-Processing-for-large-volumes-of-human-text-data-Sozzi-Br...Sld-Natural-Language-Processing-for-large-volumes-of-human-text-data-Sozzi-Br...
Sld-Natural-Language-Processing-for-large-volumes-of-human-text-data-Sozzi-Br...
 
NLP,expert,robotics.pptx
NLP,expert,robotics.pptxNLP,expert,robotics.pptx
NLP,expert,robotics.pptx
 
1. OBJECT ORIENTED PROGRAMMING USING JAVA - OOps Concepts.ppt
1. OBJECT ORIENTED PROGRAMMING USING JAVA - OOps Concepts.ppt1. OBJECT ORIENTED PROGRAMMING USING JAVA - OOps Concepts.ppt
1. OBJECT ORIENTED PROGRAMMING USING JAVA - OOps Concepts.ppt
 
Enhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.com
Enhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.comEnhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.com
Enhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.com
 

More from botsplash.com

Migrating to postgresql
Migrating to postgresqlMigrating to postgresql
Migrating to postgresql
botsplash.com
 
Bootstrap SaaS startup using Open Source Tools
Bootstrap SaaS startup using Open Source ToolsBootstrap SaaS startup using Open Source Tools
Bootstrap SaaS startup using Open Source Tools
botsplash.com
 
Devops Days, 2019 - Charlotte
Devops Days, 2019 - CharlotteDevops Days, 2019 - Charlotte
Devops Days, 2019 - Charlotte
botsplash.com
 
Getting started with postgresql
Getting started with postgresqlGetting started with postgresql
Getting started with postgresql
botsplash.com
 
Chat interfaces, Extension to Digital Marketing
Chat interfaces, Extension to Digital MarketingChat interfaces, Extension to Digital Marketing
Chat interfaces, Extension to Digital Marketing
botsplash.com
 
Cloud computing options
Cloud computing optionsCloud computing options
Cloud computing options
botsplash.com
 
Data Science meets Digital Marketing
Data Science meets Digital MarketingData Science meets Digital Marketing
Data Science meets Digital Marketing
botsplash.com
 
botsplash deep dive
botsplash deep divebotsplash deep dive
botsplash deep dive
botsplash.com
 
Building Twitter bot using Python
Building Twitter bot using PythonBuilding Twitter bot using Python
Building Twitter bot using Python
botsplash.com
 
Python for data science
Python for data sciencePython for data science
Python for data science
botsplash.com
 
Live development & tools
Live development & toolsLive development & tools
Live development & tools
botsplash.com
 
AI Use Cases discussion
AI Use Cases discussionAI Use Cases discussion
AI Use Cases discussion
botsplash.com
 
Career advice for beginner software engineers
Career advice for beginner software engineersCareer advice for beginner software engineers
Career advice for beginner software engineers
botsplash.com
 
Node.js Getting Started &amd Best Practices
Node.js Getting Started &amd Best PracticesNode.js Getting Started &amd Best Practices
Node.js Getting Started &amd Best Practices
botsplash.com
 

More from botsplash.com (14)

Migrating to postgresql
Migrating to postgresqlMigrating to postgresql
Migrating to postgresql
 
Bootstrap SaaS startup using Open Source Tools
Bootstrap SaaS startup using Open Source ToolsBootstrap SaaS startup using Open Source Tools
Bootstrap SaaS startup using Open Source Tools
 
Devops Days, 2019 - Charlotte
Devops Days, 2019 - CharlotteDevops Days, 2019 - Charlotte
Devops Days, 2019 - Charlotte
 
Getting started with postgresql
Getting started with postgresqlGetting started with postgresql
Getting started with postgresql
 
Chat interfaces, Extension to Digital Marketing
Chat interfaces, Extension to Digital MarketingChat interfaces, Extension to Digital Marketing
Chat interfaces, Extension to Digital Marketing
 
Cloud computing options
Cloud computing optionsCloud computing options
Cloud computing options
 
Data Science meets Digital Marketing
Data Science meets Digital MarketingData Science meets Digital Marketing
Data Science meets Digital Marketing
 
botsplash deep dive
botsplash deep divebotsplash deep dive
botsplash deep dive
 
Building Twitter bot using Python
Building Twitter bot using PythonBuilding Twitter bot using Python
Building Twitter bot using Python
 
Python for data science
Python for data sciencePython for data science
Python for data science
 
Live development & tools
Live development & toolsLive development & tools
Live development & tools
 
AI Use Cases discussion
AI Use Cases discussionAI Use Cases discussion
AI Use Cases discussion
 
Career advice for beginner software engineers
Career advice for beginner software engineersCareer advice for beginner software engineers
Career advice for beginner software engineers
 
Node.js Getting Started &amd Best Practices
Node.js Getting Started &amd Best PracticesNode.js Getting Started &amd Best Practices
Node.js Getting Started &amd Best Practices
 

Recently uploaded

"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota
Fwdays
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
Mutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented ChatbotsMutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented Chatbots
Pablo Gómez Abajo
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
Antonios Katsarakis
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
Zilliz
 
Y-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PPY-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PP
c5vrf27qcz
 
Apps Break Data
Apps Break DataApps Break Data
Apps Break Data
Ivo Velitchkov
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
Ivanti
 
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
Edge AI and Vision Alliance
 
Essentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation ParametersEssentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation Parameters
Safe Software
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
DanBrown980551
 
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge GraphGraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
Neo4j
 
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsConnector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
DianaGray10
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
Zilliz
 
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
saastr
 
Digital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Digital Banking in the Cloud: How Citizens Bank Unlocked Their MainframeDigital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Digital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Precisely
 
JavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green MasterplanJavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green Masterplan
Miro Wengner
 
Principle of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptxPrinciple of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptx
BibashShahi
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
Brandon Minnick, MBA
 

Recently uploaded (20)

"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
Mutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented ChatbotsMutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented Chatbots
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
 
Y-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PPY-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PP
 
Apps Break Data
Apps Break DataApps Break Data
Apps Break Data
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
 
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
 
Artificial Intelligence and Electronic Warfare
Artificial Intelligence and Electronic WarfareArtificial Intelligence and Electronic Warfare
Artificial Intelligence and Electronic Warfare
 
Essentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation ParametersEssentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation Parameters
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
 
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge GraphGraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
 
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsConnector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
 
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
 
Digital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Digital Banking in the Cloud: How Citizens Bank Unlocked Their MainframeDigital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Digital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
 
JavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green MasterplanJavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green Masterplan
 
Principle of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptxPrinciple of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptx
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
 

Building NLP solutions for Davidson ML Group

  • 1. Building Natural Language Processing solutions For Davidson Machine Learning Group By Ramu Pulipati, @botsplash
  • 2. Introduction to NLP • Natural Language: • General purpose communications • Distinct difference between humans and Animals • Much difficult to interpret from Formal Language • Natural Language Processing (NLP) Advancements • Earlier focus was on Linguistics and Computer Science • Current evolution is focused on Machine Learning, specifically Deep Learning and Neural Networks • Varied degrees of implementation based on use case
  • 3. Scope of Natural Language Processing • Read • Natural Language Understanding (NLU) • Write • Natural Language Generation (NLG) • Speak • Speech Recognition / Syntesis
  • 5. More Applications … • Email Spam • Siri / Alexa / Cortana • Legal Contacts to find Action clauses • Health Care Records • Energy Sector / Utilities / Inspection Records • Automated Agents • Appointment Scheduling • Auto Email Responses • Typing Suggestions • Spelling Check • Predicting Crops • Social Media Propaganda • Press/Earnings releases • Weather Reports • Search Engines • News categorization • Chatbot • NY Times Oped author analysis
  • 6. State of NLP Source: https://www.slideshare.net/healess/sk-t-academy-lecture-note
  • 7. Botsplash AI Strategy Machine Learning Natural Language Processing Predictive Analytics Routing Intelligence High Intent Conversion Detection Trends and Behavior End Chat, Spam Detection Content and Sentiment FAQ, Support, Transaction Chatbot Re-engagement Smart Scheduling UI Interactions
  • 8. Focus on solvable/acceptable problems I’m looking for 30yr mortgage loan in Charlotte, NC (Named Entity Recognition) Thanks for your help. Great chatting with you. (classification) Lets connect tomorrow. Anytime evening will work for me. (classification / intent / actionable) This rate is unacceptable. What can you do? (sentiment)
  • 9. Note on leading NLP providers • AWS Comprehend • Google Cloud NLP • Microsoft Project Oxford • IBM Watson • Aylien • Cennest Comparison: https://cognitiveintegratorapp.azurewebsites.net/ Note: None of them provide the results you are looking for. Open source packages are your best options.
  • 10. Text Processing Roundup • Normalization • Text Classification • Text Similarity • Text Extraction • Topic Modeling • Semantic Search • Sentiment Analysis
  • 11. Word Embeddings • Paper published by Mikolov 2013 Example: Man is to Woman, then King is to _______ • Multi-dimensional space of word representations with proximity based on similarity of the words (word vectors) • Algebraic expressions can be applied on Word vectors • Building Word embedding: Provide lot of data with features to look • Word2vec is a popular word embedding implemented with Neural network • Other implementations such as Glove use co-occurrence matrices
  • 13. NLP Pipeline • Classical follows traditional ML strategies • Deep Learning requires lot of data
  • 14. Getting started • Python Installation. Use 3+. • Data science packages installation. Use “pip install” or Anaconda • Always use “virtualenv” when setting up environments. • Start with Jupyter notebooks and convert it production code. • Use cloud hosted jupyter notebooks with access to GPU from floydhub, paperspace, Google, Amazon or Azure
  • 15. Python packages for NLP • NLP Focus Packages • NLTK • Spacy • Gensim • Textblob • Scikit Learn • Stanford NLP (java) • WordNet, SentiWordNet • FastText / MUSE / Faiss • Deep Learning Frameworks • Tensorflow / Keras • Pytorch • Other Noteworth • Scrapy • Newspaper • nlp-architect
  • 16. NLTK Code Tour • Tokenization (Dictionary and Regex) • Stemming • Lemma • NLP Grammar - Chunking and Chinking • Entity Recognition • WikiQuiz
  • 17. Spacy.io Lightning Tour • Industrial Strength, Fast • POS Tagging and Dependency Parsing • Named Entities, Word embedding and Similarity • Custom Pipelines • Visualization
  • 18. Text classification • Use cases: Spam, Actionable events, Intents • For Content based or Request based classification • Steps involve Preparing -> Training -> Prediction • Feature Extractions • Bag of Words • TF-IDF model • Word Vectors: Averaged, TD-IDF, tc • Starspace model • FastText • Classification alg: Multinomial Bayes or SVM • Intent Classification • RASA NLU • Snips NLU
  • 19. Steps to classifying your data 1. Identify tags to be applied 2. Manually add tags for the data (possibly in the application) 3. Build a classification algorithm 4. Setup your application to auto classify tags 5. Evaluate silently and then enable the actions
  • 20. Sentiment Analysis • Use case: Reviews, Chat transcripts, etc • Supervised techniques are effective for a domain • Packages: • SentiWordNet • StanfordNLP • Spacy Sentiment Analysis (incomplete)
  • 21. Summarization • Summarization is hard • Uses variety of techniques including Text extraction, Feature Matrix, TD-IDF, Co-location, SVD and other methods • Implement LSA to under • Review of implementations: • Spacy • TextRank • Pyteaser • Textteaser • Sumy
  • 22. Chatbots • Rules Based • Intent Classification • Context and Workflow Management • Handle Special Cases • Generative • Sequence to Sequence Chatbot: DeepQA demo
  • 23. Code Review / Demo Apps • Jupyter Notebooks • NLTK Code Review • Space Code Review • Word2Vec Samples • NLTK Grammar Parsing • WikiQuiz • Topic Modeling Code Review • Text Similarity – Phrase Matcher API
  • 24. Follow up Learning • Websites: • Allen AI - NLP • Fast AI • Malabuba • Coursera • Youtube • Resources • Sanni Oluwatoyin Yetunde Google Slides • Cambridge Data Science Group presentation • nlp.fast.ai

Editor's Notes

  1. Natural language is ambiguous, where formal language is precise Formal language: Programming language
  2. The botsplash framework encompasses and build on strong concepts and strategy to augment business processes to achieve best outcome for business and customers of the business botsplash is a Software-as-a-Service platform on a model of B-2-b-2-C. We want the “B”(business) to provide “C”(consumers of business) the best, easy to use and reliable technology to reduce costs , increase business transactions, efficiency and customer satisfaction.
  3. ML Strategies: * Explore data and use visualizations * Create Train and Test data * Setup training algorithm and feature * Train Model * Test the result * Rinse and Repeat until the results are satisfactory
  4. Multinomial Naïve Bayes is used to predict more than 2 classes. Popular Bayes algorithm that expects each feature is independent Support vector machine are supervised algorithms used for classification, regression, anomaly and outlier detections For classification algorithm, we focus on following metrics: accuracy, precision, recall and f1 score