Building NLP solutions using Python
By Ramu Pulipati,
@botsplash
Introduction to NLP
• Natural Language:
• General-purpose communication
• A distinct difference between humans and animals
• Much more difficult to interpret than formal language
• Natural Language Processing (NLP) Advancements
• Earlier focus was on Linguistics and Computer Science
• Current evolution is focused on Machine Learning, specifically Deep Learning and Neural Networks
• The depth of implementation varies by use case
Scope of Natural Language Processing
• Read
• Natural Language Understanding (NLU)
• Write
• Natural Language Generation (NLG)
• Speak
• Speech Recognition / Synthesis
NLP Applications
More Applications …
• Email Spam
• Siri / Alexa / Cortana
• Legal Contracts, to find Action Clauses
• Health Care Records
• Energy Sector / Utilities /
Inspection Records
• Automated Agents
• Appointment Scheduling
• Auto Email Responses
• Typing Suggestions
• Spell Checking
• Predicting Crops
• Social Media Propaganda
• Press/Earnings releases
• Weather Reports
• Search Engines
• News categorization
• Chatbot
• NY Times Op-Ed author analysis
State of NLP
Source: https://www.slideshare.net/healess/sk-t-academy-lecture-note
Botsplash AI Strategy
• Disciplines: Machine Learning, Natural Language Processing, Predictive Analytics
• Routing Intelligence
• High Intent Conversion Detection
• Trends and Behavior
• End Chat, Spam Detection
• Content and Sentiment
• FAQ, Support, Transaction Chatbot
• Re-engagement
• Smart Scheduling
• UI Interactions
Focus on solvable/acceptable problems
I’m looking for 30yr mortgage loan in Charlotte, NC
(Named Entity Recognition)
Thanks for your help. Great chatting with you.
(classification)
Lets connect tomorrow. Anytime evening will work for me.
(classification / intent / actionable)
This rate is unacceptable. What can you do?
(sentiment)
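The first message is a Named Entity Recognition problem. In practice this is usually handled by a statistical NER model (for example spaCy's); as a dependency-free illustration only, the sketch below swaps in a rule-based extractor built from regular expressions and a tiny gazetteer. The patterns, the `US_STATES` subset, and the `LOAN_TERM`/`PRODUCT`/`LOCATION` labels are hypothetical and tuned to this one example message.

```python
import re

# Hypothetical gazetteer: a small subset of US state abbreviations.
US_STATES = {"NC", "SC", "VA", "GA", "NY", "CA"}

# Loan terms such as "30yr", "15 yr", "30-year".
TERM_PATTERN = re.compile(r"\b(\d{1,2})\s*-?\s*(?:yr|year)s?\b", re.IGNORECASE)
# "City, ST": capitalized word(s), a comma, then a two-letter uppercase code.
LOCATION_PATTERN = re.compile(r"\b([A-Z][a-z]+(?: [A-Z][a-z]+)*),\s*([A-Z]{2})\b")


def extract_entities(text):
    """Return (label, value) pairs found by simple hand-written rules."""
    entities = []
    for match in TERM_PATTERN.finditer(text):
        entities.append(("LOAN_TERM", int(match.group(1))))
    if "mortgage" in text.lower():  # keyword rule for the product type
        entities.append(("PRODUCT", "mortgage"))
    for match in LOCATION_PATTERN.finditer(text):
        city, state = match.group(1), match.group(2)
        if state in US_STATES:  # keep only known state codes
            entities.append(("LOCATION", f"{city}, {state}"))
    return entities


print(extract_entities("I'm looking for 30yr mortgage loan in Charlotte, NC"))
```

Rules like these are precise but brittle; a trained NER model generalizes to phrasings the rules never anticipated, which is why the slide treats this as a solvable problem for NLP rather than for regexes alone.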
Leading NLP Providers
• AWS Comprehend
• Google Cloud NLP
• Microsoft Project Oxford
• IBM Watson
• Aylien
• Cennest Comparison:
https://cognitiveintegratorapp.azurewebsites.net/
Text Processing Roundup
• Normalization
• Text Classification
• Text Similarity
• Text Extraction
• Topic Modeling
• Semantic Search
• Sentiment Analysis
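Normalization, the first item in the roundup, usually comes before every other step. A minimal sketch using only the standard library is shown here; the exact steps (accent stripping, lowercasing, punctuation removal, stop-word filtering) and the small `STOP_WORDS` set are assumptions, and real projects typically take their stop-word list from NLTK or spaCy.

```python
import string
import unicodedata

# Hypothetical, deliberately small stop-word list.
STOP_WORDS = {"a", "an", "the", "is", "for", "in", "to", "of", "and", "with"}


def normalize(text, remove_stop_words=True):
    """Lowercase, strip accents and punctuation, collapse whitespace, drop stop words."""
    # Decompose accented characters (é -> e + combining accent), then drop the accents.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.lower()
    # Replace ASCII punctuation with spaces so "NC," and "NC" end up identical.
    text = text.translate(str.maketrans({p: " " for p in string.punctuation}))
    tokens = text.split()  # split() with no argument also collapses runs of whitespace
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens


print(normalize("The   Café in Charlotte, NC is GREAT!"))
```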
NLP Pipeline
• Classical: follows traditional ML strategies
• Deep Learning: requires a lot of data
Getting started
• Install Python 3 or later.
• Install data science packages with “pip install” or Anaconda.
• Always use “virtualenv” when setting up environments.
• Start with Jupyter notebooks, then convert them to production code.
• Use cloud-hosted Jupyter notebooks with GPU access from FloydHub, Paperspace, Google, Amazon or Azure.
Python packages for NLP
• NLP Focus Packages
• NLTK
• Spacy
• Gensim
• Textblob
• Scikit Learn
• Stanford NLP (java)
• WordNet, SentiWordNet
• FastText / MUSE / Faiss
• Deep Learning Frameworks
• TensorFlow / Keras
• PyTorch
• Other Noteworthy
• Scrapy
• Newspaper
• nlp-architect
NLTK Code Tour
• Tokenization (Dictionary and Regex)
• Stemming
• Lemmatization
• NLP Grammar - Chunking and Chinking
• Entity Recognition
• WikiQuiz
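The tour's first two steps, regex tokenization and stemming, are provided in NLTK by classes such as `RegexpTokenizer` and `PorterStemmer`. To keep the example dependency-free, the sketch below uses only `re`: the token pattern and the `SUFFIXES` rules are hypothetical, and `crude_stem` is a deliberately simple suffix stripper, far less careful than the Porter algorithm.

```python
import re

# Words (with an optional internal apostrophe), numbers with an optional
# letter suffix such as "30yr", or single punctuation marks.
TOKEN_PATTERN = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?|\d+[A-Za-z]*|[^\w\s]")

# Hypothetical suffix rules, longest first so "edly" is tried before "ed".
SUFFIXES = ("edly", "ing", "ed", "ly", "s")


def tokenize(text):
    """Split text into word, number, and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


def crude_stem(word, min_stem=3):
    """Strip the first matching suffix if at least min_stem letters remain."""
    lower = word.lower()
    for suffix in SUFFIXES:
        if lower.endswith(suffix) and len(lower) - len(suffix) >= min_stem:
            return lower[: -len(suffix)]
    return lower


print(tokenize("Let's connect tomorrow. Anytime evening will work!"))
print([crude_stem(w) for w in ["Looking", "loans", "connected", "rates"]])
```

A rule set this small over-stems easily (it turns "evening" into "even"), which is one reason lemmatization, the next item on the list, uses a dictionary instead of suffix rules.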
Word Embeddings
• Introduced in a 2013 paper by Mikolov et al.
Example: if Man is to Woman, then King is to _______
• Multi-dimensional space of word representations (word vectors) in which proximity reflects the similarity of the words
• Algebraic operations can be applied to word vectors
• Building a word embedding requires a large amount of training text
• Word2vec is a popular word embedding learned with a shallow neural network
• Other approaches, such as GloVe, use co-occurrence matrices
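The analogy example can be solved with vector arithmetic: compute King - Man + Woman and pick the nearest remaining word by cosine similarity. The sketch below uses hypothetical, hand-picked 3-dimensional vectors purely for illustration; real word2vec or GloVe vectors are learned from large corpora and typically have 100 to 300 dimensions.

```python
import math

# Hypothetical toy vectors chosen by hand; not learned from any corpus.
VECTORS = {
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "king":  [0.9, 0.9, 0.1],
    "queen": [0.9, 0.1, 0.9],
    "apple": [0.0, 0.2, 0.2],
}


def cosine(u, v):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def analogy(a, b, c, vectors=VECTORS):
    """Solve 'a is to b as c is to ?' by finding the word closest to c - a + b."""
    target = [vc - va + vb for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in {a, b, c})  # exclude the query words
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


print(analogy("man", "woman", "king"))  # queen
```

With trained vectors loaded into gensim, the same query is typically written as `most_similar(positive=["king", "woman"], negative=["man"])` on a `KeyedVectors` object.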
Word2vec paper results
Spacy.io Lightning Tour
• Industrial Strength, Fast
• POS Tagging and Dependency Parsing
• Named Entities, Word embedding and Similarity
• Custom Pipelines
• Visualization
Text classification
• Use cases: Spam, Actionable events, Intents
• For content-based or request-based classification
• Steps: Preparation -> Training -> Prediction
• Feature Extraction
• Bag of Words
• TF-IDF model
• Word Vectors: Averaged, TF-IDF, etc.
• StarSpace model
• FastText
• Classification algorithms: Multinomial Naive Bayes or SVM
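The TF-IDF feature model weights a term by how often it occurs in a document, discounted by how many documents contain it. A minimal standard-library sketch using the plain textbook formulas (whitespace tokenization and the sample messages are assumptions) looks like this:

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    tf  = occurrences of the term in the document / tokens in the document
    idf = log(N / number of documents containing the term)
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: count each term once per document it appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        length = len(tokens) or 1  # avoid dividing by zero on empty documents
        weights.append({
            term: (count / length) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


docs = ["thanks for the help", "the rate is great", "call me about the loan"]
for doc, w in zip(docs, tf_idf(docs)):
    print(doc, "->", {t: round(v, 3) for t, v in w.items()})
```

"the" appears in every document, so its weight is 0. scikit-learn's `TfidfVectorizer` uses a smoothed idf and L2-normalizes each row by default, so its numbers will differ from this sketch.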
Steps to classifying your data
1. Identify the tags to apply
2. Manually tag the data (possibly in the application)
3. Build a classification model
4. Set up your application to apply tags automatically
5. Evaluate silently, then enable the actions
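The steps above can be sketched end to end with a multinomial Naive Bayes classifier, one of the algorithms listed on the previous slide. This is a minimal standard-library version with add-one smoothing; the `closing`/`actionable` tags, the six training messages, and the held-out set are hypothetical stand-ins for manually tagged chat data.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesTagger:
    """Multinomial Naive Bayes over bag-of-words counts, with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> word counts for that label
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def _log_score(self, words, label):
        counts = self.word_counts[label]
        total = sum(counts.values())
        # log P(label) + sum over words of log P(word | label)
        score = math.log(self.class_counts[label] / sum(self.class_counts.values()))
        for w in words:
            if w in self.vocab:  # skip words never seen in training
                score += math.log((counts[w] + 1) / (total + len(self.vocab)))
        return score

    def predict(self, text):
        words = text.lower().split()
        return max(self.class_counts, key=lambda label: self._log_score(words, label))


def accuracy(model, texts, labels):
    """Fraction of texts whose predicted tag matches the manual tag."""
    return sum(model.predict(t) == y for t, y in zip(texts, labels)) / len(labels)


# Steps 1-2: hypothetical tags and a handful of manually tagged messages.
TRAIN = [
    ("thanks for your help", "closing"),
    ("great chatting with you", "closing"),
    ("thanks bye", "closing"),
    ("lets connect tomorrow evening", "actionable"),
    ("call me tomorrow morning", "actionable"),
    ("can we schedule a call", "actionable"),
]
# Step 3: train the classifier.
model = NaiveBayesTagger().fit([t for t, _ in TRAIN], [y for _, y in TRAIN])

# Step 5: measure accuracy on held-out messages before enabling any automated action.
HELD_OUT = [("thanks for chatting", "closing"), ("connect with me tomorrow", "actionable")]
print(accuracy(model, [t for t, _ in HELD_OUT], [y for _, y in HELD_OUT]))
```

Working in log space avoids numeric underflow when messages are long, and smoothing keeps a single unseen word-tag pair from zeroing out a prediction.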
Sentiment Analysis
• Use cases: Reviews, Chat transcripts, etc.
• Supervised techniques trained on domain-specific data are effective
• Packages:
• SentiWordNet
• StanfordNLP
• Spacy Sentiment Analysis (incomplete)
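Lexicon-based scoring, the idea behind resources such as SentiWordNet, gives a quick unsupervised baseline. The sketch below is not SentiWordNet's API: it averages scores from a tiny hypothetical `LEXICON` and flips the polarity of a word that directly follows a negator. As the slide notes, a supervised model trained on in-domain data will usually do better.

```python
import re

# Tiny hypothetical polarity lexicon with scores in [-1, 1].
LEXICON = {
    "great": 0.8, "thanks": 0.5, "help": 0.3, "good": 0.6,
    "unacceptable": -0.8, "bad": -0.6, "terrible": -0.9, "slow": -0.3,
}
NEGATORS = {"not", "never", "no"}


def sentiment(text):
    """Average lexicon score of matched words; a negator flips the very next word."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = []
    negate = False
    for word in words:
        if word in NEGATORS:
            negate = True
            continue
        if word in LEXICON:
            scores.append(-LEXICON[word] if negate else LEXICON[word])
        negate = False  # negation applies only to the word right after the negator
    return sum(scores) / len(scores) if scores else 0.0


print(sentiment("This rate is unacceptable. What can you do?"))
print(sentiment("Thanks for your help. Great chatting with you."))
```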
Summarization
• Summarization is hard
• Uses a variety of techniques, including text extraction, feature matrices, TF-IDF, collocation, SVD and other methods
• Implement LSA to under
• Review of implementations:
• Spacy
• TextRank
• Pyteaser
• Textteaser
• Sumy
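The simplest of these techniques is extractive summarization by word frequency: score each sentence by how common its words are in the whole text and keep the top sentences. The sketch below is that frequency approach, not TextRank, LSA, or any of the listed packages; the sentence splitter, the `STOP_WORDS` set, and the length normalization are assumptions.

```python
import re
from collections import Counter

# Hypothetical stop-word list so function words don't dominate the scores.
STOP_WORDS = {"the", "a", "an", "is", "are", "to", "of", "and", "in", "for", "on", "it", "this", "with"}


def summarize(text, n_sentences=1):
    """Return the n sentences with the highest average word frequency, in original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]
    freq = Counter(words)

    def score(sentence):
        tokens = [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOP_WORDS]
        return sum(freq[t] for t in tokens) / (len(tokens) or 1)  # normalize by length

    ranked = sorted(sentences, key=score, reverse=True)[:n_sentences]
    # Keep the chosen sentences in their original order for readability.
    return " ".join(s for s in sentences if s in ranked)


text = ("The mortgage rate is 4 percent. Mortgage rate changes affect every mortgage. "
        "I like coffee.")
print(summarize(text))
```

Dividing by sentence length stops long sentences from winning simply by containing more words; graph-based methods such as TextRank replace the raw frequency score with sentence-to-sentence similarity.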
Code Review / Demo Apps
• Jupyter Notebooks
• NLTK Code Review
• Spacy Code Review
• NLTK Grammar Parsing
• WikiQuiz
• Sequence to Sequence Chatbot
• DeepQA demo
• Topic Modeling Code Review
• Text Similarity – Phrase Matcher API
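The Phrase Matcher API from the demo isn't shown here. As a hypothetical sketch of the text-similarity idea, the code below compares bag-of-words count vectors with cosine similarity and returns the closest stored phrase, the way an FAQ lookup might; the `FAQ` entries and the tokenizer are assumptions.

```python
import math
import re
from collections import Counter


def bow(text):
    """Bag of words: lowercase alphanumeric tokens with their counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity of two texts' bag-of-words vectors (0.0 = no shared words)."""
    va, vb = bow(a), bow(b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def best_match(query, phrases):
    """Return the stored phrase most similar to the query."""
    return max(phrases, key=lambda p: cosine_similarity(query, p))


# Hypothetical FAQ entries.
FAQ = ["what is the current mortgage rate", "how do i schedule a call", "how do i reset my password"]
print(best_match("what's today's mortgage rate?", FAQ))
```

Bag-of-words similarity only rewards exact word overlap; swapping `bow` for averaged word vectors lets paraphrases with no shared words still score as similar.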
Follow up Learning
• Websites:
• Allen AI - NLP
• Fast AI
• Malabuba
• Coursera
• YouTube
• Resources
• Sanni Oluwatoyin Yetunde
Google Slides
• Cambridge Data Science
Group presentation
• nlp.fast.ai

More Related Content

What's hot

Build Mandarin AI Conversational Agent with Rasa
Build Mandarin AI Conversational Agent with RasaBuild Mandarin AI Conversational Agent with Rasa
Build Mandarin AI Conversational Agent with RasaHao-Yuan Chen
 
An Introduction to NLP4L - Natural Language Processing Tool for Apache Lucene...
An Introduction to NLP4L - Natural Language Processing Tool for Apache Lucene...An Introduction to NLP4L - Natural Language Processing Tool for Apache Lucene...
An Introduction to NLP4L - Natural Language Processing Tool for Apache Lucene...Lucidworks
 
Learning to Rank Presentation (v2) at LexisNexis Search Guild
Learning to Rank Presentation (v2) at LexisNexis Search GuildLearning to Rank Presentation (v2) at LexisNexis Search Guild
Learning to Rank Presentation (v2) at LexisNexis Search GuildSujit Pal
 
Nikko Ström at AI Frontiers: Deep Learning in Alexa
Nikko Ström at AI Frontiers: Deep Learning in AlexaNikko Ström at AI Frontiers: Deep Learning in Alexa
Nikko Ström at AI Frontiers: Deep Learning in AlexaAI Frontiers
 
Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)WingChan46
 
Hacking Lucene and Solr for Fun and Profit
Hacking Lucene and Solr for Fun and ProfitHacking Lucene and Solr for Fun and Profit
Hacking Lucene and Solr for Fun and Profitlucenerevolution
 
Webinar: Simpler Semantic Search with Solr
Webinar: Simpler Semantic Search with SolrWebinar: Simpler Semantic Search with Solr
Webinar: Simpler Semantic Search with SolrLucidworks
 
An Introduction to Natural Language Processing
An Introduction to Natural Language ProcessingAn Introduction to Natural Language Processing
An Introduction to Natural Language ProcessingTyrone Systems
 
Sentiment Analysis Using Solr
Sentiment Analysis Using SolrSentiment Analysis Using Solr
Sentiment Analysis Using SolrPradeep Pujari
 
Semantic & Multilingual Strategies in Lucene/Solr: Presented by Trey Grainger...
Semantic & Multilingual Strategies in Lucene/Solr: Presented by Trey Grainger...Semantic & Multilingual Strategies in Lucene/Solr: Presented by Trey Grainger...
Semantic & Multilingual Strategies in Lucene/Solr: Presented by Trey Grainger...Lucidworks
 
Shrinking the Haystack" using Solr and OpenNLP
Shrinking the Haystack" using Solr and OpenNLPShrinking the Haystack" using Solr and OpenNLP
Shrinking the Haystack" using Solr and OpenNLPlucenerevolution
 
Vectorization - Georgia Tech - CSE6242 - March 2015
Vectorization - Georgia Tech - CSE6242 - March 2015Vectorization - Georgia Tech - CSE6242 - March 2015
Vectorization - Georgia Tech - CSE6242 - March 2015Josh Patterson
 
Building a Neural Machine Translation System From Scratch
Building a Neural Machine Translation System From ScratchBuilding a Neural Machine Translation System From Scratch
Building a Neural Machine Translation System From ScratchNatasha Latysheva
 
Searching with vectors
Searching with vectorsSearching with vectors
Searching with vectorsSimon Hughes
 
Sequence Modelling with Deep Learning
Sequence Modelling with Deep LearningSequence Modelling with Deep Learning
Sequence Modelling with Deep LearningNatasha Latysheva
 
MongoDB & Machine Learning
MongoDB & Machine LearningMongoDB & Machine Learning
MongoDB & Machine LearningTom Maiaroto
 
Enriching Solr with Deep Learning for a Question Answering System - Sanket Sh...
Enriching Solr with Deep Learning for a Question Answering System - Sanket Sh...Enriching Solr with Deep Learning for a Question Answering System - Sanket Sh...
Enriching Solr with Deep Learning for a Question Answering System - Sanket Sh...Lucidworks
 

What's hot (20)

Build Mandarin AI Conversational Agent with Rasa
Build Mandarin AI Conversational Agent with RasaBuild Mandarin AI Conversational Agent with Rasa
Build Mandarin AI Conversational Agent with Rasa
 
An Introduction to NLP4L - Natural Language Processing Tool for Apache Lucene...
An Introduction to NLP4L - Natural Language Processing Tool for Apache Lucene...An Introduction to NLP4L - Natural Language Processing Tool for Apache Lucene...
An Introduction to NLP4L - Natural Language Processing Tool for Apache Lucene...
 
The State of #NLProc
The State of #NLProcThe State of #NLProc
The State of #NLProc
 
Learning to Rank Presentation (v2) at LexisNexis Search Guild
Learning to Rank Presentation (v2) at LexisNexis Search GuildLearning to Rank Presentation (v2) at LexisNexis Search Guild
Learning to Rank Presentation (v2) at LexisNexis Search Guild
 
Nikko Ström at AI Frontiers: Deep Learning in Alexa
Nikko Ström at AI Frontiers: Deep Learning in AlexaNikko Ström at AI Frontiers: Deep Learning in Alexa
Nikko Ström at AI Frontiers: Deep Learning in Alexa
 
Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)
 
Hacking Lucene and Solr for Fun and Profit
Hacking Lucene and Solr for Fun and ProfitHacking Lucene and Solr for Fun and Profit
Hacking Lucene and Solr for Fun and Profit
 
Webinar: Simpler Semantic Search with Solr
Webinar: Simpler Semantic Search with SolrWebinar: Simpler Semantic Search with Solr
Webinar: Simpler Semantic Search with Solr
 
An Introduction to Natural Language Processing
An Introduction to Natural Language ProcessingAn Introduction to Natural Language Processing
An Introduction to Natural Language Processing
 
Sentiment Analysis Using Solr
Sentiment Analysis Using SolrSentiment Analysis Using Solr
Sentiment Analysis Using Solr
 
Semantic & Multilingual Strategies in Lucene/Solr: Presented by Trey Grainger...
Semantic & Multilingual Strategies in Lucene/Solr: Presented by Trey Grainger...Semantic & Multilingual Strategies in Lucene/Solr: Presented by Trey Grainger...
Semantic & Multilingual Strategies in Lucene/Solr: Presented by Trey Grainger...
 
Shrinking the Haystack" using Solr and OpenNLP
Shrinking the Haystack" using Solr and OpenNLPShrinking the Haystack" using Solr and OpenNLP
Shrinking the Haystack" using Solr and OpenNLP
 
NLP from scratch
NLP from scratch NLP from scratch
NLP from scratch
 
Vectorization - Georgia Tech - CSE6242 - March 2015
Vectorization - Georgia Tech - CSE6242 - March 2015Vectorization - Georgia Tech - CSE6242 - March 2015
Vectorization - Georgia Tech - CSE6242 - March 2015
 
Building a Neural Machine Translation System From Scratch
Building a Neural Machine Translation System From ScratchBuilding a Neural Machine Translation System From Scratch
Building a Neural Machine Translation System From Scratch
 
Searching with vectors
Searching with vectorsSearching with vectors
Searching with vectors
 
Sequence Modelling with Deep Learning
Sequence Modelling with Deep LearningSequence Modelling with Deep Learning
Sequence Modelling with Deep Learning
 
CoLing 2016
CoLing 2016CoLing 2016
CoLing 2016
 
MongoDB & Machine Learning
MongoDB & Machine LearningMongoDB & Machine Learning
MongoDB & Machine Learning
 
Enriching Solr with Deep Learning for a Question Answering System - Sanket Sh...
Enriching Solr with Deep Learning for a Question Answering System - Sanket Sh...Enriching Solr with Deep Learning for a Question Answering System - Sanket Sh...
Enriching Solr with Deep Learning for a Question Answering System - Sanket Sh...
 

Similar to Building NLP solutions using Python

Introduction to Text Mining
Introduction to Text MiningIntroduction to Text Mining
Introduction to Text MiningMinha Hwang
 
How Oracle Uses CrowdFlower For Sentiment Analysis
How Oracle Uses CrowdFlower For Sentiment AnalysisHow Oracle Uses CrowdFlower For Sentiment Analysis
How Oracle Uses CrowdFlower For Sentiment AnalysisCrowdFlower
 
Natural language processing and search
Natural language processing and searchNatural language processing and search
Natural language processing and searchNathan McMinn
 
Designing and Implementing Search Solutions
Designing and Implementing Search SolutionsDesigning and Implementing Search Solutions
Designing and Implementing Search SolutionsFindwise
 
Webinar: Fusion 3.1 - What's New
Webinar: Fusion 3.1 - What's NewWebinar: Fusion 3.1 - What's New
Webinar: Fusion 3.1 - What's NewLucidworks
 
Data science and Hadoop
Data science and HadoopData science and Hadoop
Data science and HadoopDonald Miner
 
Drupal and Apache Stanbol
Drupal and Apache StanbolDrupal and Apache Stanbol
Drupal and Apache StanbolAlkuvoima
 
Data Acquisition for Sentiment Analysis
Data Acquisition for Sentiment AnalysisData Acquisition for Sentiment Analysis
Data Acquisition for Sentiment AnalysisAli BELCAID
 
Dataiku hadoop summit - semi-supervised learning with hadoop for understand...
Dataiku   hadoop summit - semi-supervised learning with hadoop for understand...Dataiku   hadoop summit - semi-supervised learning with hadoop for understand...
Dataiku hadoop summit - semi-supervised learning with hadoop for understand...Dataiku
 
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
 RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning... RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...S. Diana Hu
 
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...Joaquin Delgado PhD.
 
Big Data Analytics course: Named Entities and Deep Learning for NLP
Big Data Analytics course: Named Entities and Deep Learning for NLPBig Data Analytics course: Named Entities and Deep Learning for NLP
Big Data Analytics course: Named Entities and Deep Learning for NLPChristian Morbidoni
 
Globus Labs: Forging the Next Frontier
Globus Labs: Forging the Next FrontierGlobus Labs: Forging the Next Frontier
Globus Labs: Forging the Next FrontierGlobus
 
Databases, Web Services and Tools For Systems Immunology
Databases, Web Services and Tools For Systems ImmunologyDatabases, Web Services and Tools For Systems Immunology
Databases, Web Services and Tools For Systems ImmunologyYannick Pouliot
 

Similar to Building NLP solutions using Python (20)

Taming Text
Taming TextTaming Text
Taming Text
 
Introduction to Text Mining
Introduction to Text MiningIntroduction to Text Mining
Introduction to Text Mining
 
Machine Learning & Apache Mahout
Machine Learning & Apache MahoutMachine Learning & Apache Mahout
Machine Learning & Apache Mahout
 
How Oracle Uses CrowdFlower For Sentiment Analysis
How Oracle Uses CrowdFlower For Sentiment AnalysisHow Oracle Uses CrowdFlower For Sentiment Analysis
How Oracle Uses CrowdFlower For Sentiment Analysis
 
Natural language processing and search
Natural language processing and searchNatural language processing and search
Natural language processing and search
 
Final presentation
Final presentationFinal presentation
Final presentation
 
Deep learning for NLP
Deep learning for NLPDeep learning for NLP
Deep learning for NLP
 
Designing and Implementing Search Solutions
Designing and Implementing Search SolutionsDesigning and Implementing Search Solutions
Designing and Implementing Search Solutions
 
Webinar: Fusion 3.1 - What's New
Webinar: Fusion 3.1 - What's NewWebinar: Fusion 3.1 - What's New
Webinar: Fusion 3.1 - What's New
 
Data science and Hadoop
Data science and HadoopData science and Hadoop
Data science and Hadoop
 
Drupal and Apache Stanbol
Drupal and Apache StanbolDrupal and Apache Stanbol
Drupal and Apache Stanbol
 
Data Acquisition for Sentiment Analysis
Data Acquisition for Sentiment AnalysisData Acquisition for Sentiment Analysis
Data Acquisition for Sentiment Analysis
 
Dataiku hadoop summit - semi-supervised learning with hadoop for understand...
Dataiku   hadoop summit - semi-supervised learning with hadoop for understand...Dataiku   hadoop summit - semi-supervised learning with hadoop for understand...
Dataiku hadoop summit - semi-supervised learning with hadoop for understand...
 
Text Mining
Text MiningText Mining
Text Mining
 
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
 RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning... RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
 
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
 
Python with dataScience
Python with dataSciencePython with dataScience
Python with dataScience
 
Big Data Analytics course: Named Entities and Deep Learning for NLP
Big Data Analytics course: Named Entities and Deep Learning for NLPBig Data Analytics course: Named Entities and Deep Learning for NLP
Big Data Analytics course: Named Entities and Deep Learning for NLP
 
Globus Labs: Forging the Next Frontier
Globus Labs: Forging the Next FrontierGlobus Labs: Forging the Next Frontier
Globus Labs: Forging the Next Frontier
 
Databases, Web Services and Tools For Systems Immunology
Databases, Web Services and Tools For Systems ImmunologyDatabases, Web Services and Tools For Systems Immunology
Databases, Web Services and Tools For Systems Immunology
 

More from botsplash.com

Migrating to postgresql
Migrating to postgresqlMigrating to postgresql
Migrating to postgresqlbotsplash.com
 
Bootstrap SaaS startup using Open Source Tools
Bootstrap SaaS startup using Open Source ToolsBootstrap SaaS startup using Open Source Tools
Bootstrap SaaS startup using Open Source Toolsbotsplash.com
 
Devops Days, 2019 - Charlotte
Devops Days, 2019 - CharlotteDevops Days, 2019 - Charlotte
Devops Days, 2019 - Charlottebotsplash.com
 
Getting started with postgresql
Getting started with postgresqlGetting started with postgresql
Getting started with postgresqlbotsplash.com
 
Chat interfaces, Extension to Digital Marketing
Chat interfaces, Extension to Digital MarketingChat interfaces, Extension to Digital Marketing
Chat interfaces, Extension to Digital Marketingbotsplash.com
 
Cloud computing options
Cloud computing optionsCloud computing options
Cloud computing optionsbotsplash.com
 
Data Science meets Digital Marketing
Data Science meets Digital MarketingData Science meets Digital Marketing
Data Science meets Digital Marketingbotsplash.com
 
Building Twitter bot using Python
Building Twitter bot using PythonBuilding Twitter bot using Python
Building Twitter bot using Pythonbotsplash.com
 
Python for data science
Python for data sciencePython for data science
Python for data sciencebotsplash.com
 
Live development & tools
Live development & toolsLive development & tools
Live development & toolsbotsplash.com
 
AI Use Cases discussion
AI Use Cases discussionAI Use Cases discussion
AI Use Cases discussionbotsplash.com
 
Career advice for beginner software engineers
Career advice for beginner software engineersCareer advice for beginner software engineers
Career advice for beginner software engineersbotsplash.com
 
Node.js Getting Started &amd Best Practices
Node.js Getting Started &amd Best PracticesNode.js Getting Started &amd Best Practices
Node.js Getting Started &amd Best Practicesbotsplash.com
 

More from botsplash.com (14)

Migrating to postgresql
Migrating to postgresqlMigrating to postgresql
Migrating to postgresql
 
Bootstrap SaaS startup using Open Source Tools
Bootstrap SaaS startup using Open Source ToolsBootstrap SaaS startup using Open Source Tools
Bootstrap SaaS startup using Open Source Tools
 
Devops Days, 2019 - Charlotte
Devops Days, 2019 - CharlotteDevops Days, 2019 - Charlotte
Devops Days, 2019 - Charlotte
 
Getting started with postgresql
Getting started with postgresqlGetting started with postgresql
Getting started with postgresql
 
Chat interfaces, Extension to Digital Marketing
Chat interfaces, Extension to Digital MarketingChat interfaces, Extension to Digital Marketing
Chat interfaces, Extension to Digital Marketing
 
Cloud computing options
Cloud computing optionsCloud computing options
Cloud computing options
 
Data Science meets Digital Marketing
Data Science meets Digital MarketingData Science meets Digital Marketing
Data Science meets Digital Marketing
 
botsplash deep dive
botsplash deep divebotsplash deep dive
botsplash deep dive
 
Building Twitter bot using Python
Building Twitter bot using PythonBuilding Twitter bot using Python
Building Twitter bot using Python
 
Python for data science
Python for data sciencePython for data science
Python for data science
 
Live development & tools
Live development & toolsLive development & tools
Live development & tools
 
AI Use Cases discussion
AI Use Cases discussionAI Use Cases discussion
AI Use Cases discussion
 
Career advice for beginner software engineers
Career advice for beginner software engineersCareer advice for beginner software engineers
Career advice for beginner software engineers
 
Node.js Getting Started &amd Best Practices
Node.js Getting Started &amd Best PracticesNode.js Getting Started &amd Best Practices
Node.js Getting Started &amd Best Practices
 

Recently uploaded

Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 

Recently uploaded (20)

Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)

Building NLP solutions using Python

  • 1. Building NLP solutions using Python By Ramu Pulipati, @botsplash
  • 2. Introduction to NLP • Natural Language: • General purpose communications • Distinct difference between humans and animals • Much more difficult to interpret than Formal Language • Natural Language Processing (NLP) Advancements • Earlier focus was on Linguistics and Computer Science • Current evolution is focused on Machine Learning, specifically Deep Learning and Neural Networks • Varied degrees of implementation based on use case
  • 3. Scope of Natural Language Processing • Read • Natural Language Understanding (NLU) • Write • Natural Language Generation (NLG) • Speak • Speech Recognition / Synthesis
  • 5. More Applications … • Email Spam • Siri / Alexa / Cortana • Legal Contracts to find Action clauses • Health Care Records • Energy Sector / Utilities / Inspection Records • Automated Agents • Appointment Scheduling • Auto Email Responses • Typing Suggestions • Spelling Check • Predicting Crops • Social Media Propaganda • Press/Earnings releases • Weather Reports • Search Engines • News categorization • Chatbot • NY Times op-ed author analysis
  • 6. State of NLP Source: https://www.slideshare.net/healess/sk-t-academy-lecture-note
  • 7. Botsplash AI Strategy Machine Learning Natural Language Processing Predictive Analytics Routing Intelligence High Intent Conversion Detection Trends and Behavior End Chat, Spam Detection Content and Sentiment FAQ, Support, Transaction Chatbot Re-engagement Smart Scheduling UI Interactions
  • 8. Focus on solvable/acceptable problems I’m looking for 30yr mortgage loan in Charlotte, NC (Named Entity Recognition) Thanks for your help. Great chatting with you. (classification) Lets connect tomorrow. Anytime evening will work for me. (classification / intent / actionable) This rate is unacceptable. What can you do? (sentiment)
  • 9. Leading NLP Providers • AWS Comprehend • Google Cloud NLP • Microsoft Project Oxford • IBM Watson • Aylien • Cennest Comparison: https://cognitiveintegratorapp.azurewebsites.net/
  • 10. Text Processing Roundup • Normalization • Text Classification • Text Similarity • Text Extraction • Topic Modeling • Semantic Search • Sentiment Analysis
  • 11. NLP Pipeline • Classical NLP follows traditional ML strategies • Deep Learning requires a lot of data
  • 12. Getting started • Python installation: use Python 3+. • Data science packages installation: use “pip install” or Anaconda. • Always use “virtualenv” when setting up environments. • Start with Jupyter notebooks and convert them to production code. • Use cloud-hosted Jupyter notebooks with GPU access from FloydHub, Paperspace, Google, Amazon or Azure
  • 13. Python packages for NLP • NLP Focus Packages • NLTK • Spacy • Gensim • Textblob • Scikit Learn • Stanford NLP (java) • WordNet, SentiWordNet • FastText / MUSE / Faiss • Deep Learning Frameworks • Tensorflow / Keras • Pytorch • Other Noteworthy • Scrapy • Newspaper • nlp-architect
  • 14. NLTK Code Tour • Tokenization (Dictionary and Regex) • Stemming • Lemmatization • NLP Grammar - Chunking and Chinking • Entity Recognition • WikiQuiz
  • 15. Word Embeddings • Paper published by Mikolov et al. in 2013. Example: Man is to Woman as King is to _______ • Multi-dimensional space of word representations (word vectors) in which similar words lie close to each other • Algebraic expressions can be applied to word vectors • Building word embeddings: provide a lot of data with the features to look at • Word2vec is a popular word embedding implemented with a neural network • Other implementations such as GloVe use co-occurrence matrices
  • 17. Spacy.io Lightning Tour • Industrial Strength, Fast • POS Tagging and Dependency Parsing • Named Entities, Word embedding and Similarity • Custom Pipelines • Visualization
  • 18. Text classification • Use cases: Spam, Actionable events, Intents • For Content based or Request based classification • Steps: Preparing -> Training -> Prediction • Feature Extraction • Bag of Words • TF-IDF model • Word Vectors: Averaged, TF-IDF weighted, etc. • Starspace model • FastText • Classification algorithms: Multinomial Naive Bayes or SVM
  • 19. Steps to classify your data 1. Identify tags to be applied 2. Manually add tags to the data (possibly in the application) 3. Build a classification algorithm 4. Set up your application to auto-classify tags 5. Evaluate silently and then enable the actions
  • 20. Sentiment Analysis • Use case: Reviews, Chat transcripts, etc • Supervised techniques are effective for a domain • Packages: • SentiWordNet • StanfordNLP • Spacy Sentiment Analysis (incomplete)
  • 21. Summarization • Summarization is hard • Uses a variety of techniques including Text extraction, Feature Matrix, TF-IDF, Collocation, SVD and other methods • Implement LSA to under • Review of implementations: • Spacy • TextRank • Pyteaser • Textteaser • Sumy
  • 22. Code Review / Demo Apps • Jupyter Notebooks • NLTK Code Review • Spacy Code Review • NLTK Grammar Parsing • WikiQuiz • Sequence to Sequence Chatbot • DeepQA demo • Topic Modeling Code Review • Text Similarity – Phrase Matcher API
  • 23. Follow up Learning • Websites: • Allen AI - NLP • Fast AI • Malabuba • Coursera • Youtube • Resources • Sanni Oluwatoyin Yetunde Google Slides • Cambridge Data Science Group presentation • nlp.fast.ai
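The “Focus on solvable/acceptable problems” slide tags the mortgage request as Named Entity Recognition. A trained NER model (NLTK, Spacy or one of the cloud providers listed) is the usual tool; the stdlib sketch below only illustrates the idea with hand-written regular expressions, assuming the entities of interest are a loan term such as “30yr” and a “City, ST” location. The patterns and the `extract_loan_entities` name are hypothetical and would miss many phrasings.

```python
import re

# Hypothetical patterns for this one domain; a trained NER model generalizes far better.
# Loan term: "30yr", "30 year", "30-year", "15 years", ...
TERM_PATTERN = re.compile(r"\b(\d{1,2})\s*-?\s*(?:yr|year)s?\b", re.IGNORECASE)
# Location: one or more capitalized words, a comma, then a two-letter state code.
LOCATION_PATTERN = re.compile(r"\b((?:[A-Z][a-z]+\s)*[A-Z][a-z]+),\s*([A-Z]{2})\b")


def extract_loan_entities(text: str) -> dict:
    """Return the loan term (in years) and a (city, state) pair when found."""
    entities = {}
    term = TERM_PATTERN.search(text)
    if term:
        entities["term_years"] = int(term.group(1))
    location = LOCATION_PATTERN.search(text)
    if location:
        entities["location"] = (location.group(1), location.group(2))
    return entities


print(extract_loan_entities("I’m looking for 30yr mortgage loan in Charlotte, NC"))
# {'term_years': 30, 'location': ('Charlotte', 'NC')}
```

Rules like these are brittle, but they can serve as a baseline while tagged data for a statistical model is being collected.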
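Normalization heads the “Text Processing Roundup” slide. A minimal sketch, assuming normalization here means Unicode NFKC folding, case folding, punctuation removal and whitespace collapsing; the `normalize` helper is hypothetical, and which steps to apply depends on the task (punctuation such as “!” can carry sentiment, for example).

```python
import re
import string
import unicodedata

# Translation table that deletes ASCII punctuation (the characters in string.punctuation).
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def normalize(text: str) -> str:
    """Hypothetical normalizer: NFKC fold, casefold, drop punctuation, squeeze whitespace."""
    text = unicodedata.normalize("NFKC", text)  # e.g. full-width "Ａ" becomes "A"
    text = text.casefold()                      # stricter form of lower()
    text = text.replace("\u2019", "'")          # curly apostrophe to ASCII, removed next
    text = text.translate(_PUNCT_TABLE)         # delete punctuation characters
    return re.sub(r"\s+", " ", text).strip()    # collapse runs of whitespace


print(normalize("Thanks for your help.   Great chatting with you!"))
# thanks for your help great chatting with you
```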
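The NLTK Code Tour slide covers regex tokenization, stemming and lemmatization. The dependency-free sketch below contrasts the three; it is not NLTK's API. `crude_stem` is a toy suffix stripper (not the Porter algorithm), and `LEMMAS` is a hypothetical lookup table standing in for a dictionary such as WordNet.

```python
import re

# Words (with an optional internal apostrophe, e.g. "don't"), numbers, or single punctuation marks.
TOKEN_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?|\d+(?:\.\d+)?|[^\w\s]")

# Toy suffix list for a crude stemmer (NOT the Porter algorithm).
SUFFIXES = ("ing", "ies", "ed", "es", "s")

# Hypothetical lemma dictionary standing in for a lexical resource such as WordNet.
LEMMAS = {"was": "be", "were": "be", "better": "good", "studies": "study", "mice": "mouse"}


def tokenize(text):
    """Regex tokenization: split text into word, number and punctuation tokens."""
    return TOKEN_RE.findall(text)


def crude_stem(word):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


def lemmatize(word):
    """Dictionary lemmatization: look the word up, fall back to the word itself."""
    w = word.lower()
    return LEMMAS.get(w, w)


print(tokenize("Rates were better, don't wait!"))
# ['Rates', 'were', 'better', ',', "don't", 'wait', '!']
print([crude_stem(w) for w in ("studies", "waiting", "rates")])  # ['stud', 'wait', 'rat']
print([lemmatize(w) for w in ("studies", "were", "better")])     # ['study', 'be', 'good']
```

The toy stemmer turns “rates” into “rat”, a classic over-stemming error, while the dictionary lookup maps “were” to “be”, which no suffix rule can do.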
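The analogy on the Word Embeddings slide (Man is to Woman as King is to ___) is solved with vector arithmetic: compute vec(woman) - vec(man) + vec(king), then pick the nearest remaining word by cosine similarity. The 3-dimensional vectors below are hand-picked toy values (assumed dimensions: royalty, male, female), not trained embeddings; real word2vec or GloVe vectors are learned from large corpora and have hundreds of dimensions.

```python
import math

# Hand-picked toy vectors (dimensions roughly: royalty, male, female); NOT trained embeddings.
VECTORS = {
    "king":   [0.9, 0.8, 0.1],
    "queen":  [0.9, 0.1, 0.8],
    "man":    [0.1, 0.9, 0.1],
    "woman":  [0.1, 0.1, 0.9],
    "prince": [0.8, 0.9, 0.1],
    "apple":  [0.0, 0.3, 0.3],
}


def cosine(a, b):
    """Cosine similarity: dot product divided by the product of vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def analogy(a, b, c, vectors=VECTORS):
    """Solve 'a is to b as c is to ?' using vector(b) - vector(a) + vector(c)."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in {a, b, c})  # exclude the query words
    return max(candidates, key=lambda w: cosine(vectors[w], target))


print(analogy("man", "woman", "king"))  # queen
```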
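The Text classification slide pairs bag-of-words features with Multinomial Naive Bayes. Libraries such as scikit-learn provide both; the stdlib sketch below spells out the arithmetic instead, assuming add-one (Laplace) smoothing and that words unseen in training are ignored. The six training messages and the `end_chat`/`actionable` labels are hypothetical, modeled on the chat examples from the “Focus on solvable/acceptable problems” slide.

```python
import math
import re
from collections import Counter, defaultdict


def bag_of_words(text):
    """Bag-of-words features: lowercase word counts (word order is discarded)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


class MultinomialNB:
    """Multinomial Naive Bayes over word counts, with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)         # label -> number of documents
        self.word_counts = defaultdict(Counter)     # label -> word -> count
        for text, label in zip(texts, labels):
            self.word_counts[label].update(bag_of_words(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        features = bag_of_words(text)
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log P(label) + sum over words of count * log P(word | label)
            score = math.log(n_docs / total_docs)
            for word, count in features.items():
                if word in self.vocab:  # assumption: ignore words never seen in training
                    score += count * math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data for illustration only.
TEXTS = [
    "Thanks for your help. Great chatting with you.",
    "Thank you, that is all I needed. Bye!",
    "Great, thanks again, have a good day",
    "Let's connect tomorrow evening",
    "Can you call me back on Monday?",
    "Please schedule a call for tomorrow morning",
]
LABELS = ["end_chat", "end_chat", "end_chat", "actionable", "actionable", "actionable"]

model = MultinomialNB().fit(TEXTS, LABELS)
print(model.predict("Thanks, great help"))             # end_chat
print(model.predict("Can we talk tomorrow evening?"))  # actionable
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied.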
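Step 5 of the “Steps to classifying your data” slide (evaluate silently, then enable the actions) can be sketched as a wrapper that always logs predictions but only triggers actions once switched on. `ShadowClassifier`, its fields, and the keyword rule used as the classifier are hypothetical stand-ins; in practice the classifier would be the trained model and the human labels would come from the manual tagging in step 2.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class ShadowClassifier:
    """Hypothetical wrapper: log predictions silently until actions are enabled."""
    classify: Callable[[str], str]
    act: Callable[[str, str], None]       # action to trigger for (text, predicted label)
    enabled: bool = False                 # False means silent evaluation only
    log: List[Tuple[str, str, Optional[str]]] = field(default_factory=list)

    def handle(self, text: str, human_label: Optional[str] = None) -> str:
        predicted = self.classify(text)
        self.log.append((text, predicted, human_label))  # always record the prediction
        if self.enabled:
            self.act(text, predicted)
        return predicted

    def agreement(self) -> float:
        """Share of human-tagged messages where the prediction matched the tag."""
        tagged = [(p, h) for _, p, h in self.log if h is not None]
        if not tagged:
            return 0.0
        return sum(p == h for p, h in tagged) / len(tagged)


def end_chat_rule(text: str) -> str:
    """Hypothetical stand-in for a trained classifier."""
    return "end_chat" if "thank" in text.lower() else "other"


actions: List[str] = []
shadow = ShadowClassifier(classify=end_chat_rule, act=lambda text, label: actions.append(label))

# Phase 1: silent evaluation against human tags; no actions fire.
shadow.handle("Thanks for your help.", human_label="end_chat")
shadow.handle("This rate is unacceptable.", human_label="other")
shadow.handle("Thank you, but what about fees?", human_label="other")
print(round(shadow.agreement(), 2), actions)  # 0.67 []

# Phase 2: once agreement is acceptable, switch the actions on.
shadow.enabled = True
shadow.handle("Great chatting with you, thanks!")
print(actions)  # ['end_chat']
```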
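The Sentiment Analysis slide lists lexicon resources such as SentiWordNet. A lexicon approach sums per-word polarity scores; the sketch below assumes a tiny made-up lexicon, a fixed threshold of 0.25 and a one-word negation window, all hypothetical. As the slide notes, a supervised model trained on in-domain data is usually more effective.

```python
import re

# Hypothetical mini-lexicon; resources like SentiWordNet provide such scores at scale.
LEXICON = {"great": 1.0, "good": 0.5, "thanks": 0.5, "help": 0.25,
           "unacceptable": -1.0, "bad": -0.75, "slow": -0.5, "expensive": -0.5}
NEGATIONS = {"not", "no", "never", "isn't", "don't"}


def sentiment_score(text):
    """Sum lexicon scores, flipping the sign of the word right after a negation."""
    score, negate = 0.0, False
    for token in re.findall(r"[a-z']+", text.lower()):
        if token in NEGATIONS:
            negate = True
            continue
        value = LEXICON.get(token, 0.0)
        score += -value if negate else value
        negate = False  # a negation only affects the next word
    return score


def to_label(score, threshold=0.25):
    """Map a numeric score to positive / negative / neutral."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


examples = [
    "This rate is unacceptable. What can you do?",
    "Thanks for your help. Great chatting with you.",
    "The service was not bad",
    "Let's connect tomorrow.",
]
print([to_label(sentiment_score(t)) for t in examples])
# ['negative', 'positive', 'positive', 'neutral']
```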
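The Summarization slide lists TF-IDF among the techniques. One simple extractive approach, shown here as an illustration, is centroid-based selection: build a TF-IDF vector per sentence (each sentence treated as a document), sum them into a centroid, and keep the sentences closest to it by cosine similarity. This is not how TextRank, LSA or the listed packages work; the naive sentence splitter, stopword list and sample text are assumptions.

```python
import math
import re
from collections import Counter

# Assumed minimal stopword list; real pipelines use a fuller one.
STOPWORDS = frozenset({"a", "an", "the", "is", "are", "and", "of", "to", "in", "for", "it", "on", "with"})


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tfidf_vectors(sentences):
    """Treat each sentence as a document and build sparse TF-IDF vectors (dicts)."""
    docs = [[w for w in re.findall(r"[a-z']+", s.lower()) if w not in STOPWORDS] for s in sentences]
    n = len(docs)
    df = Counter(w for doc in docs for w in set(doc))  # sentences containing each word
    idf = {w: math.log(n / df[w]) + 1.0 for w in df}   # +1 keeps words found everywhere
    return [{w: c * idf[w] for w, c in Counter(doc).items()} for doc in docs]


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(w, 0.0) for w, weight in u.items())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def summarize(text, k=2):
    """Keep the k sentences most similar to the centroid, in their original order."""
    sentences = split_sentences(text)
    vectors = tfidf_vectors(sentences)
    centroid = Counter()
    for vec in vectors:
        centroid.update(vec)  # summed weights; scaling does not change cosine ranking
    ranked = sorted(range(len(sentences)), key=lambda i: cosine(vectors[i], centroid), reverse=True)
    return " ".join(sentences[i] for i in sorted(ranked[:k]))


TEXT = ("Mortgage rates rose again this week. Rates on a 30 year mortgage averaged higher "
        "than last month. My cat enjoys sleeping. Lenders expect mortgage rates to stay high.")
print(summarize(TEXT, k=3))  # drops the off-topic sentence about the cat
```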

Editor's Notes

  1. Natural language is ambiguous, whereas formal language is precise. Example of a formal language: a programming language.
  2. The botsplash framework builds on strong concepts and strategy to augment business processes and achieve the best outcome for businesses and their customers. botsplash is a Software-as-a-Service platform following a B2B2C model. We want the “B” (business) to provide the “C” (consumers of the business) with the best, easy-to-use and reliable technology to reduce costs and increase business transactions, efficiency and customer satisfaction.
  3. ML Strategies: * Explore data and use visualizations * Create train and test data * Set up the training algorithm and features * Train the model * Test the results * Rinse and repeat until the results are satisfactory
  4. Multinomial Naïve Bayes models word counts and can predict more than 2 classes. It is a popular Bayes algorithm that assumes each feature is independent of the others given the class. Support vector machines are supervised algorithms used for classification, regression, and anomaly and outlier detection. For classification algorithms, we focus on the following metrics: accuracy, precision, recall and F1 score.
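The ML Strategies note (create train and test data, train, test, rinse and repeat) can be sketched without dependencies. The data, the candidate models and the `select_best` helper are hypothetical; the keyword rule stands in for a trained classifier, and the majority-class baseline is a common sanity check that any real model should beat.

```python
import random
from collections import Counter

# Hypothetical labeled chat messages for illustration only.
DATA = [("thanks, bye", "end_chat"), ("thank you so much", "end_chat"),
        ("great chatting, thanks", "end_chat"), ("thanks again", "end_chat"),
        ("call me tomorrow", "other"), ("what is the rate?", "other"),
        ("can you email me", "other"), ("rate too high", "other")]


def train_test_split(items, test_ratio=0.25, seed=42):
    """Shuffle a copy of the data and hold out a fraction for testing."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed gives a reproducible split
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]


def majority_baseline(train):
    """Builder for a model that always predicts the most common training label."""
    label = Counter(lbl for _, lbl in train).most_common(1)[0][0]
    return lambda text: label


def keyword_builder(keyword, hit, miss):
    """Builder for a one-keyword rule (hypothetical stand-in; ignores the training data)."""
    def build(train):
        return lambda text: hit if keyword in text.lower() else miss
    return build


def accuracy(model, test):
    return sum(model(text) == label for text, label in test) / len(test)


def select_best(data, builders, seed=42):
    """Train each candidate on the training split and score it on the held-out split."""
    train, test = train_test_split(data, seed=seed)
    scores = {name: accuracy(build(train), test) for name, build in builders.items()}
    return max(scores, key=scores.get), scores  # ties go to the first candidate listed


BUILDERS = {"keyword 'thank'": keyword_builder("thank", "end_chat", "other"),
            "majority baseline": majority_baseline}
best, scores = select_best(DATA, BUILDERS)
print(best, scores)  # the keyword rule scores 1.0 on this toy data
```

In a real loop, each round would inspect the errors, adjust features or data, and rerun the comparison.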
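The last note names accuracy, precision, recall and F1 score. For a chosen positive class, precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 is their harmonic mean. A minimal stdlib sketch; the spam/ham labels are hypothetical. For multi-class problems, these per-class scores are typically averaged (for example, a macro average across classes).

```python
def classification_metrics(y_true, y_pred, positive):
    """Accuracy plus precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # correctly flagged
    fp = sum(t != positive and p == positive for t, p in pairs)  # false alarms
    fn = sum(t == positive and p != positive for t, p in pairs)  # missed positives
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels for illustration only.
Y_TRUE = ["spam", "spam", "ham", "ham", "spam", "ham"]
Y_PRED = ["spam", "ham", "ham", "spam", "spam", "spam"]

metrics = classification_metrics(Y_TRUE, Y_PRED, positive="spam")
print({k: round(v, 3) for k, v in metrics.items()})
# {'accuracy': 0.5, 'precision': 0.5, 'recall': 0.667, 'f1': 0.571}
```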