KYIV 2019
Natural Language Processing with .NET
.NET CONFERENCE #1 IN UKRAINE
.NET LEVEL UP
About me
Sergiy Korzh
25+ years in software development
20 years running my own business
.NET developer since 2004
iForum.ua (technology section)
Projects:
EasyQuery (https://korzh.com/easyquery)
Easy.Report (http://easy.report)
Aistant (https://aistant.com/)
Twitter: @korzhs
LinkedIn: https://www.linkedin.com/in/korzh/
Agenda
1 Introduction to NLP (main tasks and basic concepts)
2 NLP tools for .NET (and beyond)
3 Demos
4 Useful materials and conclusions
Why NLP on .NET?
Because we love .NET, right?
Quick and easy (for simple NLP tasks)
No “glue” code
Remarks
“Light” NLP tasks only!
No Deep Learning
Beginner level topics
NLP Tasks
1 Linguistic
2 Analysis
3 Transformation
4 Generation
1 Linguistic
• Segmentation
• Part of speech tagging
• Named-entity recognition
• Relation extraction
• Syntactic parsing
• Coreference resolution
• Semantic parsing
NLP Tasks’ Examples
2 Analysis
• Spam filtering
• Sentiment analysis
• Text similarity
• Information extraction
3 Transformation
• Machine translation
• Speech to Text / Text to speech
• Grammar correction
• Text summarization
4 Generation
• Question Answering
• Chat bots
• Story generation
NLP Pipeline
TEXT → Text featurizing (numeric representation) → ML algorithm → RESULT
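The pipeline above can be sketched in a few lines. This is only an illustration in Python (not the talk's .NET code): the featurizer is a plain word count, the "ML algorithm" is a stand-in nearest-neighbour lookup, and the training data and all function names are hypothetical.

```python
import math
from collections import Counter

def featurize(text, vocabulary):
    """TEXT -> numeric representation: word counts over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]

def cosine(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def classify(text, labelled_examples, vocabulary):
    """Stand-in 'ML algorithm': label of the most similar training example."""
    vector = featurize(text, vocabulary)
    best = max(
        labelled_examples,
        key=lambda example: cosine(vector, featurize(example[0], vocabulary)),
    )
    return best[1]

# Hypothetical toy training data, for illustration only.
TRAINING = [
    ("win money now", "spam"),
    ("free prize win", "spam"),
    ("meeting at noon", "ham"),
    ("lunch meeting tomorrow", "ham"),
]
VOCABULARY = sorted({word for text, _ in TRAINING for word in text.split()})

result = classify("win a free prize", TRAINING, VOCABULARY)  # RESULT
```

A real pipeline would swap the lookup for a trained model, but the three stages stay the same.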
NLP Pipeline: Classic
(diagram from the AYLIEN blog)
NLP Pipeline: Deep Learning
(diagram from the AYLIEN blog)
NLP concepts: Bag of words
The way to represent your text for ML algorithms
Encoding approaches:
• Word frequency
• One-hot encoding
• TF-IDF
• Other metrics
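As a rough illustration of the first two encodings, here is a minimal Python sketch (not the talk's code). It assumes whitespace tokenization and treats "one-hot" at document level as a binary presence vector; the helper names are hypothetical.

```python
from collections import Counter

def tokenize(text):
    """Naive tokenizer: lower-case and split on whitespace."""
    return text.lower().split()

def build_vocabulary(documents):
    """Sorted list of every distinct token in the collection."""
    return sorted({token for doc in documents for token in tokenize(doc)})

def word_frequency_vector(text, vocabulary):
    """How many times each vocabulary word occurs in the text."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]

def one_hot_vector(text, vocabulary):
    """1 if the vocabulary word occurs in the text at all, else 0."""
    present = set(tokenize(text))
    return [1 if word in present else 0 for word in vocabulary]

docs = ["the cat sat on the mat", "the dog sat"]
vocab = build_vocabulary(docs)                 # ['cat', 'dog', 'mat', 'on', 'sat', 'the']
freq = word_frequency_vector(docs[0], vocab)   # [1, 0, 1, 1, 1, 2]
binary = one_hot_vector(docs[0], vocab)        # [1, 0, 1, 1, 1, 1]
```

Both vectors ignore word order, which is exactly what "bag" means here.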
NLP concepts: TF-IDF
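For a word-document pair, TF-IDF shows the importance of the word in the document: the score grows with the word's frequency in that document and shrinks as the word appears in more documents of the collection.
Used in all kinds of information retrieval tasks:
• Search
• Text mining
• Stop-word filtering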
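A minimal sketch of the computation, in Python for illustration. It assumes one common variant of the formula (term frequency normalized by document length, times the logarithm of N divided by document frequency); other weightings exist, and the function name is hypothetical.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {word: score} dict per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each word appears.
    document_frequency = Counter(
        word for tokens in tokenized for word in set(tokens)
    )
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            word: (count / len(tokens)) * math.log(n_docs / document_frequency[word])
            for word, count in counts.items()
        })
    return scores

docs = ["the cat sat", "the dog sat", "the cat ran home"]
scores = tf_idf(docs)
```

In this toy collection "the" occurs in every document, so its score is zero everywhere, which is why TF-IDF can double as a stop-word filter.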
NLP concepts: N-grams
An n-gram is a contiguous sequence of n items from a given sample of text.
Word N-grams
“I live in Kyiv” word bi-grams:
1. # I
2. I live
3. live in
4. in Kyiv
5. Kyiv #
Character N-grams
“I live in Kyiv” character bi-grams (“#” marks the text boundary, “_” a space):
1. #_
2. _I
3. I_
4. _l
5. li
6. iv
7. ve
8. . . .
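Both kinds of n-grams fall out of the same sliding window. The Python sketch below is illustrative only: it pads with a single "#" on each side and writes spaces as "_". Padding conventions vary, and the slide's character listing additionally places a space marker right after the boundary symbol, so its first items differ slightly from this sketch's output.

```python
def word_ngrams(text, n=2, boundary="#"):
    """Word n-grams with one boundary marker on each side of the text."""
    tokens = [boundary] + text.split() + [boundary]
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def char_ngrams(text, n=2, boundary="#", space="_"):
    """Character n-grams; spaces are shown as '_' for readability."""
    chars = boundary + text.replace(" ", space) + boundary
    return [chars[i:i + n] for i in range(len(chars) - n + 1)]

word_bigrams = word_ngrams("I live in Kyiv")
char_bigrams = char_ngrams("I live in Kyiv")
```

Here `word_bigrams` reproduces the five word bi-grams listed above.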
NLP concepts: Word Embeddings
A set of techniques that map words (or phrases) to numeric vectors.
Words with similar meanings have “close” vectors.
Word    Vector
man     [0.23, 0.56, …]
king    [0.34, 0.16, …]
woman   [0.41, 0.73, …]
queen   [0.09, 0.62, …]
[king] – [man] + [woman] ≈ [queen]
Popular embedding algorithms:
• Word2Vec
• fastText
• GloVe
• . . .
NLP concepts: Language Model
A language model computes the probability of a word in a sequence.
Example: “Please, give me a …” [ pen: 0.002, example: 0.0001, hand: 0.08, … ]
Where is it used? (Spoiler: almost everywhere!)
• Machine translation
• Error correction
• Speech recognition
• Text generation
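The simplest language model of this kind is a bigram model, which estimates the probability of a word from the single word before it. The Python sketch below is illustrative only: it uses plain counts without smoothing, and the four-sentence corpus is hypothetical.

```python
from collections import Counter, defaultdict

def train_bigram_model(sentences):
    """For each word, count which words follow it in the training sentences."""
    following = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for previous, current in zip(tokens, tokens[1:]):
            following[previous][current] += 1
    return following

def next_word_probabilities(model, previous):
    """Relative frequency of every word seen after `previous`."""
    counts = model.get(previous.lower(), Counter())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()} if total else {}

# Hypothetical toy corpus.
corpus = [
    "please give me a hand",
    "please give me a pen",
    "give me a hand with this",
    "this is a hand",
]
model = train_bigram_model(corpus)
probabilities = next_word_probabilities(model, "a")
```

With this corpus, "hand" follows "a" three times out of four, so the model prefers it over "pen".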
NLP Tools
1 Online services: Azure Cognitive Services, IBM Watson, Amazon AI Services
2 Python libraries: NLTK, spaCy, scikit-learn, gensim, Pattern
3 .NET libraries: ML.NET, Microsoft.Speech, Microsoft.Recognizers, Catalyst
.NET libs: ML.NET
https://dotnet.microsoft.com/apps/machinelearning-ai/ml-dotnet
Pros:
• Native for .NET (Core)
• Backed by Microsoft
• Super performant (at least Microsoft says so)
• Extended with TensorFlow & more
NLP features:
• Text normalization
• Tokenizing
• N-grams
• Word embeddings
• Stop-word removal
Cons:
• Poor NLP features
• English-only (mostly)
• Inconvenient to use outside an ML pipeline
.NET libs: Catalyst
NLP features:
• Text normalization
• Tokenizing
• POS-tagging
• Word embeddings
• Stop words removal
https://github.com/curiosity-ai/catalyst
Pros:
• Native for .NET (Core)
• Inspired by spaCy library
• Fast tokenizer
• Has pretrained models
• Lets you train your own models (based on the Universal Dependencies project)
Cons:
• Early beta (or even alpha). Version 0.0.2795
• English-only (mostly)
.NET libs: Microsoft.Recognizers
• Rule-based
• Recognizes numbers, units, date/time, etc.
• Supports about 10 different languages
• Not only .NET (also JavaScript, Python, Java)
• No support for Russian or Ukrainian
https://github.com/Microsoft/Recognizers-Text/
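To show what "rule-based" means in practice, here is a tiny Python sketch of a number-with-unit recognizer built on one regular expression. It is not the Recognizers-Text API and makes no claim about how that library is implemented; the pattern, the unit list, and the function name are all hypothetical.

```python
import re

# Hypothetical rule: a number followed by a unit name or abbreviation.
UNIT_PATTERN = re.compile(
    r"(?P<value>\d+(?:\.\d+)?)\s*"
    r"(?P<unit>km|kg|cm|m|kilometers?|kilograms?|meters?)\b",
    re.IGNORECASE,
)

def recognize_units(text):
    """Return every 'number + unit' mention found in the text."""
    return [
        {
            "text": match.group(0),
            "value": float(match.group("value")),
            "unit": match.group("unit").lower(),
        }
        for match in UNIT_PATTERN.finditer(text)
    ]

found = recognize_units("The route is 12.5 km long and the bag weighs 3 kilograms.")
```

Rules like this need no training data, but every new language needs its own rule set, which is consistent with the per-language support noted above.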
DEMO 1
Text summarization (extraction-based) using home-brewed NLP
TEXT → Detect language → Break into sentences → Tokenize and get stems → Bag of words → Similarity matrix → PageRank algorithm → Summary (top-rated sentences)
Bag of words:
        sentence1  sentence2  sentence3
stem1   1          3          5
stem2   0          2          4
stem3   3          4          0
stem4   2          0          2
Similarity matrix:
      S1    S2    S3
S1    0     1.21  0.2
S2    1.21  0     3.56
S3    0.2   3.56  0
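The steps of this demo can be sketched end to end in Python. This is an illustration under simplifying assumptions, not the talk's implementation (that lives in the Korzh.NLP repository): language detection and stemming are skipped, sentences are split on end punctuation, similarity is cosine over raw word counts, and PageRank is a plain power iteration. All function names are hypothetical.

```python
import math
import re
from collections import Counter

def split_sentences(text):
    """Break text into sentences after '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def bag_of_words(sentence):
    """Word counts for one sentence (no stemming in this sketch)."""
    return Counter(re.findall(r"\w+", sentence.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count bags."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def page_rank(matrix, damping=0.85, iterations=50):
    """Power iteration over a weighted similarity graph."""
    n = len(matrix)
    ranks = [1.0 / n] * n
    row_sums = [sum(row) for row in matrix]
    for _ in range(iterations):
        ranks = [
            (1 - damping) / n + damping * sum(
                ranks[j] * matrix[j][i] / row_sums[j]
                for j in range(n) if row_sums[j] > 0
            )
            for i in range(n)
        ]
    return ranks

def summarize(text, max_sentences=2):
    """Return the top-ranked sentences in their original order."""
    sentences = split_sentences(text)
    if not sentences:
        return []
    bags = [bag_of_words(s) for s in sentences]
    similarity = [
        [0.0 if i == j else cosine(bags[i], bags[j]) for j in range(len(bags))]
        for i in range(len(bags))
    ]
    ranks = page_rank(similarity)
    top = sorted(range(len(sentences)), key=lambda i: ranks[i], reverse=True)[:max_sentences]
    return [sentences[i] for i in sorted(top)]

summary = summarize("Cats like milk. Cats like fish and milk. The weather is rainy today.")
```

Sentences that share vocabulary with many others collect rank from them, so an off-topic sentence (the weather one here) ends up at the bottom.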
DEMO 2
Text summarization using ML.NET
DEMO 3
Document tagging
(with TF-IDF and Catalyst POS tagging)
Useful resources
Universal Dependencies
https://universaldependencies.org/
Lang-uk
http://lang.org.ua/uk/
All source code of this talk
https://github.com/korzh/Korzh.NLP
Math.NET – numerical computation algorithms for .NET
https://www.mathdotnet.com/
List of .NET libraries with some NLP features
http://tiny.cc/dotnet-nlp-libs
Conclusions
Catalyst library: looks promising, but still has a way to go. Contribute!
We can do NLP on .NET (for the basic tasks at least)
ML.NET library: good and reliable, but limited NLP features
Thank you!
Sergiy Korzh
Twitter: @korzhs
LinkedIn: https://www.linkedin.com/in/korzh/
Facebook: https://www.facebook.com/sergiy.korzh
Email: sergiy@korzh.com

.NET Fest 2019. Sergiy Korzh. Natural Language Processing in .NET


Editor's Notes

  1. What kind of normalization? How to get tokens? What n-gramming is supported (word, character?) What kind of word embeddings? Only English? How to add my own stop-word removal?