
.NET Fest 2019. Sergiy Korzh. Natural Language Processing in .NET


Natural language processing tasks now come up in practically every project. Unfortunately, until recently the .NET platform was not well suited to solving them. With the release of ML.NET the situation has started to improve, but it is still far from ideal.
In this talk I will cover the main tasks that are solved with Natural Language Processing methods and the ways to solve them on the .NET platform (services, libraries, frameworks).

Published in: Education

  1. Natural Language Processing with .NET. .NET CONFERENCE #1 IN UKRAINE, KYIV 2019
  2. .NET LEVEL UP. About me: Sergiy Korzh. 25+ years in software development; 20 years running my own business; .NET developer since 2004; iForum.ua (technology section). Projects: EasyQuery (https://korzh.com/easyquery), Easy.Report (http://easy.report), Aistant (https://aistant.com/). Twitter: @korzhs. LinkedIn: https://www.linkedin.com/in/korzh/
  3. Agenda: 1. Introduction to NLP (main tasks and basic concepts); 2. NLP tools for .NET (and beyond); 3. Demos; 4. Useful materials and conclusions
  4. Why NLP on .NET?
  5. Why NLP on .NET? Because we love .NET, right? Quick and easy (for simple NLP tasks). No “glue” code.
  6. Remarks: “Light” NLP tasks only! No Deep Learning. Beginner-level topics.
  7. NLP Tasks: 1. Linguistic; 2. Analysis; 3. Transformation; 4. Generation
  8. NLP Tasks: 1. Linguistic • Segmentation • Part-of-speech tagging • Named-entity recognition • Relation extraction • Syntactic parsing • Coreference resolution • Semantic parsing
  9. NLP Task Examples: 2. Analysis • Spam filter • Sentiment analysis • Text similarity • Information extraction
  10. NLP Task Examples: 3. Transformation • Machine translation • Speech to text / Text to speech • Grammar correction • Text summarization
  11. NLP Task Examples: 4. Generation • Question answering • Chat bots • Story generation
  12. NLP Pipeline: TEXT → Text featurizing (numeric representation) → ML algorithm → RESULT
  13. NLP Pipeline: Classic (diagram from the AYLIEN blog)
  14. NLP Pipeline: Deep Learning (diagram from the AYLIEN blog)
  15. NLP concepts: Bag of words. The way to represent your text for ML algorithms. Encoding approaches: • Word frequency • One-hot encoding • TF-IDF • Other metrics
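A minimal sketch of the word-frequency and one-hot encodings named on this slide. It is written in Python with the standard library only, for brevity; the logic ports directly to C#. The whitespace tokenizer, the function names, and the sample sentences are assumptions made for illustration, not code from the talk.

```python
from collections import Counter

def tokenize(text):
    # Naive tokenizer (an assumption): lowercase and split on whitespace.
    return text.lower().split()

def bag_of_words(docs, binary=False):
    # Build a sorted vocabulary, then one vector per document.
    # binary=False gives word frequencies; binary=True gives a
    # presence/absence (one-hot style) encoding.
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        row = [counts[w] for w in vocab]
        vectors.append([1 if c else 0 for c in row] if binary else row)
    return vocab, vectors

docs = ["I live in Kyiv", "I love Kyiv and I live here"]
vocab, counts = bag_of_words(docs)
_, presence = bag_of_words(docs, binary=True)
print(vocab)
print(counts)
print(presence)
```

Word order is discarded: each document becomes a fixed-length vector indexed by the vocabulary, which is what makes it usable as input to an ML algorithm.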
  16. NLP concepts: TF-IDF. For a word-document pair, TF-IDF shows the importance of the word in the document. Used in all kinds of information retrieval tasks: • Search • Text mining • Stop-word filtering
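The slide does not give a formula, so the sketch below assumes one common variant: term frequency as the word's share of the document, multiplied by log(N / df), where N is the number of documents and df is the number of documents containing the word. Libraries often use smoothed variants. The three sample sentences are made up for illustration.

```python
import math
from collections import Counter

def tf_idf(docs):
    # Returns one {word: score} dictionary per document.
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each word occurs.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))
    scores = []
    for toks in tokenized:
        counts = Counter(toks)
        scores.append({
            w: (c / len(toks)) * math.log(n_docs / df[w])
            for w, c in counts.items()
        })
    return scores

docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
]
scores = tf_idf(docs)
for word, score in sorted(scores[0].items(), key=lambda kv: -kv[1]):
    print(f"{word}: {score:.3f}")
```

A word that occurs in every document ("the" here) gets a score of zero, which is why TF-IDF can double as a stop-word filter.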
  17. NLP concepts: N-grams. An n-gram is a contiguous sequence of n items from a given sample of text (below, # marks the text boundary and _ a space). Word n-grams: “I live in Kyiv” word bi-grams: 1. # I 2. I live 3. live in 4. in Kyiv 5. Kyiv #. Character n-grams: “I live in Kyiv” character bi-grams: 1. #_ 2. _I 3. I_ 4. _l 5. li 6. iv 7. ve 8. …
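The bi-grams on this slide can be reproduced with a few lines of code. The sketch below is in Python (standard library only); the function names and the way padding is applied for n greater than 2 are assumptions.

```python
def word_ngrams(text, n=2, pad="#"):
    # Pad both ends with boundary markers, then slide a window of n tokens.
    tokens = [pad] * (n - 1) + text.split() + [pad] * (n - 1)
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def char_ngrams(text, n=2, pad="#"):
    # Add boundary markers, show spaces as "_", then slide a window of n characters.
    padded = f"{pad} {text} {pad}".replace(" ", "_")
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

print(word_ngrams("I live in Kyiv"))
print(char_ngrams("I live in Kyiv")[:7])
```

Character n-grams are useful when words are misspelled or inflected, since related word forms still share most of their character sequences.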
  18. NLP concepts: Word Embeddings. A set of techniques for mapping words (or phrases) to numeric vectors. Words with similar meanings have “close” vectors. Word → vector: man [0.23, 0.56, …]; king [0.34, 0.16, …]; woman [0.41, 0.73, …]; queen [0.09, 0.62, …]. [king] – [man] + [woman] ≈ [queen]. Popular embedding algorithms: • Word2Vec • fastText • GloVe • …
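To make the vector arithmetic concrete, here is a minimal sketch with hand-made three-dimensional vectors. The values are hypothetical and chosen so that the analogy holds; they are not the slide's (truncated) numbers, and real embeddings have hundreds of dimensions learned from text. Closeness is measured with cosine similarity, a common choice that the slide does not specify.

```python
import math

def cosine(a, b):
    # Cosine similarity: 1.0 means the vectors point in the same direction.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def analogy(vectors, a, b, c):
    # Return the word closest to vec(a) - vec(b) + vec(c), excluding the inputs.
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], target))

# Hypothetical toy vectors; the axes are roughly (male, female, royal).
toy = {
    "man":    [1.0, 0.0, 0.2],
    "woman":  [0.0, 1.0, 0.2],
    "king":   [1.0, 0.0, 0.9],
    "queen":  [0.0, 1.0, 0.9],
    "prince": [0.9, 0.1, 0.8],
}

print(analogy(toy, "king", "man", "woman"))
```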
  19. NLP concepts: Language Model. A language model computes the probability of a word in a sequence. Example: “Please, give me a …” → [pen: 0.002, example: 0.0001, hand: 0.08, …]. Where is it used? (Spoiler: almost everywhere!) • Machine translation • Error correction • Speech recognition • Text generation
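One of the simplest language models is a bigram model, which estimates the probability of the next word from counts of adjacent word pairs. The sketch below assumes that model (the slide does not say which kind of model produced its numbers) and trains it on a made-up four-sentence corpus.

```python
from collections import Counter, defaultdict

def train_bigram_model(sentences):
    # counts[prev][cur] = how often `cur` directly follows `prev`.
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split()  # <s> marks the sentence start
        for prev, cur in zip(tokens, tokens[1:]):
            counts[prev][cur] += 1
    return counts

def next_word_probs(counts, prev):
    # Maximum-likelihood estimate of P(next word | previous word).
    following = counts.get(prev, Counter())
    total = sum(following.values())
    return {w: c / total for w, c in following.items()} if total else {}

corpus = [
    "give me a hand",
    "give me a pen",
    "give me a hand please",
    "show me an example",
]
model = train_bigram_model(corpus)
probs = next_word_probs(model, "a")
print(probs)
```

Unseen word pairs get probability zero in this sketch; practical models add smoothing or use longer contexts.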
  20. NLP Tools: 1. Online services: Azure Cognitive Services, IBM Watson, Amazon AI Services; 2. Python libraries: NLTK, spaCy, scikit-learn, gensim, Pattern; 3. .NET libraries: ML.NET, Microsoft.Speech, Microsoft.Recognizers, Catalyst
  21. .NET libs: ML.NET (https://dotnet.microsoft.com/apps/machinelearning-ai/ml-dotnet). Pros: • Native for .NET (Core) • Backed by Microsoft • Super performant (at least MS says so) • Extended with TensorFlow & more. NLP features: • Text normalization • Tokenizing • N-grams • Word embeddings • Stop-word removal. Cons: • Poor NLP features • English-only (mostly) • Inconvenient to use separately from an ML pipeline
  22. .NET libs: Catalyst (https://github.com/curiosity-ai/catalyst). NLP features: • Text normalization • Tokenizing • POS tagging • Word embeddings • Stop-word removal. Pros: • Native for .NET (Core) • Inspired by the spaCy library • Fast tokenizer • Has pretrained models • Lets you train your own models (based on the Universal Dependencies project). Cons: • Early beta (or even alpha): version 0.0.2795 • English-only (mostly)
  23. .NET libs: Microsoft.Recognizers (https://github.com/Microsoft/Recognizers-Text/) • Rule-based • Recognizes numbers, units, date/time, etc. • Supports about 10 different languages • Not only .NET (JavaScript, Python, Java) • No support for Russian or Ukrainian
  24. DEMO 1: Text summarization (extraction-based) using home-brewed NLP. Pipeline: TEXT → Detect language → Break into sentences → Tokenize and get stems → Bag of words → Similarity matrix → PageRank algorithm → Summary (top-rated sentences).
      Bag of words:
               sentence1  sentence2  sentence3
        stem1      1          3          5
        stem2      0          2          4
        stem3      3          4          0
        stem4      2          0          2
      Similarity matrix:
              S1     S2     S3
        S1    0      1.21   0.2
        S2    1.21   0      3.56
        S3    0.2    3.56   0
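The pipeline on this slide can be sketched end to end in a few dozen lines. This is an illustration under stated assumptions, not the demo's actual code (the talk's source is linked under Useful resources). Language detection and stemming are skipped, sentences are split with a naive regular expression, and cosine similarity stands in for the similarity measure. The slide's matrix contains values above 1, so the talk evidently used a different measure. All function names are hypothetical.

```python
import math
import re
from collections import Counter

def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def bag(sentence):
    # Lowercased word counts; a fuller pipeline would stem the tokens first.
    return Counter(re.findall(r"\w+", sentence.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def page_rank(matrix, damping=0.85, iterations=50):
    # Weighted PageRank by power iteration over the similarity graph.
    # Sentences with no similar neighbours pass on no rank in this sketch.
    n = len(matrix)
    ranks = [1.0 / n] * n
    row_sums = [sum(row) for row in matrix]
    for _ in range(iterations):
        ranks = [
            (1 - damping) / n + damping * sum(
                ranks[j] * matrix[j][i] / row_sums[j]
                for j in range(n) if row_sums[j]
            )
            for i in range(n)
        ]
    return ranks

def summarize(text, top_n=2):
    sentences = split_sentences(text)
    if not sentences:
        return []
    bags = [bag(s) for s in sentences]
    n = len(sentences)
    # Similarity matrix with a zero diagonal, as on the slide.
    sim = [[0.0 if i == j else cosine(bags[i], bags[j]) for j in range(n)] for i in range(n)]
    ranks = page_rank(sim)
    top = sorted(range(n), key=lambda i: ranks[i], reverse=True)[:top_n]
    return [sentences[i] for i in sorted(top)]  # keep original sentence order

text = "Kyiv is the capital of Ukraine. Kyiv is a large city in Ukraine. Bananas are yellow."
print(summarize(text, top_n=2))
```

Sentences that share vocabulary with many others collect the most rank, so an off-topic sentence is the first to be dropped.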
  28. DEMO 2: Text summarization using ML.NET
  31. DEMO 3: Document tagging (with TF-IDF and Catalyst POS tagging)
  35. Useful resources: • Universal Dependencies: https://universaldependencies.org/ • Lang-uk: http://lang.org.ua/uk/ • All source code of this talk: https://github.com/korzh/Korzh.NLP • Math.NET, numerical computation algorithms for .NET: https://www.mathdotnet.com/ • List of .NET libraries with some NLP features: http://tiny.cc/dotnet-nlp-libs
  36. Conclusions: • We can do NLP on .NET (for the basic tasks at least) • The ML.NET library is good and reliable but has limited NLP features • The Catalyst library looks promising but still has a way to go • Contribute!
  37. Thank you! Sergiy Korzh. Twitter: @korzhs. LinkedIn: https://www.linkedin.com/in/korzh/ Facebook: https://www.facebook.com/sergiy.korzh Email: sergiy@korzh.com
