Machine learning (ML) and natural language processing (NLP)

Natural language processing and
machine learning
Nikola Milosevic

What is AI?
• Intelligence exhibited by a machine
• A flexible agent that interacts with its environment and
performs actions to maximize its chance of success at a certain goal

What is machine learning?
• Subfield of computer science that explores
how machines can learn to perform a certain
task without being explicitly programmed

Types of machine learning
• Supervised learning
• Semi-supervised learning
• Unsupervised learning
• Reinforcement learning

Machine learning problems
• Classification
• Clustering
• Regression

Testing the model
• Iteratively improve the model
• Test multiple algorithms – find the best one
• No free lunch theorem
• Feedback loop for feature selection
• Confusion matrix
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
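
As a minimal sketch of the accuracy formula above, the following Python snippet counts the four confusion-matrix cells for a binary classifier and derives accuracy from them. The function names and the toy labels are illustrative assumptions and do not come from the slides.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN for a binary classification problem."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if truth == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Accuracy = (TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


# Toy labels, made up for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(tp, tn, fp, fn, accuracy(tp, tn, fp, fn))
```

With these toy labels the classifier makes one false positive and one false negative out of eight predictions, so six of eight are correct.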

Examples of ML frameworks and
algorithms
• scikit-learn
– Python library
– Implementation of the most useful algorithms
– Naïve Bayes, SVM, Random forests, decision
trees…
• Keras
– Python library implementing most of the building
blocks of neural networks

Text data
• About 80% of data in organizations are in text
format
• Harder to analyse than structured data
• Huge amount of textual documents
– In biomedicine alone, 2200 scientific papers are
published every day
• Growing exponentially

Main goals of text mining
• Make communication easier (e.g. translation)
• Automate some processes (e.g.
communication agents/chatbots)
• Do data mining on textual and unstructured
data

Challenges
• A man saw a woman with the telescope.
– Who has the telescope?
• Multiple senses, synonyms,
homonyms, irony
• Grammar and context can help
• Acronyms

Approaches
• Rule based
– Human defined rules to extract information
– Needs expert humans who know how people express
certain things
– Is quite laborious
• Machine learning based
– The machine learns what to extract, guided by
human-provided examples
– Needs annotated corpora (usually fairly large)
• This is expensive to create and quite laborious

Levels of analysis
• Lexical
– Analysis of words
• Syntactic
– Analysis of organization of words
(phrases, sentences)
• Semantic
– Analysis of meaning
• Sometimes pragmatic
– Analysis of the pragmatics of using certain words or
phrases. Why did the author use them?

Lexical and syntactic processing
• Part of speech tagging
• Parsing
– Constituency
– Dependency
– Example tool: Stanford Parser

Semantic processing
• Text classification
– Sentiment analysis (positive/negative)
– Classification by topics (politics/sport/business)
– Authorship detection (Tolkien, Rowling, Shakespeare)
• Named entity recognition
• Topic modelling (unsupervised)
• Search
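
As an illustration of text classification for sentiment analysis, the sketch below implements a tiny multinomial Naïve Bayes classifier (one of the algorithms named earlier) in plain Python with add-one smoothing. The four training sentences and all function names are made up for this example; a real task would need a much larger annotated corpus and would normally use a library implementation.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(examples):
    """Collect per-class document counts and word counts from (text, label) pairs."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in examples:
        tokens = text.lower().split()
        class_docs[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_docs, word_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log-probability (add-one smoothing)."""
    class_docs, word_counts, vocabulary = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in class_docs:
        score = math.log(class_docs[label] / total_docs)  # class prior
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocabulary:
                continue  # ignore words never seen in training
            count = word_counts[label][token] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data, invented for illustration.
training = [
    ("great film loved it", "positive"),
    ("wonderful acting great story", "positive"),
    ("terrible film hated it", "negative"),
    ("boring story awful acting", "negative"),
]
model = train_naive_bayes(training)
print(classify("great story", model))
print(classify("awful boring film", model))
```

The same code works for topic or authorship classification by changing only the labels in the training data.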

Sequence modelling
• Machine learning technique useful for named
entity recognition
• Conditional random fields (CRF) or recurrent
neural networks (often LSTM)

Feature engineering
• Selecting important features that help extract
information
• Can be:
– Words, PoS, word shapes, vocabulary features,
etc.
– May depend on task and methodology
– Iterative process of selecting and improving the
performance
– Some features may confuse the algorithm

Search
• Finds documents that are the most relevant
for a given user query
• Common techniques include TF-IDF term weighting
and cosine similarity
• May additionally use links pointing to the text,
positions of matched words and similar signals
to rank the retrieved documents
• Apache Lucene, Solr (Java), there are also
Python libraries
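
A minimal sketch of ranking with TF-IDF and cosine similarity, written in plain Python. The smoothed IDF formula used here is one common variant and is an assumption, as are the function names and the three toy documents; engines such as Lucene use more elaborate scoring.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def tfidf_vector(tokens, idf):
    """Weight raw term counts by IDF; terms unknown to the index are dropped."""
    counts = Counter(tokens)
    return {term: count * idf[term] for term, count in counts.items() if term in idf}


def build_index(documents):
    """Return IDF weights and one sparse TF-IDF vector (dict) per document."""
    tokenized = [tokenize(doc) for doc in documents]
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    n_docs = len(documents)
    # Smoothed IDF variant (an assumption; several variants exist).
    idf = {term: math.log((1 + n_docs) / (1 + df)) + 1 for term, df in doc_freq.items()}
    return idf, [tfidf_vector(tokens, idf) for tokens in tokenized]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, documents):
    """Return (score, document index) pairs, best match first."""
    idf, vectors = build_index(documents)
    query_vec = tfidf_vector(tokenize(query), idf)
    scored = [(cosine(query_vec, vec), i) for i, vec in enumerate(vectors)]
    return sorted(scored, reverse=True)


# Toy documents, invented for illustration.
documents = [
    "machine learning for text classification",
    "deep learning for image recognition",
    "cooking recipes for pasta",
]
ranking = search("text classification", documents)
print(ranking)
```

Only the first document shares terms with the query, so it is ranked first and the other two receive a score of zero.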

Language models
• Used as features for classification and other
NLP tasks
• Contain some basic characteristics of language
• The most naïve (but also frequently used) is
called Bag of Words
• Neural networks use more advanced
models: word2vec, GloVe,
ELMo, BERT…
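
The Bag of Words representation mentioned above can be sketched in a few lines: each document becomes a vector of word counts over a shared vocabulary, and word order is discarded. The function name and the two example sentences are illustrative assumptions.

```python
from collections import Counter


def bag_of_words(documents):
    """Return a sorted vocabulary and one count vector per document."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # One position per vocabulary term; absent terms count as 0.
        vectors.append([counts[term] for term in vocabulary])
    return vocabulary, vectors


# Example sentences made up for illustration.
docs = ["the cat sat on the mat", "the dog sat"]
vocabulary, vectors = bag_of_words(docs)
print(vocabulary)
print(vectors)
```

Because the vectors ignore order and context, "dog bites man" and "man bites dog" get identical representations, which is the limitation that embedding models such as word2vec and BERT address.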

Useful tools and libraries
• Apache OpenNLP – Java
• Apache Lucene – Java, C#
• Stanford Core NLP – Java
• NLTK – Python
• GATE – GUI tool
• SharpNLP
• ...
• Weka – for machine learning (GUI)
