Intro to KotlinNLP:
An Open-Source Library for Natural
Language Processing
https://github.com/KotlinNLP
“Reinvent the wheel if it helps you sleep at night.”
- KotlinNLP Authors
OUTLINE
▪ Why Kotlin?
▪ What do we mean by NLP?
▪ KotlinNLP Library Overview (with a look at the code on GitHub!)
▪ Repositories / General Architecture
▪ Some more details on SimpleDNN (Machine Learning) and NeuralParser (NLP)
▪ Questions?
WHY KOTLIN?
1. Over the past 24 months, adoption and use of the Kotlin programming language among developers have grown tremendously.
https://pusher.com/state-of-kotlin
2. We think there is room for a library written entirely in Kotlin and dedicated to Machine Learning, and to Natural Language Processing in particular. Other experiments (e.g. Komputation) are not widely established yet.
3. We like Kotlin, a lot!
WHAT DO WE MEAN BY NLP?
▪ Language Identification / Text Tokenization
▪ Structural Analysis
▪ Morphological Analysis
▪ Part of Speech Tagging
▪ Dependency Parsing
▪ Named Entity Recognition / Date/Time Recognition
▪ Semantic Parsing
▪ Anaphora Resolution
▪ Semantic Role Labeling
▪ Intent Detection
▪ Entity Linking (Geoparsing)
▪ Text Categorization / Profiling
▪ Topic Detection / Sentiment Analysis
We believe you don't need to be
a Deep Learning expert to work
with state-of-the-art Natural
Language Processing techniques.
https://github.com/KotlinNLP
MOZILLA PUBLIC LICENSE
VERSION 2.0
SimpleDNN
SimpleDNN is a lightweight open-source machine learning library designed to support the neural network architectures relevant to natural language processing tasks.
LinguisticDescription
LinguisticDescription is a Kotlin library designed to support linguistic annotations at the morphological, syntactic and semantic levels of complex languages.
LanguageDetector
LanguageDetector is a very simple-to-use text language detector that uses the Hierarchical Attention Networks (HAN) from the SimpleDNN library.
NeuralTokenizer
NeuralTokenizer is a very simple-to-use text tokenizer and sentence splitter that uses neural networks from the SimpleDNN library.
NeuralParser
NeuralParser is a very simple-to-use dependency parser based on the SimpleDNN library and the SyntaxDecoder transition systems framework.
TokensEncoder
TokensEncoder is a neural processor that transforms a sentence into a dense encoded representation.
MorphologicalAnalyzer
MorphologicalAnalyzer is a Kotlin library designed to support the morphological analysis of a text, including enclitics, multi-words, numbers and time expressions.
GeoLocation
GeoLocation is a Kotlin library designed to support the identification of geo-locations in a text.
Others
CoNLLIO, DependencyTree, Utils, …
Mathematical operations within the SimpleDNN library are performed on the CPU with jblas. GPU support is still missing: we need your help!
More than 1,000 unit tests with Spek
LinguisticDescription is shared by all the linguistic
processors: it contains all the structures for
describing linguistic phenomena, from tokens to
morpho-syntactic structures.
Compatible with morphological dictionaries for the Italian
language available from the LINDAT repository
https://lindat.mff.cuni.cz/repository/xmlui/handle/11372/LRT-2630
SimpleDNN
Three main building blocks
Optimizer
NeuralProcessor
NeuralModel
SimpleDNN
Neural Model
A neural model is a serializable class containing a list of trainable parameters (updatable arrays).
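As a rough illustration of that definition, the sketch below stores two parameter arrays in a class and round-trips it through standard Java serialization. The names UpdatableArray and TinyModel are hypothetical and are not taken from the SimpleDNN sources.

```kotlin
import java.io.ByteArrayInputStream
import java.io.ByteArrayOutputStream
import java.io.ObjectInputStream
import java.io.ObjectOutputStream
import java.io.Serializable

// Hypothetical names: these classes illustrate the idea and are not SimpleDNN's API.
class UpdatableArray(val values: DoubleArray) : Serializable

class TinyModel(val params: List<UpdatableArray>) : Serializable {

    // Serialize the whole model (all parameter arrays) to bytes.
    fun dump(): ByteArray {
        val bytes = ByteArrayOutputStream()
        ObjectOutputStream(bytes).use { it.writeObject(this) }
        return bytes.toByteArray()
    }

    companion object {
        // Rebuild a model from bytes produced by dump().
        fun load(data: ByteArray): TinyModel =
            ObjectInputStream(ByteArrayInputStream(data)).use { it.readObject() as TinyModel }
    }
}

fun main() {
    val model = TinyModel(
        listOf(UpdatableArray(doubleArrayOf(0.1, -0.2)), UpdatableArray(doubleArrayOf(0.5)))
    )
    val copy = TinyModel.load(model.dump())
    println(copy.params.map { it.values.toList() })
}
```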
SimpleDNN
Neural Processor
▪ fun forward(input: InputType): OutputType
▪ fun backward(outputErrors: ErrorsType)
▪ fun getInputErrors(): InputErrorsType
▪ fun getParamsErrors(): ParamsErrorsType
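One possible reading of these four methods as a generic Kotlin interface is sketched below, with a single linear unit as a toy implementation. Only the method names come from the slide; the type parameters and the DotProcessor class are assumptions made for illustration.

```kotlin
// A generic reading of the four methods listed on the slide. The type parameters and the
// DotProcessor class are hypothetical; they are not taken from the SimpleDNN sources.
interface NeuralProcessor<InputType, OutputType, ErrorsType, InputErrorsType, ParamsErrorsType> {
    fun forward(input: InputType): OutputType
    fun backward(outputErrors: ErrorsType)
    fun getInputErrors(): InputErrorsType
    fun getParamsErrors(): ParamsErrorsType
}

// y = w · x, a single linear unit without bias or activation.
class DotProcessor(private val weights: DoubleArray) :
    NeuralProcessor<DoubleArray, Double, Double, DoubleArray, DoubleArray> {

    private var lastInput = DoubleArray(weights.size)
    private var inputErrors = DoubleArray(weights.size)
    private var paramsErrors = DoubleArray(weights.size)

    override fun forward(input: DoubleArray): Double {
        require(input.size == weights.size) { "input size must match the number of weights" }
        lastInput = input.copyOf()
        return weights.indices.sumOf { weights[it] * input[it] }
    }

    // Given dLoss/dy, store dLoss/dx = dLoss/dy * w and dLoss/dw = dLoss/dy * x.
    override fun backward(outputErrors: Double) {
        inputErrors = DoubleArray(weights.size) { outputErrors * weights[it] }
        paramsErrors = DoubleArray(weights.size) { outputErrors * lastInput[it] }
    }

    override fun getInputErrors(): DoubleArray = inputErrors.copyOf()
    override fun getParamsErrors(): DoubleArray = paramsErrors.copyOf()
}

fun main() {
    val processor = DotProcessor(doubleArrayOf(0.5, -1.0))
    val y = processor.forward(doubleArrayOf(2.0, 3.0))
    processor.backward(outputErrors = 1.0)
    println(y)
    println(processor.getInputErrors().toList())
    println(processor.getParamsErrors().toList())
}
```

Keeping the errors inside the processor until they are requested matches the getter-style methods on the slide: backward() computes, the two getters expose.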
SimpleDNN
Optimizer
▪ fun accumulate(paramsErrors: ParamsErrorsType)
▪ fun update()
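The accumulate/update split can be sketched with plain gradient descent: errors from several examples are summed, then applied once as an average. SgdOptimizer and its constructor parameters are hypothetical; only the two method names come from the slide.

```kotlin
// Hypothetical sketch: plain gradient descent over one parameter array.
// Only the method names accumulate() and update() come from the slide.
class SgdOptimizer(private val params: DoubleArray, private val learningRate: Double) {
    private val accumulated = DoubleArray(params.size)
    private var count = 0

    // Add the errors (gradients) of one example to the running sum.
    fun accumulate(paramsErrors: DoubleArray) {
        require(paramsErrors.size == params.size) { "errors size must match params size" }
        for (i in params.indices) accumulated[i] += paramsErrors[i]
        count++
    }

    // Apply the averaged errors to the parameters, then reset the accumulator.
    fun update() {
        if (count == 0) return
        for (i in params.indices) {
            params[i] -= learningRate * accumulated[i] / count
            accumulated[i] = 0.0
        }
        count = 0
    }
}

fun main() {
    val params = doubleArrayOf(1.0, 1.0)
    val optimizer = SgdOptimizer(params, learningRate = 0.5)
    optimizer.accumulate(doubleArrayOf(1.0, 0.0))
    optimizer.accumulate(doubleArrayOf(3.0, 2.0))
    optimizer.update()
    println(params.toList())
}
```

Calling update() after every example gives online learning; calling it after several accumulate() calls gives mini-batch updates.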
SimpleDNN / Core Structures
▪ NeuralProcessor: Feedforward, Recurrent, BidirectionalRecurrent, Attentive, Pointer, TreeEncoder
▪ Layer
  ▪ Feedforward: Simple, Highway
  ▪ Merge: Concat, Affine, Sum, Average
  ▪ Recurrent: Simple, LSTM, GRU, RAN, CFN, DeltaRNN, IndRNN
▪ Array: Augmented, Activable, Updatable, Dense, Sparse
▪ Activation: ELU, ReLU, Sigmoid, Softmax, SoftSign, Tanh
▪ UpdateMethod: AdaGrad, ADAM, LearningRate, Momentum (+Nesterov), RMSProp
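For readers who want the activation functions named above spelled out, here is a minimal sketch using their textbook definitions. It is independent of SimpleDNN, and the function names are chosen for this example only.

```kotlin
import kotlin.math.abs
import kotlin.math.exp
import kotlin.math.max
import kotlin.math.tanh

// Element-wise activations (standard textbook definitions, not SimpleDNN code).
fun relu(x: Double): Double = max(0.0, x)
fun elu(x: Double, alpha: Double = 1.0): Double = if (x > 0.0) x else alpha * (exp(x) - 1.0)
fun sigmoid(x: Double): Double = 1.0 / (1.0 + exp(-x))
fun softSign(x: Double): Double = x / (1.0 + abs(x))

// Softmax works on a whole vector; subtracting the maximum keeps exp() from overflowing.
fun softmax(xs: DoubleArray): DoubleArray {
    require(xs.isNotEmpty()) { "softmax needs at least one value" }
    val maxValue = xs.fold(Double.NEGATIVE_INFINITY) { acc, x -> max(acc, x) }
    val exps = DoubleArray(xs.size) { exp(xs[it] - maxValue) }
    val total = exps.sum()
    return DoubleArray(xs.size) { exps[it] / total }
}

fun main() {
    println(relu(-2.0))
    println(sigmoid(0.0))
    println(softSign(3.0))
    println(tanh(0.0))
    println(softmax(doubleArrayOf(1.0, 2.0, 3.0)).toList())
}
```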
https://github.com/KotlinNLP/SimpleDNN/tree/dev/src/main/kotlin/com/kotlinnlp/simplednn/deeplearning/birnn
TokensEncoder : NeuralProcessor
▪ Word/POS Embeddings
▪ Character Embeddings
▪ BiLSTM Attention
▪ Morpho Ensemble
STUDY
Image from https://web.stanford.edu/~tdozat/files/TDozat-CoNLL2017-Paper.pdf
NeuralParser
▪ Transition-based (SyntaxDecoder): Greedy Decoding, Beam Decoding
▪ Graph-based (TODO)
▪ LHR
NeuralParser / Transition-based Parsing
TransitionSystem
▪ ArcStandard (Nivre, 2004)
▪ ArcEager (Nivre, 2003)
▪ ArcHybrid (Kuhlmann, 2011)
▪ ArcSwift (Qi, 2017)
▪ ArcSpine (Sartorio, 2013)
▪ ArcEagerSpine (Grella, 2014)
▪ EasyFirst (Goldberg, 2010)
▪ Covington (2001)
▪ ArcDistance (Attardi, 2006)
[Figure: Stack, Buffer and Arcs of a transition-based parser. Image from https://nlp.stanford.edu/pubs/qi2017arcswift.pdf]
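To make the stack, buffer and arcs concrete, the sketch below applies a hand-written arc-standard transition sequence to a three-token sentence. The types ParserState, Transition and Arc are hypothetical and do not mirror the SyntaxDecoder classes; the transitions follow the usual arc-standard definition.

```kotlin
// Minimal arc-standard sketch (hypothetical types, not the SyntaxDecoder API).
// Tokens are integer ids; 0 is the artificial root and always sits at the bottom of the stack.
data class Arc(val head: Int, val dependent: Int)

enum class Transition { SHIFT, LEFT_ARC, RIGHT_ARC }

class ParserState(tokenCount: Int) {
    val stack = ArrayDeque(listOf(0))
    val buffer = ArrayDeque((1..tokenCount).toList())
    val arcs = mutableListOf<Arc>()

    val isFinal: Boolean get() = buffer.isEmpty() && stack.size == 1

    fun perform(transition: Transition) {
        when (transition) {
            // Move the first buffer item onto the stack.
            Transition.SHIFT -> {
                require(buffer.isNotEmpty()) { "SHIFT needs a non-empty buffer" }
                stack.addLast(buffer.removeFirst())
            }
            // The top of the stack becomes the head of the item below it, which is removed.
            Transition.LEFT_ARC -> {
                require(stack.size > 2) { "LEFT_ARC needs two non-root items on the stack" }
                val top = stack.removeLast()
                val below = stack.removeLast()
                arcs.add(Arc(head = top, dependent = below))
                stack.addLast(top)
            }
            // The item below the top becomes the head of the top, which is removed.
            Transition.RIGHT_ARC -> {
                require(stack.size > 1) { "RIGHT_ARC needs two items on the stack" }
                val top = stack.removeLast()
                arcs.add(Arc(head = stack.last(), dependent = top))
            }
        }
    }
}

fun main() {
    // "She eats fish": 1 = She, 2 = eats, 3 = fish.
    val state = ParserState(tokenCount = 3)
    val sequence = listOf(
        Transition.SHIFT, Transition.SHIFT, Transition.LEFT_ARC,
        Transition.SHIFT, Transition.RIGHT_ARC, Transition.RIGHT_ARC
    )
    sequence.forEach { state.perform(it) }
    println(state.arcs)
    println(state.isFinal)
}
```

In a real parser the sequence is not given: a scorer picks the next transition from the current state at every step.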
SYNTAX DECODER
Generalized Transition-based Parsing Framework
score(transition, state) = weight · features(transition, state)
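With binary (present/absent) features, the dot product in the formula above reduces to summing the weights of the active features. The sketch below assumes exactly that; the feature names and weight values are invented for illustration.

```kotlin
// Hypothetical linear scorer: feature names and weights below are made up for illustration.
// Features missing from the weight map contribute 0.
fun score(weights: Map<String, Double>, features: List<String>): Double =
    features.sumOf { weights[it] ?: 0.0 }

fun main() {
    val weights = mapOf(
        "SHIFT|stackTop=DET" to 1.5,
        "SHIFT|bufferFirst=NOUN" to 0.5,
        "LEFT_ARC|stackTop=DET" to -1.0
    )
    // Binary features active for each candidate transition in the current state.
    val candidates = mapOf(
        "SHIFT" to listOf("SHIFT|stackTop=DET", "SHIFT|bufferFirst=NOUN"),
        "LEFT_ARC" to listOf("LEFT_ARC|stackTop=DET", "LEFT_ARC|bufferFirst=NOUN")
    )
    val scores = candidates.mapValues { (_, features) -> score(weights, features) }
    val best = scores.entries.maxByOrNull { it.value }?.key
    println(scores)
    println(best)
}
```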
SYNTAX DECODER / Architecture
▪ State
▪ DependencyTree
▪ Oracle
▪ FeaturesExtractor (+Trainable)
▪ Transition
▪ Action
▪ ActionsGenerator
▪ ActionsScorer (+Trainable)
▪ BestActionSelector
▪ TransitionGenerator
▪ Features
▪ StateItems: Input Tokens, Relevance, Errors
▪ GoldDependencyTree
Transition-based Parsing ☺
• Built-in support for greedy, non-monotonic and multi-threaded beam decoding.
• Training can be performed using static, dynamic or non-deterministic oracles.
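The difference between greedy and beam decoding can be shown with a generic, single-threaded beam search in which a beam size of 1 is the greedy case. This is a sketch under stated assumptions: beamSearch, Hypothesis and toyExpand are hypothetical helpers and say nothing about how SyntaxDecoder implements decoding.

```kotlin
// Generic beam search sketch (hypothetical helper, not the SyntaxDecoder API).
// `expand` returns the scored successors of a state; an empty list marks a final state.
data class Hypothesis<S>(val state: S, val score: Double)

fun <S> beamSearch(initial: S, beamSize: Int, expand: (S) -> List<Pair<S, Double>>): Hypothesis<S> {
    require(beamSize >= 1) { "beamSize must be at least 1" }
    var beam = listOf(Hypothesis(initial, 0.0))
    while (true) {
        var expandedAny = false
        val next = beam.flatMap { hyp ->
            val successors = expand(hyp.state)
            if (successors.isEmpty()) {
                listOf(hyp) // final hypotheses are carried over unchanged
            } else {
                expandedAny = true
                successors.map { (state, stepScore) -> Hypothesis(state, hyp.score + stepScore) }
            }
        }
        // When nothing could be expanded, every hypothesis is final: return the best one.
        if (!expandedAny) return beam.maxByOrNull { it.score }!!
        beam = next.sortedByDescending { it.score }.take(beamSize)
    }
}

// Toy search space: "a" looks better at first, but "b" leads to the better final state.
fun toyExpand(state: String): List<Pair<String, Double>> = when (state) {
    "" -> listOf("a" to 1.0, "b" to 0.9)
    "a" -> listOf("ax" to 0.1)
    "b" -> listOf("by" to 1.0)
    else -> emptyList()
}

fun main() {
    println(beamSearch("", beamSize = 1, expand = ::toyExpand).state) // greedy decoding
    println(beamSearch("", beamSize = 2, expand = ::toyExpand).state) // beam decoding
}
```

Greedy decoding commits to the locally best step, while a wider beam keeps alternatives alive long enough to recover from an early mistake, at a proportional cost in time.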
LHRParser
Latent Heads Representation (Grella and Cangialosi, 2018)
A state-of-the-art neural dependency
parser that implements a novel
approach based on a bidirectional
recurrent autoencoder to perform
globally optimized non-projective
parsing via semi-supervised learning.
https://arxiv.org/abs/1802.02116
Latent Heads Representation Parser
No structured prediction,
full neural parsing!
THANK YOU! QUESTIONS?
Do you like it? Add a ★ on GitHub!
https://github.com/KotlinNLP
