SlideShare a Scribd company logo
1 of 58
Introducción de NLP en
Azure
ORGANIZATION
Thank you!
@zNk
jasanchez@plainconcepts.com
• Interested in:
• Machine Learning (NLP, productionalize models)
• Highly scalable architectures (based on hadoop, spark, etc)
• Software Engineering (i really like C#)
• Gaming: Pubg, LoL
Jesús A. Sánchez Méndez
Data Engineer
@ematde
ematallanas@plainconcepts.com
• Knowmad, working with data long time ago
• Interested in:
• AI applied to industry
• Research in robotics, new models and general use of ML
• Love films and series
Eduardo Matallanas
Data Engineer
Things that you will learn today (hopefully)
1. What is NLP
2. How to face a NLP project
3. Different approaches to model NLP
4. Different techniques involved in the creation a NLP model
5. Libs and frameworks that we use and work/worked
What is NLP?
Language is a process of free creation; its laws and
principles are fixed; but the manner in which the
principles of generation are used is free and
infinitely varied. Even the interpretation and used
of words involves a process of free creation.
Noam Chomsky
Expectations vs Reality
Excitations Input channel
Processing
Output
Context
Recognition
Word
analysis
Sentiment
Analysis
Answer
Entity
recognition
Synonyms
Disambiguation
Lexical
semantics
How can a
virtual
assistant do it?
{ }
Language
Undertanding (LUIS) Amazon Lex
Privada
101: How to build
your own NLP
Facing a NLP project? Step to success
Data Collection
& Assembly
Data
Preprocessing
Data Exploration
& Visualization
Model Building
Model
Evaluation
The importance
of data
preprocessing
Step 1: Gather the
data!
Gather data
• Different data sources
• The nature of the data is important
• Which data is really important to build your
model?
Step 2:
Data preprocessing
Why preprocessing?
Tokenization
Easy, huh?
What about this? Jesus’ car wasn’t fined since he’s
resident that well-known neighbor
That’s not as easy as: Jesus’ car was not fined since
he is resident in that neighbor
Stemming
EATING -> EAT
RUNNING -> RUN
AMUSING -> AMUS
GRATEFULLY -> GRATE
STUDIES -> STUDI
LEMMATIZING
AM, ARE, IS -> BE
CARS, CAR, CAR’S -> CAR
DESTRUCTION -> DESTROY
Quite often used techniques
• Sentence to lower case
• Remove numbers (or even translate
them to text)
• Remove punctuation (this is usually
done in tokenization stage)
• Trimming
• Remove stopwords
• Depending on the domain, remove
sparse terms
• It was a pleasure David  it was a pleasure david
• My 15 birthday  My fifteenth birthday
• The house, in Madrid, was smaller.  The house in
Madrid was smaller
• the car broke in front of the building
Step 3:
Model building
Quick reminder: Machine Learning models?
𝑌 = 𝑓(𝑋)The output
we want to
predict
Our input
data (X)
The function
itself is the
model that we
are going to
use to predict
the output (Y)
Does this make sense?
𝑓 𝑋 = 𝑋2 𝑓 "ℎ𝑒𝑦 𝑤ℎ𝑎𝑡′𝑠 𝑢𝑝? " = ?
Of course it doesn’t…
From Words To vectors
How can we build the model?
1. What is needed?  Understand words (input)
2. What is expected?  labels, numbers, words, etc.
(output)
Need a vocabulary!!!
Model
Input Output
𝑤𝑡ℎ𝑒,2 = 2 𝑥 log(
4
2
) = 0,6
𝑤𝑓𝑜𝑥,1 = 1 𝑥 log(
1
1
) = 0
Second approximation
Lots of words
High representation of general information
Encode the data based on the order of appearance  e.g. Hashing function
Too large dimensionality  PCA for reduction
Clustering the data  K-means
WIKIPEDIA1
2
3
4
Word
embedding
Words or phrases are mapped to vectors of real
numbers
Relationship of words
•CBOW  Predict a word given the context
•Skipgram  Predict the context
Dense and low-dimensional vectors
Their representation is learned by the usage of words
Sparse
Dense
Word2Vec
Two layer Neural Network NOT A DEEP NN
Applied to any written text
(words, sentence, paragraph, etc)
Word2Vec, CBOW? Skip-Gram?
Predicts a word Predicts context
Word2Vec, CBOW? Skip-Gram?
Predicts a word Predicts context
Word2Vec, CBOW? Skip-Gram?
Input  a corpus
Approach  Skip-gram
Outpout  a dictionary [word,vector]
Word Distance
Portugal 0,83447
France 0,72341
Moroco 0,69847
Andorra 0,653456
e.g. Spain
• Created by Facebook AI Research lab
• Pretrained models for 294 languages
• Extract a relationship even if the word is not in the
corpus
• Enrich an existing model by assigning more weight
to your own trained vectors
• Fastest lib
• Allows training with CBOW or skipgram
• Provides classification and clustering capabilities
Nice, this is a solution for
everything!
• Indeed, whatever floats your boat
• Quite often embedding is used to feed another model
• Classifier o regression model
• Deep neural network
Model
Input OutputWord
Embedding ?
Sounds good but it has to have a disadvantage
[0,0,0,0,0,0,0,0,0] (needs retrain)
What happens when you try get a vector of a word that does not
appear in the corpus?
Coupled to the language
Applications
• Multilanguage support • Relationship from language to language
Logic arithmetic (analogies)
king:queen [woman,
Attempted abduction,
teenager, girl]
man
China:Taiwan [Ukraine, Moscow,
Moldova, Armenia]
Rusia
Republican:Donald Trump
[Democratic, GOP,
Democrats, McCain]
Barack Obama
Building:Architech Software
[programmer,
SecurityCenter,
WinPcap]
Logic arithmetic (analogies)
Geopolitics: Iraq - Violence = Jordan
Distinction: Human - Animal = Ethics
President - Power = Prime Minister
Library - Books = Hall
Analogy: Stock Market ≈ Thermometer
Other examples
house:roof::castle:[dome, bell_tower, spire, crenellations,
turrets]
knee:leg::elbow:[forearm, arm, ulna_bone]
love:indifference::fear:[apathy, callousness, timidity,
helplessness, inaction]
Other
applications
• Information retrieval
• Text classification
• Sentiment analysis
• Entity recognition
• Extract keywords
Conclusion
Data and analysis are key
NLP models are complex structures
A general purpose model does not exist
Use framework and library to begin something small and
grow it
Hopefully you are now an expert in NLP algorithms
Now is your turn
Questions & Answers
Thanks and …
See you soon!
Thanks also to the organization
Without whom this would not have been posible.
Biblio
• Preprocessing
• http://www.nltk.org/
• https://pandas.pydata.org/
• Word Embedding
• https://skymind.ai/wiki/word2vec
• https://www.tensorflow.org/tutorials/representation/word2vec
• https://fasttext.cc/
• https://nlp.stanford.edu/projects/glove/
• LSTM
• http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Pandas, numpyTrabajo en python
Fasttext

More Related Content

Similar to Introducción a NLP en Azure: Las 5 etapas clave de un proyecto de NLP

AILABS - Lecture Series - Is AI the New Electricity? - Advances In Machine Le...
AILABS - Lecture Series - Is AI the New Electricity? - Advances In Machine Le...AILABS - Lecture Series - Is AI the New Electricity? - Advances In Machine Le...
AILABS - Lecture Series - Is AI the New Electricity? - Advances In Machine Le...AILABS Academy
 
Deep learning with tensorflow
Deep learning with tensorflowDeep learning with tensorflow
Deep learning with tensorflowCharmi Chokshi
 
Tomáš Mikolov - Distributed Representations for NLP
Tomáš Mikolov - Distributed Representations for NLPTomáš Mikolov - Distributed Representations for NLP
Tomáš Mikolov - Distributed Representations for NLPMachine Learning Prague
 
Introducing TensorFlow: The game changer in building "intelligent" applications
Introducing TensorFlow: The game changer in building "intelligent" applicationsIntroducing TensorFlow: The game changer in building "intelligent" applications
Introducing TensorFlow: The game changer in building "intelligent" applicationsRokesh Jankie
 
Beyond the Symbols: A 30-minute Overview of NLP
Beyond the Symbols: A 30-minute Overview of NLPBeyond the Symbols: A 30-minute Overview of NLP
Beyond the Symbols: A 30-minute Overview of NLPMENGSAYLOEM1
 
Chatbots and Deep Learning
Chatbots and Deep LearningChatbots and Deep Learning
Chatbots and Deep LearningAndherson Maeda
 
NLP Bootcamp 2018 : Representation Learning of text for NLP
NLP Bootcamp 2018 : Representation Learning of text for NLPNLP Bootcamp 2018 : Representation Learning of text for NLP
NLP Bootcamp 2018 : Representation Learning of text for NLPAnuj Gupta
 
DotNet 2019 | Pablo Doval - Recurrent Neural Networks with TF2.0
DotNet 2019 | Pablo Doval - Recurrent Neural Networks with TF2.0DotNet 2019 | Pablo Doval - Recurrent Neural Networks with TF2.0
DotNet 2019 | Pablo Doval - Recurrent Neural Networks with TF2.0Plain Concepts
 
OWF14 - Big Data : The State of Machine Learning in 2014
OWF14 - Big Data : The State of Machine  Learning in 2014OWF14 - Big Data : The State of Machine  Learning in 2014
OWF14 - Big Data : The State of Machine Learning in 2014Paris Open Source Summit
 
Program Synthesis, DreamCoder, and ARC
Program Synthesis, DreamCoder, and ARCProgram Synthesis, DreamCoder, and ARC
Program Synthesis, DreamCoder, and ARCAndrey Zakharevich
 
Deep Learning, an interactive introduction for NLP-ers
Deep Learning, an interactive introduction for NLP-ersDeep Learning, an interactive introduction for NLP-ers
Deep Learning, an interactive introduction for NLP-ersRoelof Pieters
 
Boost Your Base Bootcamp - [Online & Offline] In Bangla
Boost Your Base Bootcamp - [Online & Offline] In BanglaBoost Your Base Bootcamp - [Online & Offline] In Bangla
Boost Your Base Bootcamp - [Online & Offline] In BanglaStack Learner
 
scratch course-part2-2023.pdf
scratch course-part2-2023.pdfscratch course-part2-2023.pdf
scratch course-part2-2023.pdfDoaa Mohey Eldin
 
An Introduction to Machine Learning
An Introduction to Machine LearningAn Introduction to Machine Learning
An Introduction to Machine LearningAngelo Simone Scotto
 
Deep learning introduction
Deep learning introductionDeep learning introduction
Deep learning introductionAdwait Bhave
 
Hacking Predictive Modeling - RoadSec 2018
Hacking Predictive Modeling - RoadSec 2018Hacking Predictive Modeling - RoadSec 2018
Hacking Predictive Modeling - RoadSec 2018HJ van Veen
 
From SQL to Python - A Beginner's Guide to Making the Switch
From SQL to Python - A Beginner's Guide to Making the SwitchFrom SQL to Python - A Beginner's Guide to Making the Switch
From SQL to Python - A Beginner's Guide to Making the SwitchRachel Berryman
 

Similar to Introducción a NLP en Azure: Las 5 etapas clave de un proyecto de NLP (20)

AILABS - Lecture Series - Is AI the New Electricity? - Advances In Machine Le...
AILABS - Lecture Series - Is AI the New Electricity? - Advances In Machine Le...AILABS - Lecture Series - Is AI the New Electricity? - Advances In Machine Le...
AILABS - Lecture Series - Is AI the New Electricity? - Advances In Machine Le...
 
Deep learning with tensorflow
Deep learning with tensorflowDeep learning with tensorflow
Deep learning with tensorflow
 
Tomáš Mikolov - Distributed Representations for NLP
Tomáš Mikolov - Distributed Representations for NLPTomáš Mikolov - Distributed Representations for NLP
Tomáš Mikolov - Distributed Representations for NLP
 
Introducing TensorFlow: The game changer in building "intelligent" applications
Introducing TensorFlow: The game changer in building "intelligent" applicationsIntroducing TensorFlow: The game changer in building "intelligent" applications
Introducing TensorFlow: The game changer in building "intelligent" applications
 
Beyond the Symbols: A 30-minute Overview of NLP
Beyond the Symbols: A 30-minute Overview of NLPBeyond the Symbols: A 30-minute Overview of NLP
Beyond the Symbols: A 30-minute Overview of NLP
 
NLP Bootcamp
NLP BootcampNLP Bootcamp
NLP Bootcamp
 
tensorflow.pptx
tensorflow.pptxtensorflow.pptx
tensorflow.pptx
 
Chatbots and Deep Learning
Chatbots and Deep LearningChatbots and Deep Learning
Chatbots and Deep Learning
 
NLP Bootcamp 2018 : Representation Learning of text for NLP
NLP Bootcamp 2018 : Representation Learning of text for NLPNLP Bootcamp 2018 : Representation Learning of text for NLP
NLP Bootcamp 2018 : Representation Learning of text for NLP
 
DotNet 2019 | Pablo Doval - Recurrent Neural Networks with TF2.0
DotNet 2019 | Pablo Doval - Recurrent Neural Networks with TF2.0DotNet 2019 | Pablo Doval - Recurrent Neural Networks with TF2.0
DotNet 2019 | Pablo Doval - Recurrent Neural Networks with TF2.0
 
OWF14 - Big Data : The State of Machine Learning in 2014
OWF14 - Big Data : The State of Machine  Learning in 2014OWF14 - Big Data : The State of Machine  Learning in 2014
OWF14 - Big Data : The State of Machine Learning in 2014
 
Program Synthesis, DreamCoder, and ARC
Program Synthesis, DreamCoder, and ARCProgram Synthesis, DreamCoder, and ARC
Program Synthesis, DreamCoder, and ARC
 
Artificial intelligence
Artificial intelligenceArtificial intelligence
Artificial intelligence
 
Deep Learning, an interactive introduction for NLP-ers
Deep Learning, an interactive introduction for NLP-ersDeep Learning, an interactive introduction for NLP-ers
Deep Learning, an interactive introduction for NLP-ers
 
Boost Your Base Bootcamp - [Online & Offline] In Bangla
Boost Your Base Bootcamp - [Online & Offline] In BanglaBoost Your Base Bootcamp - [Online & Offline] In Bangla
Boost Your Base Bootcamp - [Online & Offline] In Bangla
 
scratch course-part2-2023.pdf
scratch course-part2-2023.pdfscratch course-part2-2023.pdf
scratch course-part2-2023.pdf
 
An Introduction to Machine Learning
An Introduction to Machine LearningAn Introduction to Machine Learning
An Introduction to Machine Learning
 
Deep learning introduction
Deep learning introductionDeep learning introduction
Deep learning introduction
 
Hacking Predictive Modeling - RoadSec 2018
Hacking Predictive Modeling - RoadSec 2018Hacking Predictive Modeling - RoadSec 2018
Hacking Predictive Modeling - RoadSec 2018
 
From SQL to Python - A Beginner's Guide to Making the Switch
From SQL to Python - A Beginner's Guide to Making the SwitchFrom SQL to Python - A Beginner's Guide to Making the Switch
From SQL to Python - A Beginner's Guide to Making the Switch
 

More from Plain Concepts

R y Python con Power BI, la ciencia y el análisis de datos, juntos
R y Python con Power BI, la ciencia y el análisis de datos, juntosR y Python con Power BI, la ciencia y el análisis de datos, juntos
R y Python con Power BI, la ciencia y el análisis de datos, juntosPlain Concepts
 
Video kills the radio star: e-mail is crap and needed disruption
 Video kills the radio star: e-mail is crap and needed disruption Video kills the radio star: e-mail is crap and needed disruption
Video kills the radio star: e-mail is crap and needed disruptionPlain Concepts
 
Cómo redefinir tu organización con IA
Cómo redefinir tu organización con IACómo redefinir tu organización con IA
Cómo redefinir tu organización con IAPlain Concepts
 
Dx29: assisting genetic disease diagnosis with physician-focused AI pipelines
Dx29: assisting genetic disease diagnosis with physician-focused AI pipelinesDx29: assisting genetic disease diagnosis with physician-focused AI pipelines
Dx29: assisting genetic disease diagnosis with physician-focused AI pipelinesPlain Concepts
 
¿Qué es real? Cuando la IA intenta engañar al ojo humano
¿Qué es real? Cuando la IA intenta engañar al ojo humano¿Qué es real? Cuando la IA intenta engañar al ojo humano
¿Qué es real? Cuando la IA intenta engañar al ojo humanoPlain Concepts
 
Inteligencia artificial para detectar el cáncer de mama
Inteligencia artificial para  detectar el cáncer de mamaInteligencia artificial para  detectar el cáncer de mama
Inteligencia artificial para detectar el cáncer de mamaPlain Concepts
 
¿Está tu compañía preparada para el reto de la Inteligencia Artificial?
¿Está tu compañía preparada para el reto de la Inteligencia Artificial?¿Está tu compañía preparada para el reto de la Inteligencia Artificial?
¿Está tu compañía preparada para el reto de la Inteligencia Artificial?Plain Concepts
 
Cognitive Services en acción
Cognitive Services en acciónCognitive Services en acción
Cognitive Services en acciónPlain Concepts
 
El Hogar Inteligente. De los datos de IoT a los hábitos de una familia a trav...
El Hogar Inteligente. De los datos de IoT a los hábitos de una familia a trav...El Hogar Inteligente. De los datos de IoT a los hábitos de una familia a trav...
El Hogar Inteligente. De los datos de IoT a los hábitos de una familia a trav...Plain Concepts
 
What if AI was your daughter?
What if AI was your daughter?What if AI was your daughter?
What if AI was your daughter?Plain Concepts
 
Recomendación Basada en Contenidos con Deep Learning: Qué queríamos hacer, Qu...
Recomendación Basada en Contenidos con Deep Learning: Qué queríamos hacer, Qu...Recomendación Basada en Contenidos con Deep Learning: Qué queríamos hacer, Qu...
Recomendación Basada en Contenidos con Deep Learning: Qué queríamos hacer, Qu...Plain Concepts
 
Revolucionando la experiencia de cliente con Big Data e IA
Revolucionando la experiencia de cliente con Big Data e IARevolucionando la experiencia de cliente con Big Data e IA
Revolucionando la experiencia de cliente con Big Data e IAPlain Concepts
 
Recuperación de información para solicitantes de empleo
Recuperación de información para solicitantes de empleoRecuperación de información para solicitantes de empleo
Recuperación de información para solicitantes de empleoPlain Concepts
 
La nueva revolución Industrial: Inteligencia Artificial & IoT Edge
La nueva revolución Industrial: Inteligencia Artificial & IoT EdgeLa nueva revolución Industrial: Inteligencia Artificial & IoT Edge
La nueva revolución Industrial: Inteligencia Artificial & IoT EdgePlain Concepts
 
DotNet 2019 | Sherry List - Azure Cognitive Services with Native Script
DotNet 2019 | Sherry List - Azure Cognitive Services with Native ScriptDotNet 2019 | Sherry List - Azure Cognitive Services with Native Script
DotNet 2019 | Sherry List - Azure Cognitive Services with Native ScriptPlain Concepts
 
DotNet 2019 | Quique Fernández - Potenciando VUE con TypeScript, Inversify, V...
DotNet 2019 | Quique Fernández - Potenciando VUE con TypeScript, Inversify, V...DotNet 2019 | Quique Fernández - Potenciando VUE con TypeScript, Inversify, V...
DotNet 2019 | Quique Fernández - Potenciando VUE con TypeScript, Inversify, V...Plain Concepts
 
DotNet 2019 | Daniela Solís y Manuel Rodrigo Cabello - IoT, una Raspberry Pi ...
DotNet 2019 | Daniela Solís y Manuel Rodrigo Cabello - IoT, una Raspberry Pi ...DotNet 2019 | Daniela Solís y Manuel Rodrigo Cabello - IoT, una Raspberry Pi ...
DotNet 2019 | Daniela Solís y Manuel Rodrigo Cabello - IoT, una Raspberry Pi ...Plain Concepts
 
El camino a las Cloud Native Apps - Introduction
El camino a las Cloud Native Apps - IntroductionEl camino a las Cloud Native Apps - Introduction
El camino a las Cloud Native Apps - IntroductionPlain Concepts
 
El camino a las Cloud Native Apps - Azure AI
El camino a las Cloud Native Apps - Azure AIEl camino a las Cloud Native Apps - Azure AI
El camino a las Cloud Native Apps - Azure AIPlain Concepts
 

More from Plain Concepts (20)

R y Python con Power BI, la ciencia y el análisis de datos, juntos
R y Python con Power BI, la ciencia y el análisis de datos, juntosR y Python con Power BI, la ciencia y el análisis de datos, juntos
R y Python con Power BI, la ciencia y el análisis de datos, juntos
 
Video kills the radio star: e-mail is crap and needed disruption
 Video kills the radio star: e-mail is crap and needed disruption Video kills the radio star: e-mail is crap and needed disruption
Video kills the radio star: e-mail is crap and needed disruption
 
Cómo redefinir tu organización con IA
Cómo redefinir tu organización con IACómo redefinir tu organización con IA
Cómo redefinir tu organización con IA
 
Dx29: assisting genetic disease diagnosis with physician-focused AI pipelines
Dx29: assisting genetic disease diagnosis with physician-focused AI pipelinesDx29: assisting genetic disease diagnosis with physician-focused AI pipelines
Dx29: assisting genetic disease diagnosis with physician-focused AI pipelines
 
¿Qué es real? Cuando la IA intenta engañar al ojo humano
¿Qué es real? Cuando la IA intenta engañar al ojo humano¿Qué es real? Cuando la IA intenta engañar al ojo humano
¿Qué es real? Cuando la IA intenta engañar al ojo humano
 
Inteligencia artificial para detectar el cáncer de mama
Inteligencia artificial para  detectar el cáncer de mamaInteligencia artificial para  detectar el cáncer de mama
Inteligencia artificial para detectar el cáncer de mama
 
¿Está tu compañía preparada para el reto de la Inteligencia Artificial?
¿Está tu compañía preparada para el reto de la Inteligencia Artificial?¿Está tu compañía preparada para el reto de la Inteligencia Artificial?
¿Está tu compañía preparada para el reto de la Inteligencia Artificial?
 
Cognitive Services en acción
Cognitive Services en acciónCognitive Services en acción
Cognitive Services en acción
 
El Hogar Inteligente. De los datos de IoT a los hábitos de una familia a trav...
El Hogar Inteligente. De los datos de IoT a los hábitos de una familia a trav...El Hogar Inteligente. De los datos de IoT a los hábitos de una familia a trav...
El Hogar Inteligente. De los datos de IoT a los hábitos de una familia a trav...
 
What if AI was your daughter?
What if AI was your daughter?What if AI was your daughter?
What if AI was your daughter?
 
Recomendación Basada en Contenidos con Deep Learning: Qué queríamos hacer, Qu...
Recomendación Basada en Contenidos con Deep Learning: Qué queríamos hacer, Qu...Recomendación Basada en Contenidos con Deep Learning: Qué queríamos hacer, Qu...
Recomendación Basada en Contenidos con Deep Learning: Qué queríamos hacer, Qu...
 
Revolucionando la experiencia de cliente con Big Data e IA
Revolucionando la experiencia de cliente con Big Data e IARevolucionando la experiencia de cliente con Big Data e IA
Revolucionando la experiencia de cliente con Big Data e IA
 
IA Score en InfoJobs
IA Score en InfoJobsIA Score en InfoJobs
IA Score en InfoJobs
 
Recuperación de información para solicitantes de empleo
Recuperación de información para solicitantes de empleoRecuperación de información para solicitantes de empleo
Recuperación de información para solicitantes de empleo
 
La nueva revolución Industrial: Inteligencia Artificial & IoT Edge
La nueva revolución Industrial: Inteligencia Artificial & IoT EdgeLa nueva revolución Industrial: Inteligencia Artificial & IoT Edge
La nueva revolución Industrial: Inteligencia Artificial & IoT Edge
 
DotNet 2019 | Sherry List - Azure Cognitive Services with Native Script
DotNet 2019 | Sherry List - Azure Cognitive Services with Native ScriptDotNet 2019 | Sherry List - Azure Cognitive Services with Native Script
DotNet 2019 | Sherry List - Azure Cognitive Services with Native Script
 
DotNet 2019 | Quique Fernández - Potenciando VUE con TypeScript, Inversify, V...
DotNet 2019 | Quique Fernández - Potenciando VUE con TypeScript, Inversify, V...DotNet 2019 | Quique Fernández - Potenciando VUE con TypeScript, Inversify, V...
DotNet 2019 | Quique Fernández - Potenciando VUE con TypeScript, Inversify, V...
 
DotNet 2019 | Daniela Solís y Manuel Rodrigo Cabello - IoT, una Raspberry Pi ...
DotNet 2019 | Daniela Solís y Manuel Rodrigo Cabello - IoT, una Raspberry Pi ...DotNet 2019 | Daniela Solís y Manuel Rodrigo Cabello - IoT, una Raspberry Pi ...
DotNet 2019 | Daniela Solís y Manuel Rodrigo Cabello - IoT, una Raspberry Pi ...
 
El camino a las Cloud Native Apps - Introduction
El camino a las Cloud Native Apps - IntroductionEl camino a las Cloud Native Apps - Introduction
El camino a las Cloud Native Apps - Introduction
 
El camino a las Cloud Native Apps - Azure AI
El camino a las Cloud Native Apps - Azure AIEl camino a las Cloud Native Apps - Azure AI
El camino a las Cloud Native Apps - Azure AI
 

Recently uploaded

08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 

Recently uploaded (20)

08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 

Introducción a NLP en Azure: Las 5 etapas clave de un proyecto de NLP

Editor's Notes

  1. Tokenization Normalization Stemming Lemmatization set all characters to lowercase remove numbers (or convert numbers to textual representations) remove punctuation (generally part of tokenization, but still worth keeping in mind at this stage, even as confirmation) strip white space (also generally part of tokenization) remove default stop words (general English stop words) remove text file headers, footers remove HTML, XML, etc. markup and metadata extract valuable data from other formats, such as JSON, or from within databases if you fear regular expressions, this could potentially be the part of text preprocessing in which your worst fears are realized
  2. Tokenization Normalization Stemming Lemmatization set all characters to lowercase remove numbers (or convert numbers to textual representations) remove punctuation (generally part of tokenization, but still worth keeping in mind at this stage, even as confirmation) strip white space (also generally part of tokenization) remove default stop words (general English stop words) remove text file headers, footers remove HTML, XML, etc. markup and metadata extract valuable data from other formats, such as JSON, or from within databases if you fear regular expressions, this could potentially be the part of text preprocessing in which your worst fears are realized
  3. Tokenization Normalization Stemming Lemmatization set all characters to lowercase remove numbers (or convert numbers to textual representations) remove punctuation (generally part of tokenization, but still worth keeping in mind at this stage, even as confirmation) strip white space (also generally part of tokenization) remove default stop words (general English stop words) remove text file headers, footers remove HTML, XML, etc. markup and metadata extract valuable data from other formats, such as JSON, or from within databases if you fear regular expressions, this could potentially be the part of text preprocessing in which your worst fears are realized
  4. Aquí hago un recordatorio de que un modelo de ML puede ser descrito con algo tan simple como una función matemática
  5. Aquí hago un recordatorio de que un modelo de ML puede ser descrito con algo tan simple como una función matemática
  6. https://skymind.ai/wiki/bagofwords-tf-idf
  7. To understand better a high representation is needed
  8. Skip-gram: works well with small amount of the training data, represents well even rare words or phrases. CBOW: several times faster to train than the skip-gram, slightly better accuracy for the frequent words Comentar que existe para cualquier representación
  9. En el caso de que aparexcan palabras nuevas ¿que hacer?
  10. En el caso de que aparexcan palabras nuevas ¿que hacer?
  11. En el caso de que aparexcan palabras nuevas ¿que hacer?
  12. En el caso de que aparexcan palabras nuevas ¿que hacer?
  13. Vocabulario y grammar