SlideShare a Scribd company logo
Summer 2017
Elvis Saravia
PhD, Information Systems and Applications
ellfae@gmail.com
Github username: omarsar
Questions: sli.do (#Z217)
2
●
●
●
●
●
●
● Knowledge Discovery (KDD) Process
3
4
5
ConceptNet
6
●
●
●
7
Motel = [0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0]
Hotel = [0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0]
●
●
One-hot representation
8
hotel = [0.728 0.234 -0.23 0.223]
Distributed representation (low-dimension vector)
9
10
Paper source: https://arxiv.org/pdf/1301.3781.pdf
11
Paper source: https://arxiv.org/pdf/1301.3781.pdf
Feedforward Neural Net Language Model (NNLM)
variables to optimize
denotes window range
12
13
P(the|over)
P(fox|over)
P(jumped|over)
P(the|over)
P(lazy|over)
P(dog|over)
P(VOUT
| VIN
)
How to define this prob. distribution?
Determines similarity in [-1,1]
Get a probability in [0,1] out of a similarity in [-1,1]
14
15
https://www.healthvault.com/en-us/health-bo
t/
16
● https://goo.gl/ppHX65
●
○ Gensim guide for word2vec: https://goo.gl/i2UrdH
● https://goo.gl/7b72S9
●
● https://goo.gl/uNJDrs
●
17
18
19
20
21
22
23
● https://goo.gl/KYacjz
●
●
●
●
●
● https://goo.gl/JezgYg
●
24
a. Build API: (Flask/Django recommended)
b. Pretrained models: (Guide: https://goo.gl/5qt2Ki)
c. Visualization: d3js / plotly / tensorboard
a. LSTM - (Guide: http://colah.github.io/posts/2015-08-Understanding-LSTMs/)
b. CNN - (Guide: https://goo.gl/PgLUs7)
c. RNN - (Guide: https://goo.gl/5L9kci
a. Starting point:https://rare-technologies.com/word2vec-tutorial#app
25

More Related Content

Similar to Text mining lab (summer 2017) - Word Vector Representation

IA3_presentation.pptx
IA3_presentation.pptxIA3_presentation.pptx
IA3_presentation.pptx
KtonNguyn2
 
Introduction to Graph Databases @ SAI
Introduction to Graph Databases @ SAIIntroduction to Graph Databases @ SAI
Introduction to Graph Databases @ SAI
datablend
 
Enabling semantic integration
Enabling semantic integration Enabling semantic integration
Enabling semantic integration
Jean-Paul Calbimonte
 
Quepy
QuepyQuepy
Quepy
dmoisset
 
Querying your database in natural language by Daniel Moisset PyData SV 2014
Querying your database in natural language by Daniel Moisset PyData SV 2014Querying your database in natural language by Daniel Moisset PyData SV 2014
Querying your database in natural language by Daniel Moisset PyData SV 2014
PyData
 
Probabilistic Data Structures and Approximate Solutions Oleksandr Pryymak
Probabilistic Data Structures and Approximate Solutions Oleksandr PryymakProbabilistic Data Structures and Approximate Solutions Oleksandr Pryymak
Probabilistic Data Structures and Approximate Solutions Oleksandr Pryymak
PyData
 
How to easily find the optimal solution without exhaustive search using Genet...
How to easily find the optimal solution without exhaustive search using Genet...How to easily find the optimal solution without exhaustive search using Genet...
How to easily find the optimal solution without exhaustive search using Genet...
Viach Kakovskyi
 
sos4R @ OGC TC
sos4R @ OGC TCsos4R @ OGC TC
sos4R @ OGC TC
Daniel Nüst
 
[Gary entsminger] turbo_pascal_for_windows_bible(book_fi.org)
[Gary entsminger] turbo_pascal_for_windows_bible(book_fi.org)[Gary entsminger] turbo_pascal_for_windows_bible(book_fi.org)
[Gary entsminger] turbo_pascal_for_windows_bible(book_fi.org)
Yogi Sharo
 
Introduction to Julia
Introduction to JuliaIntroduction to Julia
Introduction to Julia
岳華 杜
 
N-gram IDF: A Global Term Weighting Scheme Based on Information Distance (WWW...
N-gram IDF: A Global Term Weighting Scheme Based on Information Distance (WWW...N-gram IDF: A Global Term Weighting Scheme Based on Information Distance (WWW...
N-gram IDF: A Global Term Weighting Scheme Based on Information Distance (WWW...
Masumi Shirakawa
 
India software developers conference 2013 Bangalore
India software developers conference 2013 BangaloreIndia software developers conference 2013 Bangalore
India software developers conference 2013 Bangalore
Satnam Singh
 
MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)
MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)
MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)
University of Washington
 
Let's build a parser!
Let's build a parser!Let's build a parser!
Let's build a parser!
Boy Baukema
 
CS571: Distributional semantics
CS571: Distributional semanticsCS571: Distributional semantics
CS571: Distributional semantics
Jinho Choi
 
VoxxedDays Luxembourg 2019
VoxxedDays Luxembourg 2019VoxxedDays Luxembourg 2019
VoxxedDays Luxembourg 2019
Cédrick Lunven
 
Concepts of JetBrains MPS
Concepts of JetBrains MPSConcepts of JetBrains MPS
Concepts of JetBrains MPS
Vaclav Pech
 
Deep Dive on Deep Learning (June 2018)
Deep Dive on Deep Learning (June 2018)Deep Dive on Deep Learning (June 2018)
Deep Dive on Deep Learning (June 2018)
Julien SIMON
 
Dagstuhl seminar talk on querying big graphs
Dagstuhl seminar talk on querying big graphsDagstuhl seminar talk on querying big graphs
Dagstuhl seminar talk on querying big graphs
Arijit Khan
 
High-Performance Graph Analysis and Modeling
High-Performance Graph Analysis and ModelingHigh-Performance Graph Analysis and Modeling
High-Performance Graph Analysis and Modeling
Nesreen K. Ahmed
 

Similar to Text mining lab (summer 2017) - Word Vector Representation (20)

IA3_presentation.pptx
IA3_presentation.pptxIA3_presentation.pptx
IA3_presentation.pptx
 
Introduction to Graph Databases @ SAI
Introduction to Graph Databases @ SAIIntroduction to Graph Databases @ SAI
Introduction to Graph Databases @ SAI
 
Enabling semantic integration
Enabling semantic integration Enabling semantic integration
Enabling semantic integration
 
Quepy
QuepyQuepy
Quepy
 
Querying your database in natural language by Daniel Moisset PyData SV 2014
Querying your database in natural language by Daniel Moisset PyData SV 2014Querying your database in natural language by Daniel Moisset PyData SV 2014
Querying your database in natural language by Daniel Moisset PyData SV 2014
 
Probabilistic Data Structures and Approximate Solutions Oleksandr Pryymak
Probabilistic Data Structures and Approximate Solutions Oleksandr PryymakProbabilistic Data Structures and Approximate Solutions Oleksandr Pryymak
Probabilistic Data Structures and Approximate Solutions Oleksandr Pryymak
 
How to easily find the optimal solution without exhaustive search using Genet...
How to easily find the optimal solution without exhaustive search using Genet...How to easily find the optimal solution without exhaustive search using Genet...
How to easily find the optimal solution without exhaustive search using Genet...
 
sos4R @ OGC TC
sos4R @ OGC TCsos4R @ OGC TC
sos4R @ OGC TC
 
[Gary entsminger] turbo_pascal_for_windows_bible(book_fi.org)
[Gary entsminger] turbo_pascal_for_windows_bible(book_fi.org)[Gary entsminger] turbo_pascal_for_windows_bible(book_fi.org)
[Gary entsminger] turbo_pascal_for_windows_bible(book_fi.org)
 
Introduction to Julia
Introduction to JuliaIntroduction to Julia
Introduction to Julia
 
N-gram IDF: A Global Term Weighting Scheme Based on Information Distance (WWW...
N-gram IDF: A Global Term Weighting Scheme Based on Information Distance (WWW...N-gram IDF: A Global Term Weighting Scheme Based on Information Distance (WWW...
N-gram IDF: A Global Term Weighting Scheme Based on Information Distance (WWW...
 
India software developers conference 2013 Bangalore
India software developers conference 2013 BangaloreIndia software developers conference 2013 Bangalore
India software developers conference 2013 Bangalore
 
MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)
MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)
MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)
 
Let's build a parser!
Let's build a parser!Let's build a parser!
Let's build a parser!
 
CS571: Distributional semantics
CS571: Distributional semanticsCS571: Distributional semantics
CS571: Distributional semantics
 
VoxxedDays Luxembourg 2019
VoxxedDays Luxembourg 2019VoxxedDays Luxembourg 2019
VoxxedDays Luxembourg 2019
 
Concepts of JetBrains MPS
Concepts of JetBrains MPSConcepts of JetBrains MPS
Concepts of JetBrains MPS
 
Deep Dive on Deep Learning (June 2018)
Deep Dive on Deep Learning (June 2018)Deep Dive on Deep Learning (June 2018)
Deep Dive on Deep Learning (June 2018)
 
Dagstuhl seminar talk on querying big graphs
Dagstuhl seminar talk on querying big graphsDagstuhl seminar talk on querying big graphs
Dagstuhl seminar talk on querying big graphs
 
High-Performance Graph Analysis and Modeling
High-Performance Graph Analysis and ModelingHigh-Performance Graph Analysis and Modeling
High-Performance Graph Analysis and Modeling
 

More from Elvis Saravia

The Future of Brain-Powered Learning
The Future of Brain-Powered Learning The Future of Brain-Powered Learning
The Future of Brain-Powered Learning
Elvis Saravia
 
Introduction to Fundamentals of RNNs
Introduction to Fundamentals of RNNsIntroduction to Fundamentals of RNNs
Introduction to Fundamentals of RNNs
Elvis Saravia
 
Thesis oral defense 2015 elvis saravia
Thesis oral defense 2015  elvis saraviaThesis oral defense 2015  elvis saravia
Thesis oral defense 2015 elvis saravia
Elvis Saravia
 
An Introduction to Apache Spark
An Introduction to Apache SparkAn Introduction to Apache Spark
An Introduction to Apache Spark
Elvis Saravia
 
The Neurochemistry of Music
The Neurochemistry of MusicThe Neurochemistry of Music
The Neurochemistry of Music
Elvis Saravia
 
NewSQL - The Future of Databases?
NewSQL - The Future of Databases?NewSQL - The Future of Databases?
NewSQL - The Future of Databases?
Elvis Saravia
 
Crowdsource Delivery System - Improving traditional delivery systems
Crowdsource Delivery System - Improving traditional delivery systemsCrowdsource Delivery System - Improving traditional delivery systems
Crowdsource Delivery System - Improving traditional delivery systems
Elvis Saravia
 
Relational Databases - Benefits and Challenges
Relational Databases - Benefits and ChallengesRelational Databases - Benefits and Challenges
Relational Databases - Benefits and Challenges
Elvis Saravia
 
Subconscious Crowdsourcing: A Feasible Data Collection Mechanism for Mental D...
Subconscious Crowdsourcing: A Feasible Data Collection Mechanism for Mental D...Subconscious Crowdsourcing: A Feasible Data Collection Mechanism for Mental D...
Subconscious Crowdsourcing: A Feasible Data Collection Mechanism for Mental D...
Elvis Saravia
 

More from Elvis Saravia (9)

The Future of Brain-Powered Learning
The Future of Brain-Powered Learning The Future of Brain-Powered Learning
The Future of Brain-Powered Learning
 
Introduction to Fundamentals of RNNs
Introduction to Fundamentals of RNNsIntroduction to Fundamentals of RNNs
Introduction to Fundamentals of RNNs
 
Thesis oral defense 2015 elvis saravia
Thesis oral defense 2015  elvis saraviaThesis oral defense 2015  elvis saravia
Thesis oral defense 2015 elvis saravia
 
An Introduction to Apache Spark
An Introduction to Apache SparkAn Introduction to Apache Spark
An Introduction to Apache Spark
 
The Neurochemistry of Music
The Neurochemistry of MusicThe Neurochemistry of Music
The Neurochemistry of Music
 
NewSQL - The Future of Databases?
NewSQL - The Future of Databases?NewSQL - The Future of Databases?
NewSQL - The Future of Databases?
 
Crowdsource Delivery System - Improving traditional delivery systems
Crowdsource Delivery System - Improving traditional delivery systemsCrowdsource Delivery System - Improving traditional delivery systems
Crowdsource Delivery System - Improving traditional delivery systems
 
Relational Databases - Benefits and Challenges
Relational Databases - Benefits and ChallengesRelational Databases - Benefits and Challenges
Relational Databases - Benefits and Challenges
 
Subconscious Crowdsourcing: A Feasible Data Collection Mechanism for Mental D...
Subconscious Crowdsourcing: A Feasible Data Collection Mechanism for Mental D...Subconscious Crowdsourcing: A Feasible Data Collection Mechanism for Mental D...
Subconscious Crowdsourcing: A Feasible Data Collection Mechanism for Mental D...
 

Recently uploaded

Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdfUdemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Fernanda Palhano
 
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
zsjl4mimo
 
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
g4dpvqap0
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
manishkhaire30
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
bopyb
 
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
Lars Albertsson
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
Roger Valdez
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
sameer shah
 
Experts live - Improving user adoption with AI
Experts live - Improving user adoption with AIExperts live - Improving user adoption with AI
Experts live - Improving user adoption with AI
jitskeb
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
g4dpvqap0
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
javier ramirez
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
Sachin Paul
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
v7oacc3l
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
74nqk8xf
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
Sm321
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
Social Samosa
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
nyfuhyz
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
rwarrenll
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
aqzctr7x
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
Bill641377
 

Recently uploaded (20)

Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdfUdemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
 
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
 
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
 
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
 
Experts live - Improving user adoption with AI
Experts live - Improving user adoption with AIExperts live - Improving user adoption with AI
Experts live - Improving user adoption with AI
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
 

Text mining lab (summer 2017) - Word Vector Representation