Natural Language Processing in R (rNLP)

These are the introductory slides of a workshop given to the doctoral school at the Institute of Business Informatics of Goethe University Frankfurt. The tutorials are available at http://crunch.kmi.open.ac.uk/w/index.php/Tutorials

Fridolin Wild, The Open University, UK
Tutorial to the Doctoral School at the Institute of Business Informatics of Goethe University Frankfurt
Structure of this tutorial
• An introduction to R and cRunch
• Language basics in R
• Basic I/O in R
• Social Network Analysis
• Latent Semantic Analysis
• Twitter
• Sentiment analysis
• (Advanced I/O in R: MySQL, SPARQL)
Introduction
cRunch
• is an infrastructure
• for computationally intensive learning analytics
• supporting researchers
• in investigating big data
• generated in the co-construction of knowledge
… and beyond
Architecture
(Thiele & Lehner, 2011)
Living Reports
data shop
cron jobs
R webservices

 
Building Products That Think- Bhaskaran Srinivasan & Ashish Gupta
Building Products That Think- Bhaskaran Srinivasan & Ashish GuptaBuilding Products That Think- Bhaskaran Srinivasan & Ashish Gupta
Building Products That Think- Bhaskaran Srinivasan & Ashish GuptaISPMAIndia
 
"Journey of Aspiration: Unveiling the Path to Becoming a Technocrat and Entre...
"Journey of Aspiration: Unveiling the Path to Becoming a Technocrat and Entre..."Journey of Aspiration: Unveiling the Path to Becoming a Technocrat and Entre...
"Journey of Aspiration: Unveiling the Path to Becoming a Technocrat and Entre...shaiyuvasv
 
"The Transformative Power of AI and Open Challenges" by Dr. Manish Gupta, Google
"The Transformative Power of AI and Open Challenges" by Dr. Manish Gupta, Google"The Transformative Power of AI and Open Challenges" by Dr. Manish Gupta, Google
"The Transformative Power of AI and Open Challenges" by Dr. Manish Gupta, GoogleISPMAIndia
 
How we think about an advisor tech stack
How we think about an advisor tech stackHow we think about an advisor tech stack
How we think about an advisor tech stackSummit
 
"DevOps Practisting Platform on EKS with Karpenter autoscaling", Dmytro Kozhevin
"DevOps Practisting Platform on EKS with Karpenter autoscaling", Dmytro Kozhevin"DevOps Practisting Platform on EKS with Karpenter autoscaling", Dmytro Kozhevin
"DevOps Practisting Platform on EKS with Karpenter autoscaling", Dmytro KozhevinFwdays
 
How AI and ChatGPT are changing cybersecurity forever.pptx
How AI and ChatGPT are changing cybersecurity forever.pptxHow AI and ChatGPT are changing cybersecurity forever.pptx
How AI and ChatGPT are changing cybersecurity forever.pptxInfosec
 
"AIRe - AI Reliability Engineering", Denys Vasyliev
"AIRe - AI Reliability Engineering", Denys Vasyliev"AIRe - AI Reliability Engineering", Denys Vasyliev
"AIRe - AI Reliability Engineering", Denys VasylievFwdays
 
"Testing of Helm Charts or There and Back Again", Yura Rochniak
"Testing of Helm Charts or There and Back Again", Yura Rochniak"Testing of Helm Charts or There and Back Again", Yura Rochniak
"Testing of Helm Charts or There and Back Again", Yura RochniakFwdays
 
From Challenger to Champion: How SpiraPlan Outperforms JIRA+Plugins
From Challenger to Champion: How SpiraPlan Outperforms JIRA+PluginsFrom Challenger to Champion: How SpiraPlan Outperforms JIRA+Plugins
From Challenger to Champion: How SpiraPlan Outperforms JIRA+PluginsInflectra
 
Dev Dives: Leverage APIs and Gen AI to power automations for RPA and software...
Dev Dives: Leverage APIs and Gen AI to power automations for RPA and software...Dev Dives: Leverage APIs and Gen AI to power automations for RPA and software...
Dev Dives: Leverage APIs and Gen AI to power automations for RPA and software...UiPathCommunity
 
Digital Transformation Strategy & Plan Templates - www.beyondthecloud.digital...
Digital Transformation Strategy & Plan Templates - www.beyondthecloud.digital...Digital Transformation Strategy & Plan Templates - www.beyondthecloud.digital...
Digital Transformation Strategy & Plan Templates - www.beyondthecloud.digital...MarcovanHurne2
 

Recently uploaded (20)

Power of 2024 - WITforce Odyssey.pptx.pdf
Power of 2024 - WITforce Odyssey.pptx.pdfPower of 2024 - WITforce Odyssey.pptx.pdf
Power of 2024 - WITforce Odyssey.pptx.pdf
 
Act Like an Owner, Challenge Like a VC by former CPO, Tripadvisor
Act Like an Owner,  Challenge Like a VC by former CPO, TripadvisorAct Like an Owner,  Challenge Like a VC by former CPO, Tripadvisor
Act Like an Owner, Challenge Like a VC by former CPO, Tripadvisor
 
Battle of React State Managers in frontend applications
Battle of React State Managers in frontend applicationsBattle of React State Managers in frontend applications
Battle of React State Managers in frontend applications
 
Importance of magazines in education ppt
Importance of magazines in education pptImportance of magazines in education ppt
Importance of magazines in education ppt
 
Harnessing the Power of GenAI for Exceptional Product Outcomes by Booking.com...
Harnessing the Power of GenAI for Exceptional Product Outcomes by Booking.com...Harnessing the Power of GenAI for Exceptional Product Outcomes by Booking.com...
Harnessing the Power of GenAI for Exceptional Product Outcomes by Booking.com...
 
Confoo 2024 Gettings started with OpenAI and data science
Confoo 2024 Gettings started with OpenAI and data scienceConfoo 2024 Gettings started with OpenAI and data science
Confoo 2024 Gettings started with OpenAI and data science
 
Bit N Build Poland
Bit N Build PolandBit N Build Poland
Bit N Build Poland
 
Leveraging SLF4j for Effective Logging in IBM App Connect Enterprise.docx
Leveraging SLF4j for Effective Logging in IBM App Connect Enterprise.docxLeveraging SLF4j for Effective Logging in IBM App Connect Enterprise.docx
Leveraging SLF4j for Effective Logging in IBM App Connect Enterprise.docx
 
IT Nation Evolve event 2024 - Quarter 1
IT Nation Evolve event 2024  - Quarter 1IT Nation Evolve event 2024  - Quarter 1
IT Nation Evolve event 2024 - Quarter 1
 
Building Products That Think- Bhaskaran Srinivasan & Ashish Gupta
Building Products That Think- Bhaskaran Srinivasan & Ashish GuptaBuilding Products That Think- Bhaskaran Srinivasan & Ashish Gupta
Building Products That Think- Bhaskaran Srinivasan & Ashish Gupta
 
"Journey of Aspiration: Unveiling the Path to Becoming a Technocrat and Entre...
"Journey of Aspiration: Unveiling the Path to Becoming a Technocrat and Entre..."Journey of Aspiration: Unveiling the Path to Becoming a Technocrat and Entre...
"Journey of Aspiration: Unveiling the Path to Becoming a Technocrat and Entre...
 
"The Transformative Power of AI and Open Challenges" by Dr. Manish Gupta, Google
"The Transformative Power of AI and Open Challenges" by Dr. Manish Gupta, Google"The Transformative Power of AI and Open Challenges" by Dr. Manish Gupta, Google
"The Transformative Power of AI and Open Challenges" by Dr. Manish Gupta, Google
 
How we think about an advisor tech stack
How we think about an advisor tech stackHow we think about an advisor tech stack
How we think about an advisor tech stack
 
"DevOps Practisting Platform on EKS with Karpenter autoscaling", Dmytro Kozhevin
"DevOps Practisting Platform on EKS with Karpenter autoscaling", Dmytro Kozhevin"DevOps Practisting Platform on EKS with Karpenter autoscaling", Dmytro Kozhevin
"DevOps Practisting Platform on EKS with Karpenter autoscaling", Dmytro Kozhevin
 
How AI and ChatGPT are changing cybersecurity forever.pptx
How AI and ChatGPT are changing cybersecurity forever.pptxHow AI and ChatGPT are changing cybersecurity forever.pptx
How AI and ChatGPT are changing cybersecurity forever.pptx
 
"AIRe - AI Reliability Engineering", Denys Vasyliev
"AIRe - AI Reliability Engineering", Denys Vasyliev"AIRe - AI Reliability Engineering", Denys Vasyliev
"AIRe - AI Reliability Engineering", Denys Vasyliev
 
"Testing of Helm Charts or There and Back Again", Yura Rochniak
"Testing of Helm Charts or There and Back Again", Yura Rochniak"Testing of Helm Charts or There and Back Again", Yura Rochniak
"Testing of Helm Charts or There and Back Again", Yura Rochniak
 
From Challenger to Champion: How SpiraPlan Outperforms JIRA+Plugins
From Challenger to Champion: How SpiraPlan Outperforms JIRA+PluginsFrom Challenger to Champion: How SpiraPlan Outperforms JIRA+Plugins
From Challenger to Champion: How SpiraPlan Outperforms JIRA+Plugins
 
Dev Dives: Leverage APIs and Gen AI to power automations for RPA and software...
Dev Dives: Leverage APIs and Gen AI to power automations for RPA and software...Dev Dives: Leverage APIs and Gen AI to power automations for RPA and software...
Dev Dives: Leverage APIs and Gen AI to power automations for RPA and software...
 
Digital Transformation Strategy & Plan Templates - www.beyondthecloud.digital...
Digital Transformation Strategy & Plan Templates - www.beyondthecloud.digital...Digital Transformation Strategy & Plan Templates - www.beyondthecloud.digital...
Digital Transformation Strategy & Plan Templates - www.beyondthecloud.digital...
 

Natural Language Processing in R (rNLP)

  • 1. Natural Language Processing in R (rNLP) Fridolin Wild, The Open University, UK Tutorial to the Doctoral School at the Institute of Business Informatics of the Goethe University Frankfurt
  • 2. Structure of this tutorial • An introduction to R and cRunch • Language basics in R • Basic I/O in R • Social Network Analysis • Latent Semantic Analysis • Twitter • Sentiment • (Advanced I/O in R: MySQL, SparQL)
  • 4. cRunch • is an infrastructure • for computationally intensive learning analytics • supporting researchers • in investigating big data • generated in the co-construction of knowledge … and beyond …
  • 6. Architecture (Thiele & Lehner, 2011) Living Reports data shop cron jobs R webservices
  • 8. Living reports • reports with embedded scripts and data • knitr and Sweave • render to HTML, PDF, … • visualisations: – ggplot2, trellis, graphix – jpg, png, eps, pdf, e.g. png(file = "n.png"); plot(network(m)); dev.off() • Fill-in-the-blanks: Drop out quote went down to <<echo=FALSE>>= doquote["OU","2011"] @
\documentclass[a4paper]{article}
\title{Sweave Example 1}
\author{Friedrich Leisch}
\begin{document}
\maketitle
In this example we embed parts of the examples from the
\texttt{kruskal.test} help page into a \LaTeX{} document:
<<>>=
data(airquality)
# kruskal.test() ships with the stats package; the original
# library(ctest) call is not needed (and fails) in current R
kruskal.test(Ozone ~ Month, data = airquality)
@
which shows that the location parameter of the Ozone
distribution varies significantly from month to month. Finally we
include a boxplot of the data:
\begin{center}
<<fig=TRUE,echo=FALSE>>=
boxplot(Ozone ~ Month, data = airquality)
@
\end{center}
\end{document}
  • 10. Example html5 report
Example Report
==============

This is an example of embedded scripts and data.

```{r}
a = "hello world"
print(a)
```

And here is an example of how to embed a chart.

```{r fig.width=7, fig.height=6}
plot( 5:20 )
```
  • 11. Shiny Widgets (1) • Widgets: use-case sized encapsulations of mini apps • HTML5 • Two files: ui.R, server.R • Still missing: manifest files (info.plist, config.xml)
  • 12. Shiny Widgets (2) From http://www.rstudio.com/shiny/
  • 14. Example R web service print("hello world")
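  • 15. More complex R web service
# rApache handler: draw a plot into a temporary PNG file and return it as the response body
setContentType("image/png")
a = c(1,3,5,12,13,15)
image_file = tempfile()
png(file=image_file)
plot(a, main = "The magic image", ylab = "", xlab = "",
     col = c("darkred", "darkblue", "darkgreen"))
dev.off()
# read the file back as raw bytes and stream them to the client
sendBin(readBin(image_file, 'raw', n = file.info(image_file)$size))
unlink(image_file)  # remove the temporary file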
  • 16. R web services • Uses the Apache module mod_R.so • See http://Rapache.net • Common server functions: – GET and POST variables – setContentType – sendBin – …
  • 17. A word on memory mgmt. • Advanced memory management (see p.70 of Dietl’s diploma thesis): – Use package bigmemory (for shared memory across threads) – Use package Rserve (for shared read-only access across threads) – Swap out memory objects with save() and load() – The latter is typically sufficient (hard disks are fast!) • Data management abstraction layer for mod_R.so: configure the handler in httpd.conf: specify a directory match and load specific data management routines at start-up: REvalOnStartup "source('/dbal.R');"
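A minimal base-R sketch of the save()/load() swapping strategy follows. The object name `big_matrix`, its size, and the use of `tempfile()` are illustrative assumptions; on a server the swap file would presumably sit on local disk in a known location.

```r
# Swap a large object out to disk and back (names and sizes are illustrative)
big_matrix <- matrix(runif(1e4), nrow = 100)
checksum <- sum(big_matrix)

swap_file <- tempfile(fileext = ".rda")
save(big_matrix, file = swap_file)   # write the object to disk
rm(big_matrix)                       # drop it from the workspace
invisible(gc())                      # let R reclaim the freed memory

# ... later, when the object is needed again:
loaded_names <- load(swap_file)      # restores big_matrix under its old name
unlink(swap_file)                    # remove the swap file
```

load() returns the names of the restored objects, which is handy for checking that the expected object came back.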
  • 19. Job scheduling • crontab entries for R webservices • e.g. harvest feeds • e.g. store in local DB
  • 21. Data shop and the community • You have a ‘public/’ folder :) – ‘public/data’: save() any .rda file and it will be indexed within the hour – ‘public/services’: use this to execute your scripts; indexed within the hour – ‘public/gallery’: use this to store your public visualisations – code sharing: the source of any .R script in your ‘public/’ folder is readable from the web
  • 26. Social Network Analysis Fridolin Wild, The Open University, UK
  • 30. The basic concept • Precursors date back to the 1920s, math to Euler’s ‘Seven Bridges of Koenigsberg’ • Social Networks are: • Actors (people, groups, media, tags, …) • Ties (interactions, relationships, …) • Actors and ties form a graph • The graph has measurable structural properties • Betweenness • Degree centrality • Density • Cohesion • Structural Patterns
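To make the vocabulary concrete, here is a small base-R sketch that stores actors and ties as a symmetric adjacency matrix and derives degree centrality from it. The four actors and their ties are hypothetical; measures such as betweenness need shortest paths and are usually left to a package such as sna or igraph.

```r
actors <- c("ann", "bob", "cem", "dia")   # hypothetical actors
ties <- rbind(c("ann", "bob"), c("ann", "cem"), c("bob", "cem"), c("cem", "dia"))

adj <- matrix(0, nrow = 4, ncol = 4, dimnames = list(actors, actors))
idx <- cbind(match(ties[, 1], actors), match(ties[, 2], actors))
adj[idx] <- 1              # tie from the first actor to the second
adj[idx[, 2:1]] <- 1       # mirror it: the graph is undirected

degree <- rowSums(adj)                          # ties per actor
degree_centrality <- degree / (nrow(adj) - 1)   # share of the other actors reached directly
degree_centrality
```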
  • 31. Forum Messages
      message_id  forum_id  parent_id  author
130      2853483   2853445          N    2043
131      1440740    785876          N    1669
132      2515257   2515256          N    5814
133      4704949   4699874          N    5810
134      2597170   2558273          N    2054
135      2316951   2230821          N    5095
136      3407573   3407568          N      36
137      2277393   2277387          N     359
138      3394136   3382201          N    1050
139      4603931   4167338          N     453
140      6234819   6189254    6231352    5400
141       806699    785877     804668    2177
142      4430290   3371246    3380313      48
143      3395686   3391024    3391129      35
144      6270213   6024351    6265378    5780
145      2496015   2491522    2491536    2774
146      4707562   4699873    4707502    5810
147      2574199   2440094    2443801    5801
148      4501993   4424215    4491650    5232

      message_id  forum_id  parent_id  author
60        734569     31117          N    2491
221       762702     31117                  1
317       762717     31117     762702    1927
1528      819660     31117     793408    1197
1950      840406     31117     839998    1348
1047      841810     31117     767386    1879
2239      862709     31117          N    1982
2420      869839     31117     862709    2038
2694      884824     31117          N    5439
2503      896399     31117     862709    1982
2846      901691     31117     895022     992
3321      951376     31117          N    5174
3384      952895     31117     951376    1597
1186      955595     31117     767386    5724
3604      958065     31117          N     716
2551      960734     31117     862709    1939
4072      975816     31117          N     584
2574      986038     31117     862709    2043
2590      987842     31117     862709    1982
  • 32. Incidence Matrix • msg_id = incident, authors appear in incidents
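A hedged sketch of the incidence-matrix step in base R follows. The messages are hypothetical, and treating a message's incident as its (one-level) thread, i.e. its `parent_id`, or its own id when `parent_id` is missing, is an assumption made for illustration. Multiplying the incidence matrix by its transpose counts, for each pair of authors, the incidents they share.

```r
# Hypothetical forum messages; NA marks a message that starts a thread
msgs <- data.frame(
  message_id = c(1, 2, 3, 4, 5, 6),
  parent_id  = c(NA, 1, 1, NA, 4, 4),
  author     = c("a", "b", "c", "b", "d", "a")
)
# Assumption: the incident of a message is the thread it belongs to
msgs$incident <- ifelse(is.na(msgs$parent_id), msgs$message_id, msgs$parent_id)

# Incidence matrix: authors (rows) x incidents (columns), 1 = author took part
incidence <- unclass(table(msgs$author, msgs$incident))
incidence[incidence > 0] <- 1

# Author-by-author counts of shared incidents; the diagonal is cleared
adjacency <- incidence %*% t(incidence)
diag(adjacency) <- 0
adjacency
```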
  • 37. Network Density • Total edges = 29 • Possible edges = 18 * (18-1)/2 = 153 • Density = 0.19
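The density arithmetic on the slide can be checked in a few lines of R, using the slide's figures of 18 actors and 29 edges in an undirected graph:

```r
# Density of an undirected graph: observed edges / possible edges n(n - 1) / 2
n_actors <- 18
n_edges  <- 29
possible_edges <- n_actors * (n_actors - 1) / 2   # 153
density_value  <- n_edges / possible_edges
round(density_value, 2)                           # 0.19
```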
  • 40. Tutorials • Starter: sna-simple.Rmd • Real: sna-blog.Rmd • Advanced: sna-forum.Rmd
  • 41. Latent Semantic Analysis Fridolin Wild, The Open University, UK
  • 42. Latent Semantic Analysis • “Humans learn word meanings and how to combine them into passage meaning through experience with ~paragraph unitized verbal environments.” • “They don’t remember all the separate words of a passage; they remember its overall gist or meaning.” • “LSA learns by ‘reading’ ~paragraph unitized texts that represent the environment.” • “It doesn’t remember all the separate words of a text; it remembers its overall gist or meaning.” (Landauer, 2007)
  • 43. Word choice is over-rated • Educated adult understands ~100,000 word forms • An average sentence contains 20 tokens. • Thus $100{,}000^{20}$ possible combinations of words in a sentence • maximum of $\log_2 100{,}000^{20} \approx 332$ bits in word choice alone. • $20! \approx 2.4 \times 10^{18}$ possible orders of 20 words = maximum of 61 bits from order of the words. • $332/(61 + 332) \approx 84\%$ word choice (Landauer, 2007)
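The estimate can be reproduced in R. The sketch computes the log2 of 100,000^20 as 20 × log2(100,000) to keep the numbers small, and uses `factorial()` for 20!:

```r
vocabulary_size <- 100000   # word forms an educated adult understands
tokens          <- 20       # tokens in an average sentence

bits_choice  <- tokens * log2(vocabulary_size)   # log2(100,000^20), about 332
bits_order   <- log2(factorial(tokens))          # log2(20!), about 61
share_choice <- bits_choice / (bits_choice + bits_order)
round(100 * share_choice)                        # 84 (percent)
```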
  • 44. LSA (2) • Assumption: texts have a semantic structure • However, this structure is obscured by word usage (noise, synonymy, polysemy, …) • Proposed LSA Solution: – map doc-term matrix – using conceptual indices – derived statistically (truncated SVD) – and make similarity comparisons using angles
  • 45. Input (e.g., documents) → text matrix M. Example from Deerwester, Dumais, Furnas, Landauer, and Harshman (1990): Indexing by Latent Semantic Analysis. Journal of the American Society for Information Science, 41(6):391-407. Only the red terms appear in more than one document, so strip the rest. • term = feature • vocabulary = ordered set of features • TEXTMATRIX
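A base-R sketch of building the text matrix M and stripping terms that occur in only one document follows. The four short documents are hypothetical stand-ins, not the Deerwester et al. example, and the tokeniser (lower-casing and splitting on non-letters) is a simplifying assumption.

```r
# Hypothetical documents (not the Deerwester et al. corpus)
docs <- c(
  d1 = "human computer interface",
  d2 = "user interface system",
  d3 = "graph paths trees",
  d4 = "graph minors trees survey"
)
# Simplified tokeniser: lower-case, split on anything that is not a letter
tokens <- strsplit(tolower(docs), "[^a-z]+")
names(tokens) <- names(docs)
vocabulary <- sort(unique(unlist(tokens, use.names = FALSE)))   # ordered set of features

# Term-by-document matrix of raw frequencies: rows = terms, columns = documents
M <- sapply(tokens, function(tk) table(factor(tk, levels = vocabulary)))
rownames(M) <- vocabulary

# Keep only terms that occur in more than one document
M <- M[rowSums(M > 0) > 1, , drop = FALSE]
M
```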
  • 48. Reconstructed, Reduced Matrix m4: Graph minors: A survey
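  • 49. Similarity in a Latent-Semantic Space [Figure: a query vector and two target vectors (Target 1, Target 2) in the plane spanned by the X and Y dimensions; similarity is read off the angles (Angle 1, Angle 2) between the query and each target]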
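The geometric picture can be sketched in R by computing the angle between a query vector and two target vectors. The coordinates are made up; a smaller angle means a more similar target.

```r
# Angle (in degrees) between two vectors, via their cosine
angle_deg <- function(a, b) {
  cos_sim <- sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
  acos(min(1, max(-1, cos_sim))) * 180 / pi   # clamp against rounding error
}

query   <- c(x = 1, y = 1)      # hypothetical coordinates in a 2-D latent space
target1 <- c(x = 2, y = 1.8)
target2 <- c(x = 1, y = -0.2)
c(angle1 = angle_deg(query, target1), angle2 = angle_deg(query, target2))
```

Here target1 points in almost the same direction as the query, so its angle is small, while target2 lies well away from it.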
  • 50. doc2doc similarities • Unreduced = pure vector space model – based on $M = T S D^T$ – Pearson correlation over document vectors • Reduced – based on $M_2 = T S_2 D^T$ – Pearson correlation over document vectors
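A sketch of both variants with base R's `svd()` and `cor()` follows. The 5 × 5 term-by-document matrix is hypothetical, and k = 2 mirrors the M_2 on the slide rather than a recommended setting. `cor()` correlates columns, so each entry compares two documents.

```r
# Hypothetical term-by-document matrix (5 terms x 5 documents)
M <- matrix(c(1, 1, 0, 0, 0,
              1, 0, 1, 0, 0,
              0, 1, 1, 1, 0,
              0, 0, 0, 1, 1,
              0, 0, 0, 1, 2),
            nrow = 5,
            dimnames = list(paste0("t", 1:5), paste0("d", 1:5)))

s <- svd(M)                  # M = T S D^T with T = s$u, S = diag(s$d), D = s$v
k <- 2                       # keep the two largest singular values
M_k <- s$u[, 1:k, drop = FALSE] %*% diag(s$d[1:k], nrow = k) %*%
  t(s$v[, 1:k, drop = FALSE])
dimnames(M_k) <- dimnames(M)

sim_unreduced <- cor(M)      # Pearson correlation between document columns
sim_reduced   <- cor(M_k)
round(sim_reduced, 2)
```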
  • 51. Ex Post Updating: Folding-In • SVD factor stability – SVD calculates factors over a given text base – Different texts – different factors – Challenge: avoid unwanted factor changes (e.g., bad essays) – Solution: folding-in of essays instead of recalculating • SVD is computationally expensive
  • 52. Folding-In in Detail (Berry et al., 1995), for a rank-k space $M_k = T_k S_k D_k^T$: (1) convert the original vector $v$ to $D_k$ format: $d_i = v^T T_k S_k^{-1}$ (2) convert the $D_k$-format vector to $M_k$ format: $m_i = T_k S_k d_i^T$
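The two folding-in steps translate directly into matrix products. This sketch uses base R's `svd()` on a small hypothetical matrix; the helper `fold_in_vector()` is illustrative and not the lsa package's `fold_in()`. Folding in a column that is already part of M reproduces the corresponding row of D_k, which makes a convenient sanity check.

```r
# Hypothetical term-by-document matrix (4 terms x 4 documents)
M <- matrix(c(2, 1, 0, 0,
              1, 1, 0, 1,
              0, 0, 1, 1,
              0, 1, 2, 1),
            nrow = 4,
            dimnames = list(paste0("t", 1:4), paste0("d", 1:4)))
k  <- 2                                  # arbitrary number of dimensions
s  <- svd(M)
Tk <- s$u[, 1:k, drop = FALSE]
Sk <- diag(s$d[1:k], nrow = k)
Dk <- s$v[, 1:k, drop = FALSE]

fold_in_vector <- function(v, Tk, Sk) {
  d <- t(v) %*% Tk %*% solve(Sk)         # (1) original vector -> Dk format (1 x k)
  m <- Tk %*% Sk %*% t(d)                # (2) Dk format -> Mk format (terms x 1)
  list(d = d, m = m)
}

new_doc <- c(1, 1, 0, 1)                 # hypothetical new document over t1..t4
folded  <- fold_in_vector(new_doc, Tk, Sk)
folded$m
```

The SVD itself is left untouched, which is the point of folding-in: new essays are projected into the existing space instead of changing its factors.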
  • 53. LSA Process & Driving Parameters 4 x 12 x 7 x 2 x 3 = 2016 Combinations
  • 54. Pre-Processing • Stemming – Porter Stemmer (snowball.tartarus.org) – ‘move’, ‘moving’, ‘moves’ => ‘move’ – in German even more important (more inflections) • Stop Word Elimination – 373 Stop Words in German • Stemming plus Stop Word Elimination • Unprocessed (‘raw’) Terms
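Stop-word elimination is simple to sketch in base R; the six-word stop list and the sample sentence below are hypothetical. Porter stemming is not shown here; one option, assuming the SnowballC package is installed, is its `wordStem()` function.

```r
stop_words <- c("the", "a", "of", "and", "in", "is")   # hypothetical stop list
sentence   <- "The meeting of the committee in Vienna is over and adjourned"

tokens <- strsplit(tolower(sentence), "[^[:alpha:]]+")[[1]]   # lower-case and split
content_tokens <- tokens[!tokens %in% stop_words]              # drop stop words
content_tokens
```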
  • 55. Term Weighting Schemes • Global Weights (GW) – None (‘raw’ tf) – Normalisation: $norm_i = \frac{1}{\sqrt{\sum_j tf_{ij}^2}}$ – Inverse Document Frequency (IDF): $idf_i = \log_2\left(\frac{numdocs}{docfreq(i)}\right) + 1$ – 1 + Entropy: $entplusone_i = 1 + \sum_j \frac{p_{ij} \log p_{ij}}{\log numdocs}$, where $p_{ij} = \frac{tf_{ij}}{\sum_j tf_{ij}}$ • Local Weights (LW) – None (‘raw’ tf) – Binary Term Frequency – Logarithmized Term Frequency (log) • Combined: $weight_{ij} = lw(tf_{ij}) \cdot gw(tf_{ij})$
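The weighting schemes can be sketched in base R for a term-by-document matrix `tf` (terms in rows, documents in columns). The function names and the sample matrix are illustrative, not the lsa package's implementation, and the natural logarithm in the local log weight is an assumption since the slide does not fix the base.

```r
# Hypothetical term-by-document frequencies
tf <- matrix(c(1, 0, 2,
               3, 3, 3,
               0, 0, 4),
             nrow = 3, byrow = TRUE,
             dimnames = list(c("rare", "everywhere", "single"), paste0("d", 1:3)))

# Local weights (one value per cell)
lw_binary <- function(tf) (tf > 0) * 1
lw_log    <- function(tf) log(tf + 1)            # natural log assumed

# Global weights (one value per term, i.e. per row)
gw_norm <- function(tf) 1 / sqrt(rowSums(tf^2))
gw_idf  <- function(tf) log2(ncol(tf) / rowSums(tf > 0)) + 1
gw_entplusone <- function(tf) {
  p <- tf / rowSums(tf)                          # p_ij = tf_ij / sum_j tf_ij
  plogp <- ifelse(p > 0, p * log(p), 0)          # treat 0 * log(0) as 0
  1 + rowSums(plogp) / log(ncol(tf))
}

# weight_ij = lw(tf_ij) * gw_i; the global vector recycles down each column
weights <- lw_log(tf) * gw_idf(tf)
round(weights, 3)
```

A term spread evenly over all documents gets an IDF of 1 and an entropy weight of 0, while a term concentrated in one document gets the maximum entropy weight of 1.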
  • 56. SVD-Dimensionality • Many different proposals (see package) • Keeping the dimensions that cover 80% of the variance is a good estimator
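A possible base-R helper for picking the number of dimensions by share is sketched below. The helper name is hypothetical, and measuring variance through squared singular values is an assumption; the package's own share-based calculator may sum the singular values themselves, so check its documentation before relying on either rule.

```r
# Smallest k whose squared singular values reach the given share of the total
dims_for_share <- function(singular_values, share = 0.8) {
  var_explained <- cumsum(singular_values^2) / sum(singular_values^2)
  which(var_explained >= share)[1]
}

s <- c(5, 3, 2, 1, 0.5)   # hypothetical singular values, in decreasing order
dims_for_share(s, 0.8)    # 2
```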
  • 57. Proximity Measures • Pearson Correlation • Cosine Correlation • Spearman’s Rho pics: http://davidmlane.com/hyperstat/A62891.html
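The three measures can be compared on two hypothetical document vectors with base R:

```r
a <- c(1, 3, 0, 2, 5)   # hypothetical document vectors
b <- c(2, 4, 1, 0, 6)

pearson  <- cor(a, b)                                # linear correlation
cosine   <- sum(a * b) / sqrt(sum(a^2) * sum(b^2))   # cosine of the angle between a and b
spearman <- cor(a, b, method = "spearman")           # correlation of the ranks
c(pearson = pearson, cosine = cosine, spearman = spearman)
```

Pearson correlation equals the cosine of the mean-centred vectors, which is why the two measures often rank documents similarly.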
  • 58. Pair-wise dis/similarity Convergence expected: ‘eu’, ‘österreich’ Divergence expected: ‘jahr’, ‘wien’
  • 59. The Package • Available via CRAN, e.g.: http://cran.r-project.org/web/packages/lsa/index.html • Higher-level Abstraction to Ease Use – Core methods: textmatrix() / query() lsa() fold_in() as.textmatrix() – Support methods for term weighting, dimensionality calculation, correlation measurement, …
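  • 60. Core Workflow
library(lsa)
tm = textmatrix("dir/")                   # term-by-document matrix from a directory of text files
tm = lw_logtf(tm) * gw_idf(tm)            # local log weight times global IDF weight
space = lsa(tm, dims = dimcalc_share())   # truncated SVD
tm3 = fold_in(tm, space)                  # map the documents into the latent space
as.textmatrix(space)                      # reduced matrix reconstructed from the space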
  • 62. Tutorials • Starter: lsa-indexing.Rmd • Real: lsa-essayscoring.Rmd • Advanced: lsa-sparse.Rmd
  • 63. Additional tutorials Fridolin Wild, The Open University, UK
  • 64. Tutorials • Advanced I/O: twitter.Rmd • Advanced I/O: sparql.Rmd • Advanced NLP: twitter-sentiment.Rmd • Evaluation: interrater-agreement.Rmd