Language Technologies for Geomatics: From Intelligence to Agility
Vision Géomatique - 2014-11-12 
Stéphane Gagnon, Ph.D. 
Professor, DSA, UQO
Outline 
1. Business Intelligence 
2. Language Technologies 
3. Geomatics Applications 
4. Big Data and Geo-Agility
Abstract 
- Language Technologies are used for automated text analytics, and rely on a blend of Linguistics, Artificial Intelligence (AI), and Decision Sciences.
- They include applications such as content management, document indexing and search, text classification, automated translation, geographic and contextual localization, the semantic web, real-time text stream processing, and event pattern analysis.
- We briefly discuss how Language Technologies may be integrated with geomatics applications, not only to enhance business and decision intelligence, but to make organizations more agile and resilient in the face of risk and uncertainty.
Sources 
- Bachelor's degree in administration - Management information systems
  - SIG1003 - Information systems for managers
    - Efraim Turban, Linda Volonino, Gregory Wood, and Janice Sipior (2013), Information technology for management: Advancing sustainable, profitable business growth, 9th edition, New York, Wiley, 480 pages, ISBN: 9781118547861
  - SIG1043 - Business intelligence
    - Ramesh Sharda, Dursun Delen, and Efraim Turban (2013), Business Intelligence: A Managerial Perspective on Analytics, CourseSmart eTextbook, 3rd edition, New York, Pearson Higher Education, 330 pages, ISBN: 9780133051070
1. Business Intelligence 
Goals of BI 
BI Evolution 
Modern BI Dashboard
BI Project Lifecycle
Typical BI Architecture 
Data Warehouse Environment (technical staff build the data warehouse)
- Data sources
- Data warehouse: organizing, summarizing, standardizing
Business Analytics Environment (business users)
- Access
- Manipulation
- Results
Performance and Strategy (managers / executives)
- BPM strategy
User Interface
- browser
- portal
- dashboard
Future component: intelligent systems
BI Data Management
2. Language Technologies 
Data Mining (DM) 
Data mining sits at the intersection of:
- Statistics
- Artificial Intelligence
- Pattern Recognition
- Machine Learning
- Databases
- Mathematical Modeling
- Management Science & Information Systems
DM Tasks 
Data Mining Task       Learning Method   Popular Algorithms
Prediction             Supervised        Classification and Regression Trees, ANN, SVM, Genetic Algorithms
  Classification       Supervised        Decision trees, ANN/MLP, SVM, Rough sets, Genetic Algorithms
  Regression           Supervised        Linear/Nonlinear Regression, Regression trees, ANN/MLP, SVM
Association            Unsupervised      Apriori, OneR, ZeroR, Eclat
  Link analysis        Unsupervised      Expectation Maximization, Apriori Algorithm, Graph-based Matching
  Sequence analysis    Unsupervised      Apriori Algorithm, FP-Growth technique
Clustering             Unsupervised      K-means, ANN/SOM
  Outlier analysis     Unsupervised      K-means, Expectation Maximization (EM)
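As an illustration of the clustering task listed above, here is a minimal K-means sketch in plain Python. The sample coordinates, the function name `kmeans`, and the choice to start the centroids at the first k points are all assumptions made for this example, not part of the original slides.

```python
from math import dist

def kmeans(points, k, iterations=20):
    """Plain k-means on 2-D points; centroids start at the first k points."""
    centroids = [tuple(p) for p in points[:k]]
    assignment = []
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster in place
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return centroids, assignment

# Hypothetical coordinates forming two visibly separate groups.
points = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (1.0, 0.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

Because no class labels are supplied, the grouping emerges from the data alone, which is what makes the method unsupervised.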
Language Technologies 
Statistical Methods
- Analyze documents as bags of words
Semantic Methods
- Analyze documents using tags from ontologies describing relationships
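The contrast between the two families can be sketched in a few lines of Python. This is only an illustration: the `ONTOLOGY` dictionary, its concept and relation names, and the sample sentence are hypothetical, and a real ontology would be far richer than a term lookup.

```python
import re
from collections import Counter

def bag_of_words(text):
    """Statistical view: word order is discarded, only counts remain."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

# Hypothetical mini-ontology: term -> (concept, relation, related concept).
ONTOLOGY = {
    "gatineau": ("City", "locatedIn", "Quebec"),
    "ottawa": ("City", "locatedIn", "Ontario"),
    "flood": ("HazardEvent", "affects", "Place"),
}

def semantic_tags(text):
    """Semantic view: tag known terms with ontology concepts and relations."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [(t, *ONTOLOGY[t]) for t in tokens if t in ONTOLOGY]

doc = "Flood warning for Gatineau: the flood may reach Ottawa."
counts = bag_of_words(doc)
tags = semantic_tags(doc)
print(counts.most_common(3))
print(tags)
```

The bag of words only says how often "flood" occurs; the tags additionally say what kind of thing each term is and how it relates to other concepts.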
Statistical Methods 
Information retrieval/search 
Topic/keyword tracking 
Geo-language recognition 
Categorization/classification 
Clustering/recommendation 
Concept linking/association rules
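Information retrieval/search, the first item above, is commonly approached with TF-IDF weighting. The slide does not name a specific technique, so TF-IDF is an assumption here, as are the function names and the sample documents. A minimal sketch:

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def tfidf_search(query, documents):
    """Rank document indices by the summed TF-IDF weight of the query terms."""
    token_lists = [tokenize(d) for d in documents]
    n_docs = len(documents)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in token_lists for term in set(tokens))
    scores = []
    for idx, tokens in enumerate(token_lists):
        tf = Counter(tokens)
        score = sum(
            tf[term] * math.log(n_docs / df[term])
            for term in tokenize(query)
            if term in tf
        )
        scores.append((score, idx))
    # Highest score first; ties keep the original document order.
    ranked = sorted(scores, key=lambda s: (-s[0], s[1]))
    return [idx for score, idx in ranked if score > 0]

# Hypothetical documents.
docs = [
    "Road closure near the river bridge",
    "River levels rising, flood risk near the bridge",
    "City council meeting on the budget",
]
ranking = tfidf_search("flood river", docs)
print(ranking)
```

Terms that occur in few documents ("flood") weigh more than terms that occur in many ("river"), so the second document ranks first.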
Semantic Methods 
Natural Language Processing (NLP) 
- Part-of-speech tagging
- Text segmentation
- Word sense disambiguation
- Syntax ambiguity
- Imperfect or irregular input
- Speech acts
NLP Tasks 
- Information extraction
- Named-entity recognition
- Question answering
- Automatic summarization
- Natural language generation & understanding
- Machine translation
- Foreign language reading & writing
- Speech recognition
- Text proofing, optical character recognition
- Sentiment analysis
Text Mining (TM) Process 
Inputs: a variety of relevant unstructured (and semi-structured) data sources such as text, XML, HTML, etc.
Task 1 - Establish the Corpus: collect and organize the domain-specific unstructured data
- Output: a collection of documents in a digitized format for computer processing
Task 2 - Create the Term-Document Matrix: introduce structure to the corpus
- Output: a flat file called the term-document matrix, whose cells are populated with term frequencies
Task 3 - Extract Knowledge: discover novel patterns from the T-D matrix
- Output: a number of problem-specific classification, association, and clustering models and visualizations
Feedback: loops connect later tasks back to earlier ones
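The first two tasks can be sketched as follows: a tiny corpus is established, then turned into a term-document matrix of term frequencies. The two sample documents and the function name `term_document_matrix` are assumptions for illustration; a real pipeline would typically also remove stop words and normalize word forms.

```python
import re
from collections import Counter

def term_document_matrix(corpus):
    """Return (terms, matrix) where matrix[i][j] is the frequency of terms[i] in document j."""
    counts = [Counter(re.findall(r"[a-z]+", doc.lower())) for doc in corpus]
    terms = sorted(set().union(*counts))
    matrix = [[c[term] for c in counts] for term in terms]
    return terms, matrix

# Task 1: establish the corpus (hypothetical documents).
corpus = [
    "flood in the city",
    "city traffic and city transit",
]

# Task 2: create the term-document matrix.
terms, tdm = term_document_matrix(corpus)
for term, row in zip(terms, tdm):
    print(f"{term:10s}", row)
```

Task 3 would then apply classification, association, or clustering methods to the rows or columns of this matrix.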
Enterprise Index/Search
Web Mining 
Customer Interaction on the Web -> Analysis of Interactions -> Knowledge about the Holistic View of the Customer
Analysis of interactions draws on:
- Web Analytics
- Voice of Customer
- Customer Experience Management
IBM Watson QA 
Question
- Question analysis
- Query decomposition
- For each decomposed query, in parallel:
  - Hypothesis generation: primary search over answer sources, then candidate answer generation
  - Soft filtering
  - Hypothesis and evidence scoring: support evidence retrieval from evidence sources, then deep evidence scoring
- Synthesis
- Final merging and ranking, using trained models
- Answer and confidence
TM for Lies 
- Statements transcribed for processing
- Statements labeled as truthful or deceptive by law enforcement
- Cues extracted and selected
- Text processing software identified cues in statements
- Text processing software generated quantified cues
- Classification models trained and tested on quantified cues
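The last two steps, quantifying cues and training a classifier on them, can be sketched as below. Everything specific here is an assumption: the two cues (word count and first-person pronoun rate) are placeholders, the statements and labels are invented toy data, and the nearest-centroid classifier is a stand-in for whatever models the original study used. The output says nothing about real deception detection.

```python
import re
from math import dist

def quantify_cues(statement):
    """Turn a statement into numeric cues (both cues are illustrative placeholders)."""
    words = re.findall(r"[a-z']+", statement.lower())
    first_person = sum(w in {"i", "me", "my"} for w in words)
    return (len(words), first_person / len(words))

def train_centroids(labeled_statements):
    """Average the cue vectors per label (a nearest-centroid classifier)."""
    by_label = {}
    for statement, label in labeled_statements:
        by_label.setdefault(label, []).append(quantify_cues(statement))
    return {
        label: tuple(sum(axis) / len(vectors) for axis in zip(*vectors))
        for label, vectors in by_label.items()
    }

def classify(statement, centroids):
    """Assign the label whose centroid is closest (cues are left unscaled for brevity)."""
    cues = quantify_cues(statement)
    return min(centroids, key=lambda label: dist(cues, centroids[label]))

# Invented toy statements and labels, for illustration only.
training = [
    ("I parked my car and I walked home", "truthful"),
    ("I saw my neighbour when I left", "truthful"),
    ("The car was parked and then somebody walked away from the place", "deceptive"),
    ("There was a person seen leaving at some point that evening", "deceptive"),
]
centroids = train_centroids(training)
prediction = classify("I locked my door", centroids)
print(prediction)
```

In practice the labeled statements would be split into training and test sets so the model is evaluated on statements it has not seen.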
3. Geomatics Applications 
Geo-Analytics 
Geo-Textual Contextualization 
Function (A0): Extract knowledge from available data sources
Inputs:
- Unstructured data (text)
- Structured data (databases)
Constraints:
- Software/hardware limitations
- Privacy issues
- Linguistic limitations
Enablers:
- Domain expertise
- Tools and techniques
Output:
- Context-specific knowledge
Geomatics elements shown on the diagram:
- Geo-Information Sensors
- Geo-Localized Contents
- Geographic Information
- Geo-Intelligence Models
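One simple way to turn unstructured text into geo-localized content is a gazetteer lookup, sketched below. The gazetteer, its approximate coordinates, and the function name `geo_contextualize` are assumptions for illustration; real systems also have to resolve ambiguous place names, which this sketch ignores.

```python
import re

# Hypothetical mini-gazetteer: place name -> approximate (latitude, longitude).
GAZETTEER = {
    "gatineau": (45.48, -75.70),
    "ottawa": (45.42, -75.70),
    "montreal": (45.50, -73.57),
}

def geo_contextualize(text):
    """Attach the gazetteer places mentioned in the text, each listed once."""
    places = []
    seen = set()
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in GAZETTEER and token not in seen:
            seen.add(token)
            lat, lon = GAZETTEER[token]
            places.append({"name": token, "lat": lat, "lon": lon})
    return {"text": text, "places": places}

record = geo_contextualize(
    "Bridge closed between Ottawa and Gatineau; Ottawa buses rerouted."
)
print(record["places"])
```

The resulting record pairs the original text with coordinates, so it can be joined with structured geographic data.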
Geo-Social Network Analysis
Geo-Analytics of Voter Talk 
INPUT: Data Sources
- Census data: population specifics, age, race, sex, income, etc.
- Election databases: party affiliations, previous election outcomes, trends and distributions
- Market research: polls, recent trends and movements
- Social media: Facebook, Twitter, LinkedIn, newsgroups, blogs, etc.
- Web (in general): web pages, posts and replies, search trends, etc.
- Other data sources
Big Data & Analytics (Data Mining, Web Mining, Text Mining, Multi-media Mining)
- Predicting outcomes and trends
- Identifying associations between events and outcomes
- Assessing and measuring the sentiments
- Profiling (clustering) groups with similar behavioral patterns
- Other knowledge nuggets
OUTPUT: Goals
- Raise money contributions
- Increase number of volunteers
- Organize movements
- Mobilize voters to get out and vote
- Other goals and objectives
- ...
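Assessing sentiment by location can be sketched with a word lexicon and a group-by over geo-tagged posts. The lexicon, the region names, and the posts below are all invented for this example; production sentiment analysis would use a much larger lexicon or a trained model.

```python
import re
from collections import defaultdict

# Hypothetical sentiment lexicon: word -> polarity.
LEXICON = {"great": 1, "support": 1, "hope": 1, "bad": -1, "angry": -1, "worried": -1}

def sentiment(text):
    """Sum of lexicon polarities for the words in the text."""
    return sum(LEXICON.get(w, 0) for w in re.findall(r"[a-z]+", text.lower()))

def sentiment_by_region(posts):
    """Average sentiment of geo-tagged posts, grouped by region."""
    scores_by_region = defaultdict(list)
    for region, text in posts:
        scores_by_region[region].append(sentiment(text))
    return {
        region: sum(scores) / len(scores)
        for region, scores in scores_by_region.items()
    }

# Invented posts with invented region tags.
posts = [
    ("North", "Great rally, full support"),
    ("North", "Worried about turnout"),
    ("South", "Angry about the bad roads"),
]
averages = sentiment_by_region(posts)
print(averages)
```

The per-region averages could then be mapped or combined with census and election data from the input sources.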
Geo-Contextualized Text and Voice Messages
Geo-Analytics of Test Reports
4. Big Data and Geo-Agility 
Competitive Advantage
Pressures for Agility
BI and Agility 
- Process efficiency and cost reduction
- Brand management
- Revenue maximization, cross-selling/up-selling
- Enhanced customer experience
- Churn identification, customer recruiting
- Improved customer service
- Identifying new products and market opportunities
- Risk management
- Regulatory compliance
- Enhanced security capabilities
Big Data - Definition 
- Big Data means different things to people with different backgrounds and interests
- Traditionally, “Big Data” meant gigabyte- and terabyte-scale volumes
  - E.g., volume of data at CERN, NASA, Google, …
- The Vs that define Big Data
  - Volume
  - Variety
  - Velocity
  - Veracity
  - Variability
  - Value
Big Data Examples 
Data Sources
- Web text documents
- Multimedia annotations
- Web logs
- RFID
- GPS systems
- Sensor networks
- Social networks
- Internet search indexes
- Call detail records
Application Domains
- Financial markets
- Broadcasting
- Biology and life sciences
- Healthcare informatics
- Transportation
- Security and defense
- Atmospheric science
- Genomics and research
- Energy and SCADA
Big Data Architecture 
System Conceptual View: Unified Data Architecture
Big data sources:
- ERP, SCM, CRM
- Images, audio and video, machine logs, text, web and social
Unified data architecture (move, manage, access):
- Data platform
- Integrated data warehouse
- Discovery platform
- Event processing
Analytic tools & apps:
- Marketing applications, business intelligence, data mining, math and stats, languages
Users:
- Marketing, executives, operational systems, frontline workers, engineers
- Customers, partners, business analysts, data scientists
Big Data Requirements 
Keys to Success with Big Data Analytics
- A clear business need
- Strong, committed sponsorship
- Alignment between the business and IT strategy
- A fact-based decision-making culture
- A strong data infrastructure
- The right analytics tools
- Personnel with advanced analytical skills
Conclusion: Toward Geo-Agility 
- People-Centric: Track geo-information from key individuals and assets across and around the enterprise
- Contextualize: Add geo-information to unstructured content; use DM and TM with geo-analytics
- Exploration: Link contextualized geo-information to real-time decision requirements
- Open: Leverage open and mobile sources
- Big Data: Build real-time streaming capabilities
- Event-Driven: Develop organizational agility and resilience, and the capability to automate adaptation
- Emergent Strategies: Adapt business strategy in step with evidence-based decision-making
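The real-time and event-driven points can be illustrated with a sliding-window detector over a stream of geo-tagged messages. The class name `RegionBurstDetector`, the window and threshold values, and the sample stream are assumptions for this sketch; an actual deployment would sit on a streaming platform rather than a Python list.

```python
from collections import deque

class RegionBurstDetector:
    """Flag a region when it receives too many messages within a time window."""

    def __init__(self, window_seconds, threshold):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self.recent = {}  # region -> deque of message timestamps

    def observe(self, timestamp, region):
        """Record one message; return True if the region is now in a burst."""
        times = self.recent.setdefault(region, deque())
        times.append(timestamp)
        # Drop timestamps that have slid out of the window.
        while times and times[0] <= timestamp - self.window_seconds:
            times.popleft()
        return len(times) >= self.threshold

# Hypothetical stream of (timestamp in seconds, region) pairs.
detector = RegionBurstDetector(window_seconds=60, threshold=3)
stream = [(0, "Hull"), (10, "Aylmer"), (20, "Hull"), (45, "Hull"), (200, "Hull")]
alerts = [(t, region) for t, region in stream if detector.observe(t, region)]
print(alerts)
```

An alert such as this could trigger an automated response, which is one way to read "capability to automate adaptation".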
Thank you!
Stéphane Gagnon, Ph.D.
Associate Professor
Department of Administrative Sciences
Université du Québec en Outaouais
Pavillon Lucien-Brault
101, rue St-Jean-Bosco, Local A2228
C.P. 1250, succursale Hull
Gatineau (Québec) J8X 3X7
Telephone: 819-595-3900, ext. 1942
Fax: 819-773-1747
Email: stephane.gagnon@uqo.ca
Web: http://www.gagnontech.org
Skype: stephanegagnon
Photo credits: SJ: http://www.lgt.ws, AT and LB: http://www.flickr.com/photos/uqo/
