28.06.2013 DIMA – TU Berlin 1
Database Systems and Information Management Group (DIMA)
Technische Universität Berlin
http://www.dima.tu-berlin.de/
Automated Construction of a Large Semantic Network
of Related Terms for Domain-Specific Modeling
CAiSE 2013, June 21st, Valencia
Henning Agt and Ralf-Detlef Kutsche
Technische Universität Berlin
Motivation
■ Autocompletion applications
■ Predict what the user wants to model next
(Figure: example suggestions such as nurse, treatment, medicine, emergency, ...)
Research Goals
■ Our vision: provide automated suggestions of semantically related model elements for domain modeling [5], [19]
□ Focus on domain terminology and conceptual design
□ Query domain and common-sense ontologies
□ Extract information from text
■ Requirements for the intended application
□ Dictionary of terms
□ Relations between terms
□ Query interface and ranking functions
(Figure: ontologies and the terminology generated by text analysis feed a knowledge service, which retrieves and integrates them and provides suggestions such as nurse, treatment, medicine and emergency to modeling tools.)
Agenda
■ Input dataset
■ Text analysis process
■ Application of SemNet
■ Evaluation of SemNet
■ Conclusions and Future Work
(Figure: processing pipeline. Text corpus → n-gram statistics → parse → N-gram DB → tag → POS DB → normalize → normalized N-gram DB → analyse co-occurrences → SemNet → retrieve/query → applications.)
Google Books N-Gram Dataset
■ Large amounts of text data
■ N-grams
□ Sequence of n consecutive words/tokens together with its frequency
□ Google provides 1-, 2-, 3-, 4- and 5-grams in several languages
■ We work on the English-All dataset V2 (1-grams and 5-grams) [11]
(Figure: corpus of 5 million books (500 billion words) → n-gram analysis → n-gram dataset as CSV text files with word frequencies)
Example 5-grams and their frequencies:
...
to go to the hospital                46,410
general condition of the patient     28,198
I was in the hospital                19,268
discharge from the hospital .        12,476
admission to the hospital .          10,558
the patient to the hospital           6,422
by placing the patient in             6,026
between doctor and patient .          5,908
...
able to leave the hospital            4,629
patient admitted to the hospital      4,303
a patient in the hospital             3,844
the symptom of the patient            2,559
the patient under local anesthesia    2,536
a patient is suffering from           2,475
the doctor and the hospital           1,362
the hospital and the doctor           1,017
...
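The example lines above pair each 5-gram with its frequency. A minimal Java sketch for reading such lines, assuming a hypothetical pre-aggregated, tab-separated format `<n-gram>\t<frequency>`. The raw Google V2 files are tab-separated with the columns n-gram, year, match count and volume count, so the per-year counts would first have to be summed per n-gram; the class name `NGramRecord` is illustrative only.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** One n-gram with its corpus frequency (hypothetical aggregated format). */
final class NGramRecord {
    private final List<String> tokens;
    private final long frequency;

    NGramRecord(List<String> tokens, long frequency) {
        this.tokens = Collections.unmodifiableList(tokens);
        this.frequency = frequency;
    }

    List<String> tokens() { return tokens; }

    long frequency() { return frequency; }

    /**
     * Parses a line of the assumed form "<n-gram>\t<frequency>".
     * Thousands separators such as "46,410" are tolerated.
     */
    static NGramRecord parse(String line) {
        int tab = line.lastIndexOf('\t');
        if (tab < 0) {
            throw new IllegalArgumentException("missing tab separator: " + line);
        }
        String gram = line.substring(0, tab).trim();
        long freq = Long.parseLong(line.substring(tab + 1).trim().replace(",", ""));
        // Tokens are separated by whitespace; punctuation stays a token of its own.
        List<String> tokens = Arrays.asList(gram.split("\\s+"));
        return new NGramRecord(tokens, freq);
    }
}
```

Because tokens are split on whitespace, the final "." in "discharge from the hospital ." is kept as a separate token, just as it is counted as one of the five tokens in the dataset.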
Preprocessing
■ N-gram database
→ Make the data manageable
□ Input: 2.5 terabytes of text
□ Output: tables with 10 million 1-grams and 710 million 5-grams (21 gigabytes)
■ Part-of-speech tagging [8], [9]
→ Identify the lexical category of each text token
□ Output: table with POS tags for each 5-gram (14 gigabytes)
■ Normalization
→ Reduce the number of word variations
□ Plural stemming, lowercasing of adjectives and common nouns
□ Proper nouns are left untouched
■ Result: 710 million normalized and tagged 5-grams
Examples:
general condition of the patient → JJ NN IN DT NN
drug store pharmacist or doctor → NN NN NN CC NN
doctors → doctor
Medical practitioner → medical practitioner
hospitals in Valencia → hospital in Valencia
(JJ = adjective, NN = common noun, IN = preposition, DT = determiner, CC = coordinating conjunction)
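The normalization step above can be sketched as follows, assuming Penn Treebank tags [9] (JJ for adjectives, NN and NNS for singular and plural common nouns, NNP/NNPS for proper nouns) and a deliberately naive, hypothetical suffix rule for plural stemming. The actual stemmer used for SemNet is not specified in the talk, and `NGramNormalizer` is an illustrative name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Normalizes one tagged n-gram: lowercases adjectives/common nouns, stems plurals. */
final class NGramNormalizer {

    static List<String> normalize(List<String> tokens, List<String> tags) {
        if (tokens.size() != tags.size()) {
            throw new IllegalArgumentException("tokens and tags must align");
        }
        List<String> out = new ArrayList<>(tokens.size());
        for (int i = 0; i < tokens.size(); i++) {
            String token = tokens.get(i);
            switch (tags.get(i)) {
                case "JJ":
                case "NN":
                    out.add(token.toLowerCase(Locale.ROOT));
                    break;
                case "NNS": // plural common noun: lowercase, then stem
                    out.add(stemPlural(token.toLowerCase(Locale.ROOT)));
                    break;
                default:    // proper nouns (NNP, NNPS) and all other tags stay untouched
                    out.add(token);
            }
        }
        return out;
    }

    /** Hypothetical naive plural rule: "-ies" -> "-y", otherwise drop a single trailing "s". */
    static String stemPlural(String word) {
        if (word.endsWith("ies") && word.length() > 3) {
            return word.substring(0, word.length() - 3) + "y";
        }
        if (word.endsWith("s") && !word.endsWith("ss") && word.length() > 1) {
            return word.substring(0, word.length() - 1);
        }
        return word;
    }
}
```

Applied to the examples above, "hospitals in Valencia" (NNS IN NNP) becomes "hospital in Valencia", while the proper noun keeps its capitalization.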
Lexical Patterns
■ Goal: detect domain terminology using syntactic patterns [12]
■ Analysis of existing dictionaries
□ 75% of terms are noun, noun-noun or adjective-noun combinations
■ Excerpt of the 20 patterns used:
□ doctor or mental health professional → term | separation | term
■ No proper nouns: "Stanford University" is excluded, "university professor" is kept
□ Our focus is conceptual design at the schema level
■ Limitation: a 5-gram contains only 5 words
□ Maximum length of a term: 3 words
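The "term, separation, term" example can be sketched as a matcher over POS tags. The term patterns below are a hypothetical excerpt chosen for illustration (the 20 patterns used for SemNet are not listed in the talk), terms are capped at 3 words as stated above, and the greedy longest-match rule is an assumption, not the hierarchical pattern matching the authors describe.

```java
import java.util.Arrays;
import java.util.List;

/** Finds "TERM CC TERM" coordinations such as "doctor or mental health professional". */
final class CoordinationMatcher {

    // Hypothetical excerpt of term patterns (POS sequences of at most 3 words).
    static final List<List<String>> TERM_PATTERNS = List.of(
            List.of("NN"),
            List.of("NN", "NN"),
            List.of("JJ", "NN"),
            List.of("NN", "NN", "NN"),
            List.of("JJ", "NN", "NN"));
    static final int MAX_TERM_LENGTH = 3;

    /** Returns {leftTerm, rightTerm} for the first coordination found, or null. */
    static String[] match(String[] tokens, String[] tags) {
        for (int cc = 1; cc < tags.length - 1; cc++) {
            if (!tags[cc].equals("CC")) {
                continue;
            }
            // Longest term ending directly before the conjunction.
            String left = null;
            for (int len = MAX_TERM_LENGTH; len >= 1 && left == null; len--) {
                int start = cc - len;
                if (start >= 0 && isTerm(tags, start, len)) {
                    left = join(tokens, start, len);
                }
            }
            // Longest term starting directly after the conjunction.
            String right = null;
            for (int len = MAX_TERM_LENGTH; len >= 1 && right == null; len--) {
                int start = cc + 1;
                if (start + len <= tags.length && isTerm(tags, start, len)) {
                    right = join(tokens, start, len);
                }
            }
            if (left != null && right != null) {
                return new String[] {left, right};
            }
        }
        return null;
    }

    private static boolean isTerm(String[] tags, int start, int len) {
        return TERM_PATTERNS.contains(Arrays.asList(tags).subList(start, start + len));
    }

    private static String join(String[] tokens, int start, int len) {
        return String.join(" ", Arrays.copyOfRange(tokens, start, start + len));
    }
}
```

For "doctor or mental health professional" (NN CC JJ NN NN) the sketch yields the pair ("doctor", "mental health professional"); a 5-gram without a coordinating conjunction yields no pair.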
Co-Occurring Terms
■ Hierarchical pattern matching
■ Distributional semantics [13], [22]
□ "Words that occur in the same contexts tend to have similar meanings." (distributional hypothesis by Z. Harris)
Example: the 5-gram (context) "your doctor or pharmacist ." has an absolute frequency of 9271, so "doctor" and "pharmacist" co-occurred 9271 times.
(Figure annotations on hierarchical pattern matching: easiest case; no consecutive patterns; no idiomatic phrases; the highest level remains.)
SemNet Construction
■ Discard 5-grams that contain 4 or 5 stopwords
■ Apply pattern matching to the remaining 5-grams
→ Result: large table of binary relations
■ Frequency aggregation
□ Many term pairs co-occur in several different contexts
■ Relative frequency computation
□ For each term with respect to its related terms
■ Graph construction
□ Directed, weighted edges
□ Serialization as a relational database and a graph database (SQLite / Neo4j)
(Examples of discarded 5-grams: "to go to the doctor", "I am what I am", "a ) ( 2 )")
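The construction steps above (stopword filter, frequency aggregation, relative frequencies as directed edge weights) can be sketched in memory as follows. Several parts are assumptions: the stopword list is a tiny hypothetical one, tokens without letters (punctuation, digits) are also counted as stopwords so that "a ) ( 2 )" is discarded, each matched pair is counted in both directions, and the relative frequency of a → b is taken as the aggregated count of (a, b) divided by the sum of all of a's counts. For example, "your doctor or pharmacist ." would contribute 9271 to the pair (doctor, pharmacist).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** In-memory sketch of SemNet construction (not the authors' implementation). */
final class SemNetBuilder {

    // Hypothetical, tiny stopword list for illustration only.
    static final Set<String> STOPWORDS = Set.of(
            "a", "an", "the", "to", "of", "in", "and", "or",
            "i", "am", "what", "is", "go", "your");

    /** True if the 5-gram has 4 or 5 stopwords (tokens without letters count as stopwords). */
    static boolean discard(List<String> tokens) {
        long stop = tokens.stream()
                .filter(t -> STOPWORDS.contains(t.toLowerCase(Locale.ROOT))
                        || t.chars().noneMatch(Character::isLetter))
                .count();
        return stop >= 4;
    }

    // Aggregated co-occurrence counts: term -> (related term -> summed frequency).
    private final Map<String, Map<String, Long>> counts = new HashMap<>();

    /** Adds one matched pair with the frequency of its 5-gram, in both directions. */
    void addCooccurrence(String a, String b, long frequency) {
        counts.computeIfAbsent(a, k -> new HashMap<>()).merge(b, frequency, Long::sum);
        counts.computeIfAbsent(b, k -> new HashMap<>()).merge(a, frequency, Long::sum);
    }

    /** Directed, weighted edges: weight(a -> b) = count(a, b) / sum over x of count(a, x). */
    Map<String, Map<String, Double>> relativeFrequencies() {
        Map<String, Map<String, Double>> graph = new HashMap<>();
        for (Map.Entry<String, Map<String, Long>> e : counts.entrySet()) {
            long total = 0;
            for (long c : e.getValue().values()) {
                total += c;
            }
            Map<String, Double> edges = new HashMap<>();
            for (Map.Entry<String, Long> r : e.getValue().entrySet()) {
                edges.put(r.getKey(), (double) r.getValue() / total);
            }
            graph.put(e.getKey(), edges);
        }
        return graph;
    }
}
```

Since each term's outgoing weights are normalized separately, the edge doctor → pharmacist and the edge pharmacist → doctor generally carry different weights, which is why the graph is directed.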
Statistics
■ Properties of SemNet
□ 268,937 distinct single-word terms
□ 2,115,494 distinct double-word terms
□ 355,689 distinct triple-word terms
□ Total: 2.7 million terms and 37.5 million relations
□ 2.2 GB disk space
■ Lessons learned from the analysis process
(Figure: n-gram information content. The 5-grams fall into four groups: 4 or 5 stopwords, only 1 term, no pattern match, and n-grams with a semantic relationship; the chart shows the shares 41.6%, 15.7%, 32.6% and 10.1%.)
(Figure: semantic relatedness follows Zipf's law; degree of relatedness plotted against rank.)
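Zipf's law describes a quantity that falls off roughly in inverse proportion to its rank. As a sketch of the law's general form (not a curve fitted to SemNet), assume $w_t(r)$ denotes the degree of relatedness, i.e. the relative frequency, of the $r$-th ranked related term of a term $t$, with a term-specific constant $C_t$ and an exponent $s$:

```latex
w_t(r) \approx \frac{C_t}{r^{s}}, \qquad s \approx 1
\quad\Longleftrightarrow\quad
\log w_t(r) \approx \log C_t - s \log r
```

On log-log axes such a distribution is approximately a straight line with slope $-s$: a few related terms carry most of the weight, followed by a long tail of weakly related ones.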
Querying SemNet
■ Query interfaces
□ SQL: query the relational database
□ Cypher: query the Neo4j database
□ Java: use SemNet in your applications
□ PHP: explore the data in a web interface
■ Examples of the top 10 automatically identified related terms
(f – absolute term frequency in the original text corpus, #r – number of related terms)
SQL example:
SELECT * FROM nouncooccurrences
WHERE termw1 = 5824331 AND termw2 IS NULL AND termw3 IS NULL
ORDER BY relfreq DESC LIMIT 20;
Java example:
public ArrayList<String> getRelatedStringTerms(ArrayList<String> inputTerms) { … }
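The SQL query above returns the related terms of one term ordered by descending relative frequency (`relfreq`). As a hypothetical in-memory equivalent, assuming the graph is held as a map from each term to its related terms and their relative frequencies, a top-k lookup could look like this. `RelatedTermQuery.topRelated` is an illustrative name, not SemNet's Java API (the body of `getRelatedStringTerms` is not shown in the talk).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Top-k related-term lookup over an in-memory SemNet-like graph (sketch). */
final class RelatedTermQuery {

    /** Returns up to limit related terms of term, by descending relative frequency. */
    static List<String> topRelated(Map<String, Map<String, Double>> graph,
                                   String term, int limit) {
        Map<String, Double> edges = graph.getOrDefault(term, Map.of());
        List<Map.Entry<String, Double>> sorted = new ArrayList<>(edges.entrySet());
        // Mirrors "ORDER BY relfreq DESC"; ties are broken alphabetically.
        sorted.sort(Map.Entry.<String, Double>comparingByValue(Comparator.reverseOrder())
                .thenComparing(Map.Entry.<String, Double>comparingByKey()));
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(limit, sorted.size()); i++) {
            result.add(sorted.get(i).getKey()); // mirrors "LIMIT"
        }
        return result;
    }
}
```

An unknown term simply yields an empty list, analogous to an SQL query that matches no rows.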
Ranking Results of Multiple Input Terms
■ Challenge: methods based on matrices and vectors are too slow
■ Strategy: intersect the related-term sets and multiply the relative frequencies
Top related terms of "table" and "database", and the combined ranking for "table+database" (∩ = intersection of the related-term sets, * = multiplication of the relative frequencies):
Rank  table           database                 table+database
1     chair 0.0441    data 0.0735              data 0.001155
2     contents 0.0359 information 0.0569       contents 0.000359
3     end 0.0221      record 0.0376            information 0.000190
4     front 0.0194    table 0.0334             record 0.000091
5     figure 0.0189   access 0.0310            use 0.000077
6     head 0.0189     spreadsheet 0.0252       end 0.000060
7     side 0.0180     name 0.0201              example 0.000055
8     data 0.0157     object 0.0164            name 0.000050
9     hand 0.0132     retrieval system 0.0163  figure 0.000047
10    column 0.0131   file 0.0158              value 0.000045
11    page 0.0118     example 0.0153           result 0.000037
12    edge 0.0112     use 0.0150               list 0.000037
13    result 0.0100   connection 0.0146        column 0.000034
14    value 0.0099    structure 0.0139         row 0.000033
15    place 0.0087    field 0.0125             object 0.000024
16    row 0.0086      user 0.0124              field 0.000023
17    show 0.0082     change 0.0112            book 0.000016
18    elbow 0.0072    type 0.0107              order 0.000016
19    list 0.0071     size 0.0104              size 0.000014
20    bed 0.0071      transaction 0.0102       query 0.000012
...   ...             ...                      ...
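The strategy above (intersect the related-term sets, multiply the relative frequencies) can be sketched as follows, assuming each input term's related terms are available as a map from term to relative frequency. The class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Ranks terms related to several input terms: set intersection + product of weights. */
final class MultiTermRanking {

    /**
     * Each map holds the related terms of one input term with their relative frequencies.
     * Only terms related to every input term survive; their score is the product of weights.
     */
    static Map<String, Double> score(List<Map<String, Double>> relatedSets) {
        Map<String, Double> scores = new HashMap<>();
        if (relatedSets.isEmpty()) {
            return scores;
        }
        scores.putAll(relatedSets.get(0));
        for (int i = 1; i < relatedSets.size(); i++) {
            Map<String, Double> next = relatedSets.get(i);
            scores.keySet().retainAll(next.keySet());           // intersection
            scores.replaceAll((term, s) -> s * next.get(term)); // multiplication
        }
        return scores;
    }

    /** Orders the scored terms by descending score (ties broken alphabetically). */
    static List<String> ranked(Map<String, Double> scores) {
        List<Map.Entry<String, Double>> entries = new ArrayList<>(scores.entrySet());
        entries.sort(Map.Entry.<String, Double>comparingByValue(Comparator.reverseOrder())
                .thenComparing(Map.Entry.<String, Double>comparingByKey()));
        List<String> terms = new ArrayList<>();
        for (Map.Entry<String, Double> e : entries) {
            terms.add(e.getKey());
        }
        return terms;
    }
}
```

With the rounded values from the lists above, "data" scores 0.0157 × 0.0735 ≈ 0.001154, close to the 0.001155 listed for table+database; the small difference is consistent with rounding of the displayed inputs. Only the sizes of the related-term sets matter here, which is what makes this cheaper than the matrix- and vector-based methods mentioned above.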
Modeling With Semantic Autocompletion
■ Prototype: Ecore diagram editor with class name suggestions [15]
■ Automated adaptation of the suggestions to the content of the model
Evaluation Setup
■ Challenge
□ No gold standard is available for many information extraction tasks
■ Our strategy: compare SemNet to existing knowledge bases
□ Measure how much of the information in WordNet and ConceptNet is contained in SemNet
■ WordNet V3.0: lexical database for the English language [16]
□ Synsets: grouped terms that share the same sense
□ Relations: mainly taxonomic, part-whole and synonymy
■ ConceptNet V5.1: semantic graph for general human knowledge [17]
□ Nodes: any natural language phrase that expresses a concept
□ Relations: taxonomic, part-whole, related-to and several others
■ SemNet: Semantic Network of Related Terms
□ Nodes: noun terminology
□ Relations: probabilistic links
(Figure: the word sense pregnancy in WordNet (7 out of 32 relations; synonym, meronym, hypernym and hyponym links to terms such as maternity, parturiency, morning sickness, physical condition, ectopic pregnancy and entopic pregnancy), the concept pregnancy in ConceptNet (7 out of 58 relations; RelatedTo, PartOf, IsA, Causes and HasSubevent links to phrases such as expect, morning sickness, physical condition, go to bed, ectopic pregnancy, stretch and start family) and the term pregnancy in SemNet (first 10 out of 4039 relations, e.g. woman, mother, trimester, stage, week, childbirth, lactation, month, termination and birth, weighted from 0.036 down to 0.016).)
Noun terminology coverage
■ WordNet
□ Iterate through all noun synsets (72,994 synsets evaluated)
□ Check whether the nouns are contained in SemNet (98,681 nouns evaluated)
→ Results: 77.16% of WordNet's synsets and 62.17% of WordNet's nouns are contained in SemNet
■ ConceptNet
□ Problem: concepts can be expressed using any natural language phrase
□ First determine the noun terminology
□ Check whether the nouns are contained in SemNet (49,301 concepts evaluated)
→ Result: 82.40% of ConceptNet's nouns are contained in SemNet
(Examples: WordNet synsets such as (doctor, doc, physician, MD, Dr., medico), (ear doctor, ear specialist, otologist) and (sleep talking, somniloquy, somniloquism); ConceptNet concepts such as doctor, go to bed, pregnancy and beautiful.)
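The coverage measurement above can be sketched as two ratios. The talk does not state when a synset counts as "contained"; this sketch assumes a synset is covered if at least one of its nouns is a SemNet term, and it computes noun coverage over the distinct nouns of all synsets. In practice the nouns would need the same normalization as SemNet (lowercasing, plural stemming). The class name is illustrative.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Noun terminology coverage of a knowledge base in SemNet (sketch, assumed criteria). */
final class CoverageEvaluation {

    /** Share of synsets with at least one noun contained in SemNet (assumed criterion). */
    static double synsetCoverage(List<Set<String>> synsets, Set<String> semNetTerms) {
        if (synsets.isEmpty()) {
            return 0.0;
        }
        long covered = synsets.stream()
                .filter(s -> s.stream().anyMatch(semNetTerms::contains))
                .count();
        return (double) covered / synsets.size();
    }

    /** Share of distinct nouns (over all synsets) contained in SemNet. */
    static double nounCoverage(List<Set<String>> synsets, Set<String> semNetTerms) {
        Set<String> nouns = new HashSet<>();
        synsets.forEach(nouns::addAll);
        if (nouns.isEmpty()) {
            return 0.0;
        }
        long covered = nouns.stream().filter(semNetTerms::contains).count();
        return (double) covered / nouns.size();
    }
}
```

Under this criterion synset coverage is always at least as high as noun coverage whenever every synset contains a noun, since one known noun suffices to cover a whole synset; that is consistent with 77.16% versus 62.17% above.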
Relation coverage
■ WordNet / ConceptNet
□ Iterate through all previously found noun synsets (56,321 synsets used) and concepts (40,625 concepts used)
□ Check whether the relations between synsets and between concepts are contained in SemNet (61,931 WordNet relations and 256,213 ConceptNet relations evaluated)
■ Relation evaluation results
(Example: the synset (doctor, doc, physician, MD, Dr., medico) has the hypernym (medical practitioner, medical man) and the hyponyms (surgeon) and (allergist).)
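The relation check above can be sketched as follows. The matching criterion is an assumption, since the talk does not spell it out: a relation between two synsets (or two concepts, modeled as single-element sets) counts as contained if SemNet has an edge from some noun of the source set to some noun of the target set. The graph representation and class name are illustrative.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Relation coverage sketch: is a WordNet/ConceptNet relation reflected by a SemNet edge? */
final class RelationCoverage {

    /** Assumed criterion: some noun of the source set links to some noun of the target set. */
    static boolean covered(Set<String> source, Set<String> target,
                           Map<String, Map<String, Double>> semNet) {
        for (String a : source) {
            Map<String, Double> related = semNet.getOrDefault(a, Map.of());
            for (String b : target) {
                if (related.containsKey(b)) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Share of relations (source set -> target set) that are covered. */
    static double share(List<Map.Entry<Set<String>, Set<String>>> relations,
                        Map<String, Map<String, Double>> semNet) {
        if (relations.isEmpty()) {
            return 0.0;
        }
        long hits = relations.stream()
                .filter(r -> covered(r.getKey(), r.getValue(), semNet))
                .count();
        return (double) hits / relations.size();
    }
}
```

For the doctor example, the hypernym relation to (medical practitioner, medical man) would be covered if SemNet links "doctor" to "medical practitioner", independently of whether the hyponym (allergist) is linked.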
Conclusions and Future Work
■ Summary
□ Input: 710 million 5-grams and 20 part-of-speech patterns
□ Hierarchical pattern matching, distributional semantics
□ Output: 2.7M terms of up to three words and 37.5M weighted relations
□ Limitation: only a window of 5 words can be analyzed to detect relations
□ Applications: domain-specific modeling, keyword expansion, background knowledge for NLP tasks
■ Current and future work
□ Support additional languages
□ Improve ranking functions (pointwise mutual information)
□ Relax the 3-word limitation, derive our own n-gram datasets
□ Combine probabilistic information with specific relations
□ Domain clustering in the semantic network
□ Additional modeling support: relations/associations, attributes
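Pointwise mutual information, named above as a candidate ranking function, has the standard definition below. Estimating the probabilities from corpus counts is a sketch under assumptions: $f(x, y)$ is the co-occurrence frequency of two terms, $f(x)$ and $f(y)$ are their term frequencies, and a single normalizing count $N$ is used for both for simplicity.

```latex
\operatorname{PMI}(x, y) = \log \frac{p(x, y)}{p(x)\,p(y)}
\approx \log \frac{f(x, y) / N}{\bigl(f(x) / N\bigr)\bigl(f(y) / N\bigr)}
= \log \frac{f(x, y)\, N}{f(x)\, f(y)}
```

Unlike the relative frequency, PMI divides by the frequency of the related term, so very frequent, generic terms (such as "data" or "use" in the ranking lists) would be scored down relative to more specific ones.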
[5] Agt, H.: Supporting Software Language Engineering by Automated
Domain Knowledge Acquisition. In: MODELS 2011 Workshops,
LNCS, vol. 7167, Springer (2012)
[8] Toutanova, K., Klein, D., Manning, C.D., Singer, Y.: Feature-Rich
Part-of-Speech Tagging with a Cyclic Dependency Network. In:
Proceedings of NAACL 2003, pp. 173–180 (2003)
[9] Marcus, M.P., Marcinkiewicz, M.A., Santorini, B.: Building a Large
Annotated Corpus of English: The Penn Treebank. Computational
Linguistics 19(2), 313–330 (1993)
[11] Michel, J.B., Shen, Y.K., Aiden, A.P., Veres, A., Gray, M.K., Team,
T.G.B., Pickett, J.P., Hoiberg, D., Clancy, D., Norvig, P., Orwant,
J., Pinker, S., Nowak, M.A., Aiden, E.L.: Quantitative Analysis of
Culture Using Millions of Digitized Books. Science 331(6014),
176–182 (2011)
[12] Hearst, M.A.: Automatic acquisition of hyponyms from large text
corpora. In: Proceedings of the 14th Conference on
Computational Linguistics, COLING 1992, vol. 2 (1992)
[13] Harris, Z.: Distributional structure. Word 10(2–3), 146–162 (1954)
[15] Agt, H.: SemAcom: A System for Modeling with Semantic
Autocompletion. In: Model Driven Engineering Languages and
Systems - 15th International Conference, MODELS 2012, Demo
Track, Innsbruck, Austria (2012)
[16] Fellbaum, C.: WordNet: An Electronic Lexical Database. The MIT
Press, Cambridge (1998)
[17] Speer, R., Havasi, C.: Representing General Relational Knowledge
in ConceptNet 5. In: Proceedings of LREC 2012 (2012)
[19] Agt, H., Kutsche, R.D., Wegeler, T.: Guidance for Domain Specific
Modeling in Small and Medium Enterprises. In: SPLASH 2011
Workshops. DSM 2011, Portland, OR, USA (2011)
[22] Turney, P.D., Pantel, P.: From frequency to meaning: vector
space models of semantics. J. Artif. Int. Res. 37(1), 141–188
(2010)
Thank You For Your Attention!
MODELS?
Try out SemNet: http://www.bizware.tu-berlin.de/semnet/
Contact: henning.agt@tu-berlin.de

Razvan petrusel presentation caise 2013
 
Ramezani taghiabadi temporal compliance checking 2
Ramezani taghiabadi   temporal compliance checking 2Ramezani taghiabadi   temporal compliance checking 2
Ramezani taghiabadi temporal compliance checking 2
 
Ferreira c ai-se2013-final-handouts
Ferreira   c ai-se2013-final-handoutsFerreira   c ai-se2013-final-handouts
Ferreira c ai-se2013-final-handouts
 
Sonja meyer caise 2013
Sonja meyer caise 2013Sonja meyer caise 2013
Sonja meyer caise 2013
 
Tony clark caise 13-presentation
Tony clark  caise 13-presentationTony clark  caise 13-presentation
Tony clark caise 13-presentation
 
Miguel goulao 2013 c-aise
Miguel goulao 2013 c-aiseMiguel goulao 2013 c-aise
Miguel goulao 2013 c-aise
 
Jorge cardoso caise-usdl-tosca-2013-06-18c
Jorge cardoso   caise-usdl-tosca-2013-06-18cJorge cardoso   caise-usdl-tosca-2013-06-18c
Jorge cardoso caise-usdl-tosca-2013-06-18c
 
Kerrstin klemishc c-aise2013_
Kerrstin klemishc c-aise2013_Kerrstin klemishc c-aise2013_
Kerrstin klemishc c-aise2013_
 
Ignacio panach ormeño et-al_caise2013
Ignacio panach   ormeño et-al_caise2013Ignacio panach   ormeño et-al_caise2013
Ignacio panach ormeño et-al_caise2013
 
Peter sawyer caise
Peter sawyer  caisePeter sawyer  caise
Peter sawyer caise
 
Scekic caise13-
Scekic caise13-Scekic caise13-
Scekic caise13-
 
Moe wynn caise13 presentation
Moe wynn   caise13 presentationMoe wynn   caise13 presentation
Moe wynn caise13 presentation
 
Jian yu caise13-
Jian yu caise13-Jian yu caise13-
Jian yu caise13-
 
Tommi kramer 2013-06-21-caise-re2-kramer
Tommi kramer   2013-06-21-caise-re2-kramerTommi kramer   2013-06-21-caise-re2-kramer
Tommi kramer 2013-06-21-caise-re2-kramer
 
Canovas cabot topublish-caise2013-
Canovas cabot topublish-caise2013-Canovas cabot topublish-caise2013-
Canovas cabot topublish-caise2013-
 

Recently uploaded

GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
Abida Shariff
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 

Recently uploaded (20)

GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 

Henning agt talk-caise-semnet

  • 1. Automated Construction of a Large Semantic Network of Related Terms for Domain-Specific Modeling
    Henning Agt and Ralf-Detlef Kutsche, Technische Universität Berlin
    CAiSE 2013, June 21st, Valencia
    Database Systems and Information Management Group (DIMA), Technische Universität Berlin, http://www.dima.tu-berlin.de/
    28.06.2013
  • 2. Motivation
    ■ Autocompletion applications
    ■ Predict what the user wants to model next
    [Figure: example suggestions: nurse, treatment, medicine, emergency, ...]
  • 3. Research Goals
    ■ Our vision: provide automated suggestions of semantically related model elements for domain modeling [5], [19]
      □ Focus on domain terminology and conceptual design
      □ Query domain and common-sense ontologies
      □ Information extraction from text
    ■ Requirements for the intended application
      □ Dictionary of terms
      □ Relations between terms
      □ Query interface and ranking functions
    [Figure: example suggestions (nurse, treatment, medicine, emergency, ...) and an architecture sketch with ontologies, text analysis, terminology, a knowledge service and modeling tools; arrows labeled Extract, Retrieve/Integrate, Generate, Query, Provide Suggestions and Use]
  • 4. Agenda
    ■ Input dataset
    ■ Text analysis process
    ■ Application of SemNet
    ■ Evaluation of SemNet
    ■ Conclusions and Future Work
    [Figure: processing pipeline with the stages Text Corpus, N-Gram Statistics, N-Gram DB, POS DB, Normalized N-Gram DB, SemNet and Applications; steps labeled Analyse, Parse, Tag, Normalize, Analyse Co-occurrences, Retrieve and Query]
  • 6. Google Books N-Gram Dataset
    ■ Large amounts of text data
    ■ N-grams
      □ A sequence of n consecutive words/tokens together with its frequency
      □ Google provides 1-, 2-, 3-, 4- and 5-grams in several languages
    ■ We work on the English-All dataset V2 (1-grams and 5-grams) [11]
    [Figure: corpus of 5 million books (500 billion words) → n-gram analysis → n-gram dataset (CSV text files with word frequencies)]
    Example 5-grams and their frequencies (two excerpts):
      ...
      to go to the hospital               46,410
      general condition of the patient    28,198
      I was in the hospital               19,268
      discharge from the hospital .       12,476
      admission to the hospital .         10,558
      the patient to the hospital          6,422
      by placing the patient in            6,026
      between doctor and patient .         5,908
      ...
      able to leave the hospital           4,629
      patient admitted to the hospital     4,303
      a patient in the hospital            3,844
      the symptom of the patient           2,559
      the patient under local anesthesia   2,536
      a patient is suffering from          2,475
      the doctor and the hospital          1,362
      the hospital and the doctor          1,017
      ...
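Slide 6 defines an n-gram as a sequence of n consecutive tokens together with its frequency. The following Java sketch illustrates that definition on a toy token list by counting every 5-gram with a sliding window. It is not how the Google Books dataset was produced, and the names `NGramCounter` and `count` are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy illustration of n-gram counting; class and method names are hypothetical.
final class NGramCounter {

    /** Maps each n-gram (its tokens joined by single spaces) to the number of times it occurs. */
    static Map<String, Integer> count(List<String> tokens, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        Map<String, Integer> counts = new HashMap<>();
        // Slide a window of n tokens over the sequence; every window position is one n-gram occurrence.
        for (int i = 0; i + n <= tokens.size(); i++) {
            String gram = String.join(" ", tokens.subList(i, i + n));
            counts.merge(gram, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("I", "was", "in", "the", "hospital", "and",
                "I", "was", "in", "the", "hospital", "again");
        Map<String, Integer> fiveGrams = count(tokens, 5);
        System.out.println(fiveGrams.get("I was in the hospital")); // prints 2
        System.out.println(fiveGrams.size());                        // prints 7 (distinct 5-grams)
    }
}
```

Summing such counts over millions of books yields the frequency column shown in the excerpt above.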
  • 8. Preprocessing
    ■ N-gram database: make the data manageable
      □ Input: 2.5 terabytes of text
      □ Output: tables with 10 million 1-grams and 710 million 5-grams (21 gigabytes)
    ■ Part-of-speech (POS) tagging [8], [9]: identify the lexical category of each text token
      □ Output: table with POS tags for each 5-gram (14 gigabytes)
    ■ Normalization: reduce the number of word variations
      □ Plural stemming, lowercasing of adjectives and common nouns
      □ Proper nouns are left untouched
    ■ Result: 710 million normalized and tagged 5-grams
    Examples:
      general condition of the patient   →  JJ NN IN DT NN
      drug store pharmacist or doctor    →  NN NN NN CC NN
      doctors → doctor; Medical practitioner → medical practitioner; hospitals in Valencia → hospital in Valencia
    Tags: JJ = adjective, NN = common noun, IN = preposition, DT = determiner, CC = coordinating conjunction
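The normalization rules on slide 8 (lowercase adjectives and common nouns, stem plurals, leave proper nouns alone) can be sketched over POS-tagged tokens as below. This is a minimal illustration assuming Penn Treebank tags and a deliberately naive suffix rule in place of whatever stemmer the authors used; `Normalizer`, `Tagged` and `stemPlural` are hypothetical names, and leaving tokens with all other tags unchanged is also an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of the normalization step; the plural rule is a naive stand-in for a real stemmer.
final class Normalizer {

    /** A token with its Penn Treebank POS tag, e.g. JJ, NN, NNS, NNP. */
    record Tagged(String word, String tag) {}

    /** Naive plural stemming (assumption): "-ies" becomes "-y", otherwise drop a final "s" (but not "-ss"). */
    static String stemPlural(String word) {
        if (word.endsWith("ies") && word.length() > 4) {
            return word.substring(0, word.length() - 3) + "y";
        }
        if (word.endsWith("s") && !word.endsWith("ss") && word.length() > 3) {
            return word.substring(0, word.length() - 1);
        }
        return word;
    }

    /** Lowercases adjectives and common nouns, stems plural common nouns, leaves all other tokens unchanged. */
    static List<String> normalize(List<Tagged> ngram) {
        List<String> out = new ArrayList<>();
        for (Tagged t : ngram) {
            String w = switch (t.tag()) {
                case "JJ", "NN" -> t.word().toLowerCase(Locale.ROOT);
                case "NNS" -> stemPlural(t.word().toLowerCase(Locale.ROOT));
                default -> t.word(); // proper nouns (NNP, NNPS) and function words stay as they are
            };
            out.add(w);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(normalize(List.of(
                new Tagged("hospitals", "NNS"), new Tagged("in", "IN"), new Tagged("Valencia", "NNP"))));
        // prints [hospital, in, Valencia]
    }
}
```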
  • 10. Lexical Patterns
    ■ Goal: detect domain terminology using syntactic patterns [12]
    ■ Analysis of existing dictionaries
      □ 75% of terms are nouns, noun-noun or adjective-noun combinations
    ■ Excerpt of the 20 patterns used: [table not preserved]
    ■ No proper nouns (e.g. "Stanford University" vs. "university professor")
      □ Our focus is conceptual design at the schema level
    ■ Limitation: a 5-gram contains only 5 words
      □ Maximum length of a term: 3 words
    Example: "doctor or mental health professional" = term ("doctor") + separation ("or") + term ("mental health professional")
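Slide 10 detects terms by matching part-of-speech patterns of at most three words. The 20 actual patterns are not preserved here, so the following sketch assumes a small hypothetical pattern set (noun, noun-noun, adjective-noun and a few three-word variants) and a greedy longest-match scan over one tagged 5-gram. `TermMatcher` and `TERM_PATTERNS` are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of POS-pattern term detection; the pattern set below is an assumption.
final class TermMatcher {

    /** Illustrative tag patterns for terms of one to three words (not the 20 patterns from the slide). */
    static final Set<String> TERM_PATTERNS = Set.of(
            "NN", "NN NN", "JJ NN", "NN NN NN", "JJ NN NN", "JJ JJ NN");
    static final int MAX_TERM_LENGTH = 3;

    /** Scans a tagged n-gram left to right and returns the longest matching term at each position. */
    static List<String> extractTerms(List<String> words, List<String> tags) {
        if (words.size() != tags.size()) {
            throw new IllegalArgumentException("every word needs exactly one tag");
        }
        List<String> terms = new ArrayList<>();
        int i = 0;
        while (i < words.size()) {
            int matched = 0;
            // Try the longest candidate first so that "mental health professional" beats "mental health".
            for (int len = Math.min(MAX_TERM_LENGTH, words.size() - i); len >= 1; len--) {
                if (TERM_PATTERNS.contains(String.join(" ", tags.subList(i, i + len)))) {
                    matched = len;
                    break;
                }
            }
            if (matched > 0) {
                terms.add(String.join(" ", words.subList(i, i + matched)));
                i += matched; // continue after the term
            } else {
                i++;          // no term starts here
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        List<String> words = List.of("doctor", "or", "mental", "health", "professional");
        List<String> tags = List.of("NN", "CC", "JJ", "NN", "NN");
        System.out.println(extractTerms(words, tags)); // prints [doctor, mental health professional]
    }
}
```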
  • 11. Co-Occurring Terms
    ■ Hierarchical pattern matching
    ■ Distributional semantics [13], [22]
      □ "Words that occur in the same contexts tend to have similar meanings." (Distributional Hypothesis by Z. Harris)
    Example: the 5-gram "your doctor or pharmacist ." has a context frequency (absolute frequency) of 9,271, so "doctor" and "pharmacist" co-occurred 9,271 times.
    [Figure annotations: "Easiest case", "Highest level remains", "No idiomatic phrases", "No consecutive patterns"]
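The easiest case on slide 11, two terms joined by a coordinating conjunction, can be sketched as follows: every matched pair inherits the frequency of the 5-gram it came from, which is how "doctor" and "pharmacist" receive the count 9,271. This is a simplified stand-in for the hierarchical pattern matching, assuming that a term is a run of up to three JJ/NN tokens ending in a noun; `CoordinationPairs` and `Pair` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: relations from the pattern TERM + coordinating conjunction (CC) + TERM.
final class CoordinationPairs {

    /** Two co-occurring terms and the frequency of the 5-gram they were found in. */
    record Pair(String left, String right, long frequency) {}

    static final int MAX_TERM_LENGTH = 3;

    private static boolean isTermTag(String tag) {
        return tag.equals("JJ") || tag.equals("NN");
    }

    static List<Pair> extract(List<String> words, List<String> tags, long ngramFrequency) {
        List<Pair> pairs = new ArrayList<>();
        for (int i = 1; i < tags.size() - 1; i++) {
            if (!tags.get(i).equals("CC") || !tags.get(i - 1).equals("NN")) {
                continue; // need a conjunction directly preceded by a noun
            }
            // Left term: extend backwards over JJ/NN tokens, at most MAX_TERM_LENGTH words.
            int start = i - 1;
            while (start > 0 && i - (start - 1) <= MAX_TERM_LENGTH && isTermTag(tags.get(start - 1))) {
                start--;
            }
            // Right term: extend forwards over JJ/NN tokens, then trim back so it ends in a noun.
            int end = i + 1; // exclusive
            while (end < tags.size() && end - (i + 1) < MAX_TERM_LENGTH && isTermTag(tags.get(end))) {
                end++;
            }
            while (end > i + 1 && !tags.get(end - 1).equals("NN")) {
                end--;
            }
            if (end == i + 1) {
                continue; // no noun term after the conjunction
            }
            pairs.add(new Pair(String.join(" ", words.subList(start, i)),
                    String.join(" ", words.subList(i + 1, end)), ngramFrequency));
        }
        return pairs;
    }

    public static void main(String[] args) {
        for (Pair p : extract(List.of("your", "doctor", "or", "pharmacist", "."),
                List.of("PRP$", "NN", "CC", "NN", "."), 9271)) {
            System.out.println(p.left() + " <-> " + p.right() + " : " + p.frequency());
        }
        // prints: doctor <-> pharmacist : 9271
    }
}
```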
  • 12. SemNet Construction
    ■ Discard 5-grams that contain 4 or 5 stopwords
    ■ Apply pattern matching to the remaining 5-grams: the result is a large table of binary relations
    ■ Frequency aggregation
      □ The same term pair often co-occurs in many different contexts
    ■ Relative frequency computation
      □ For each term, with respect to its related terms
    ■ Graph construction
      □ Directed, weighted edges
      □ Serialization to a relational database (SQLite) and a graph database (Neo4j)
    [Figure: example 5-grams "to go to the doctor", "I am what I am", "a ) ( 2 )"]
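The construction steps on slide 12 can be sketched in memory as below. The stopword list, the symmetric counting of each pair, and the exact definition of relative frequency (the count of a pair divided by the sum of all counts of the source term) are assumptions made for illustration; the names are hypothetical, and the real pipeline wrote its results to SQLite and Neo4j rather than to Java maps. The toy rows reuse frequencies from the example 5-grams on slides 6 and 11.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory sketch of stopword filtering, frequency aggregation and relative frequencies.
final class SemNetBuilder {

    record Cooccurrence(String a, String b, long frequency) {}

    /** Keeps a 5-gram only if it contains fewer than 4 stopwords (stopword list supplied by the caller). */
    static boolean keep(List<String> fiveGram, Set<String> stopwords) {
        long n = fiveGram.stream().filter(w -> stopwords.contains(w.toLowerCase(Locale.ROOT))).count();
        return n < 4;
    }

    /** Sums the frequencies of each term pair over all contexts, recording both directions. */
    static Map<String, Map<String, Long>> aggregate(List<Cooccurrence> rows) {
        Map<String, Map<String, Long>> counts = new HashMap<>();
        for (Cooccurrence r : rows) {
            if (r.a().equals(r.b())) {
                continue; // ignore self-pairs
            }
            counts.computeIfAbsent(r.a(), k -> new HashMap<>()).merge(r.b(), r.frequency(), Long::sum);
            counts.computeIfAbsent(r.b(), k -> new HashMap<>()).merge(r.a(), r.frequency(), Long::sum);
        }
        return counts;
    }

    /** relfreq(a -> b) = count(a, b) / sum over x of count(a, x); the resulting edges are directed. */
    static Map<String, Map<String, Double>> relativeFrequencies(Map<String, Map<String, Long>> counts) {
        Map<String, Map<String, Double>> graph = new HashMap<>();
        counts.forEach((term, related) -> {
            long total = related.values().stream().mapToLong(Long::longValue).sum();
            Map<String, Double> edges = new HashMap<>();
            related.forEach((other, c) -> edges.put(other, c.doubleValue() / total));
            graph.put(term, edges);
        });
        return graph;
    }

    public static void main(String[] args) {
        List<Cooccurrence> rows = List.of(
                new Cooccurrence("doctor", "pharmacist", 9271), // "your doctor or pharmacist ."
                new Cooccurrence("doctor", "hospital", 1362),   // "the doctor and the hospital"
                new Cooccurrence("hospital", "doctor", 1017));  // "the hospital and the doctor"
        Map<String, Map<String, Double>> net = relativeFrequencies(aggregate(rows));
        System.out.println(String.format(Locale.ROOT, "%.4f", net.get("doctor").get("hospital")));   // 0.2042
        System.out.println(String.format(Locale.ROOT, "%.4f", net.get("pharmacist").get("doctor"))); // 1.0000
    }
}
```

Because each term's weights are normalized by its own total, the edges become directed: in the toy data "pharmacist" points to "doctor" with weight 1.0, while "doctor" points to "pharmacist" with only about 0.80.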
  • 13. Statistics
    ■ Properties of SemNet
      □ 268,937 distinct single-word terms
      □ 2,115,494 distinct two-word terms
      □ 355,689 distinct three-word terms
      □ In total about 2.7 million terms and 37.5 million relations
      □ 2.2 GB of disk space
    ■ Lessons learned from the analysis process
      [Figure: n-gram information content; shares 41.6%, 15.7%, 32.6% and 10.1%; categories "4 or 5 stopwords", "only 1 term", "no pattern match" and "n-grams with a semantic relationship"]
      [Figure: semantic relatedness follows Zipf's law (degree of relatedness plotted against rank)]
  • 15. Querying SemNet
    ■ Query interfaces
      □ SQL: query the relational database
      □ Cypher: query the Neo4j database
      □ Java: use SemNet in your applications
      □ PHP: explore the data in a web interface
    ■ Examples of the top 10 automatically identified related terms: [table not preserved]
      (f: absolute term frequency in the original text corpus, #r: number of related terms)
    SQL example:
      select * from nouncooccurrences
      where termw1 = 5824331 and termw2 is null and termw3 is null
      order by relfreq desc
      limit 20;
    Java example:
      public ArrayList<String> getRelatedStringTerms(ArrayList<String> inputTerms) { … }
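The SQL query above returns the related terms of one term ordered by descending relative frequency. The same lookup over an in-memory adjacency map might look like the following sketch. `RelatedTermQuery` and `topRelated` are hypothetical and are not the SemNet Java API, whose `getRelatedStringTerms` body is elided on the slide; breaking ties alphabetically is an arbitrary choice.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory analogue of "order by relfreq desc limit k".
final class RelatedTermQuery {

    /** Returns up to k related terms of {@code term}, ordered by descending relative frequency. */
    static List<String> topRelated(Map<String, Map<String, Double>> net, String term, int k) {
        return net.getOrDefault(term, Map.of()).entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.<String, Double>comparingByKey()))
                .limit(k)
                .map(Map.Entry::getKey)
                .toList();
    }

    public static void main(String[] args) {
        // A few weights for "table" taken from the ranking example on the next slide.
        Map<String, Map<String, Double>> net = Map.of(
                "table", Map.of("chair", 0.0441, "contents", 0.0359, "end", 0.0221, "front", 0.0194));
        System.out.println(topRelated(net, "table", 3)); // prints [chair, contents, end]
    }
}
```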
  • 16. Ranking Results of Multiple Input Terms
    ■ Challenge: methods based on matrices and vectors are too slow
    ■ Strategy: intersect the related-term sets of the input terms and multiply their relative frequencies
    Example (first 20 entries of each list):
      table                  database                     table + database (∩, ×)
      chair      0.0441      data              0.0735     data         0.001155
      contents   0.0359      information       0.0569     contents     0.000359
      end        0.0221      record            0.0376     information  0.000190
      front      0.0194      table             0.0334     record       0.000091
      figure     0.0189      access            0.0310     use          0.000077
      head       0.0189      spreadsheet       0.0252     end          0.000060
      side       0.0180      name              0.0201     example      0.000055
      data       0.0157      object            0.0164     name         0.000050
      hand       0.0132      retrieval system  0.0163     figure       0.000047
      column     0.0131      file              0.0158     value        0.000045
      page       0.0118      example           0.0153     result       0.000037
      edge       0.0112      use               0.0150     list         0.000037
      result     0.0100      connection        0.0146     column       0.000034
      value      0.0099      structure         0.0139     row          0.000033
      place      0.0087      field             0.0125     object       0.000024
      row        0.0086      user              0.0124     field        0.000023
      show       0.0082      change            0.0112     book         0.000016
      elbow      0.0072      type              0.0107     order        0.000016
      list       0.0071      size              0.0104     size         0.000014
      bed        0.0071      transaction       0.0102     query        0.000012
      ...                    ...                          ...
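The ranking strategy on slide 16 keeps only terms that are related to every input term and scores each of them by the product of its relative frequencies. A minimal sketch, assuming the network is held in a nested map (term, then related term, then relative frequency); `MultiTermRanking` and `rank` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of "related-term set intersection + relative frequency multiplication".
final class MultiTermRanking {

    /**
     * Keeps only terms related to every input term, scores each by the product of its
     * relative frequencies, and returns the entries sorted by descending score.
     */
    static List<Map.Entry<String, Double>> rank(Map<String, Map<String, Double>> net, List<String> inputs) {
        Map<String, Double> scores = null;
        for (String input : inputs) {
            Map<String, Double> related = net.getOrDefault(input, Map.of());
            if (scores == null) {
                scores = new HashMap<>(related); // first input: start from its related-term set
                continue;
            }
            Map<String, Double> next = new HashMap<>();
            for (Map.Entry<String, Double> e : scores.entrySet()) {
                Double weight = related.get(e.getKey());
                if (weight != null) { // intersection: the term must also be related to this input
                    next.put(e.getKey(), e.getValue() * weight); // multiply relative frequencies
                }
            }
            scores = next;
        }
        List<Map.Entry<String, Double>> ranked = new ArrayList<>();
        if (scores != null) {
            scores.forEach((term, score) -> ranked.add(Map.entry(term, score)));
        }
        ranked.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        return ranked;
    }

    public static void main(String[] args) {
        // Toy excerpt of the lists for "table" and "database" shown on the slide.
        Map<String, Map<String, Double>> net = Map.of(
                "table", Map.of("chair", 0.0441, "contents", 0.0359, "data", 0.0157),
                "database", Map.of("data", 0.0735, "information", 0.0569, "table", 0.0334));
        for (Map.Entry<String, Double> e : rank(net, List.of("table", "database"))) {
            System.out.printf(Locale.ROOT, "%s %.6f%n", e.getKey(), e.getValue());
        }
        // prints: data 0.001154
    }
}
```

With the rounded weights from the slide, "data" scores 0.0157 × 0.0735 ≈ 0.001154; the slide's 0.001155 presumably comes from the unrounded weights. In this toy excerpt "contents" drops out because it is missing from the shortened "database" list.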
  • 17. Modeling With Semantic Autocompletion
    ■ Prototype: Ecore diagram editor with class name suggestions [15]
    ■ Suggestions adapt automatically to the content of the model
  • 19. Evaluation Setup
    ■ Challenge
      □ No gold standard is available for many information extraction tasks
    ■ Our strategy: compare SemNet to existing knowledge bases
      □ Measure how much of the information in WordNet and ConceptNet is contained in SemNet
    ■ WordNet V3.0: lexical database for the English language [16]
      □ Synsets: grouped terms that share the same sense
      □ Relations: mainly taxonomic, part-whole and synonyms
    ■ ConceptNet V5.1: semantic graph for general human knowledge [17]
      □ Nodes: any natural language phrase that expresses a concept
      □ Relations: taxonomic, part-whole, related-to and several others
    ■ SemNet: Semantic Network of Related Terms
      □ Nodes: noun terminology
      □ Relations: probabilistic links
    [Figure: "pregnancy" in the three resources.
      WordNet, word sense pregnancy (7 of 32 relations shown): maternity, parturiency, morning sickness, physical condition, ectopic pregnancy, entopic pregnancy; relation types synonym, part meronym, hypernym, hyponym.
      ConceptNet, concept pregnancy (7 of 58 relations shown): expect, morning sickness, physical condition, go to bed, ectopic pregnancy, stretch, start family; relation types ConceptuallyRelatedTo, RelatedTo, PartOf, IsA, Causes, HasSubevent.
      SemNet, term pregnancy (first 10 of 4,039 relations shown): mother, termination, birth, woman, trimester, stage, week, childbirth, lactation, month; weights between 0.036 and 0.016.]
  • 20. Noun Terminology Coverage
    ■ WordNet
      □ Iterate through all noun synsets (72,994 synsets evaluated)
      □ Check whether their nouns are contained in SemNet (98,681 nouns evaluated)
      Result: 77.16% of WordNet's synsets and 62.17% of WordNet's nouns are contained in SemNet
    ■ ConceptNet
      □ Problem: concepts can be expressed by any natural language phrase
      □ First determine the noun terminology
      □ Check whether the nouns are contained in SemNet (49,301 concepts evaluated)
      Result: 82.40% of ConceptNet's nouns are contained in SemNet
    [Figure: example WordNet synsets (doctor, doc, physician, MD, Dr., medico), (ear doctor, ear specialist, otologist), (sleep talking, somniloquy, somniloquism); example ConceptNet concepts: doctor, go to bed, pregnancy, beautiful]
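Slide 20 reports two coverage figures for WordNet: the share of synsets found in SemNet and the share of individual nouns found. The sketch below computes both for a list of synsets, assuming that a synset counts as contained when at least one of its nouns is a SemNet term (the slides do not state the exact criterion). The three synsets are the examples from the slide; the set of SemNet terms is made up for the illustration, and the names are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of synset and noun coverage; the "any member noun" criterion is an assumption.
final class CoverageCheck {

    /** Fraction of synsets with at least one member noun contained in SemNet. */
    static double synsetCoverage(List<List<String>> synsets, Set<String> semnetTerms) {
        if (synsets.isEmpty()) {
            return 0.0;
        }
        long covered = synsets.stream()
                .filter(s -> s.stream().anyMatch(semnetTerms::contains))
                .count();
        return (double) covered / synsets.size();
    }

    /** Fraction of distinct nouns (across all synsets) contained in SemNet. */
    static double nounCoverage(List<List<String>> synsets, Set<String> semnetTerms) {
        Set<String> nouns = new HashSet<>();
        synsets.forEach(nouns::addAll);
        if (nouns.isEmpty()) {
            return 0.0;
        }
        long found = nouns.stream().filter(semnetTerms::contains).count();
        return (double) found / nouns.size();
    }

    public static void main(String[] args) {
        List<List<String>> synsets = List.of(
                List.of("doctor", "doc", "physician", "MD", "Dr.", "medico"),
                List.of("ear doctor", "ear specialist", "otologist"),
                List.of("sleep talking", "somniloquy", "somniloquism"));
        Set<String> semnetTerms = Set.of("doctor", "physician", "ear specialist"); // toy data
        System.out.println(synsetCoverage(synsets, semnetTerms)); // 2 of 3 synsets covered
        System.out.println(nounCoverage(synsets, semnetTerms));   // prints 0.25 (3 of 12 nouns)
    }
}
```

Under this criterion synset coverage is always at least as high as noun coverage, which is consistent with the 77.16% vs. 62.17% reported above.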
  • 21. Relation Coverage
    ■ WordNet / ConceptNet
      □ Iterate through all previously found noun synsets (56,321 synsets used) and concepts (40,625 concepts used)
      □ Check whether the relations between them are contained in SemNet (61,931 WordNet relations and 256,213 ConceptNet relations evaluated)
    ■ Relation evaluation results: [table not preserved]
    [Figure: example synset (doctor, doc, physician, MD, Dr., medico) with hypernym (medical practitioner, medical man) and hyponyms (surgeon), (allergist)]
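The counts on slide 21 match the covered fractions of slide 20 (56,321 / 72,994 ≈ 77.16% of synsets and 40,625 / 49,301 ≈ 82.40% of concepts). A relation check in the same spirit is sketched below, assuming that a WordNet relation between two synsets counts as covered if SemNet has an edge from any noun of the first synset to any noun of the second. The SemNet edges are invented toy data and all names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of relation coverage; the "any noun pair" criterion is an assumption.
final class RelationCoverage {

    /** A relation (e.g. hypernym or hyponym) between two synsets, each given by its nouns. */
    record SynsetRelation(List<String> from, List<String> to) {}

    /** Covered if SemNet links any noun of {@code from} to any noun of {@code to}. */
    static boolean covered(SynsetRelation r, Map<String, Set<String>> semnetEdges) {
        for (String a : r.from()) {
            Set<String> neighbours = semnetEdges.getOrDefault(a, Set.of());
            for (String b : r.to()) {
                if (neighbours.contains(b)) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Fraction of the given relations that are covered by SemNet edges. */
    static double coverage(List<SynsetRelation> relations, Map<String, Set<String>> semnetEdges) {
        if (relations.isEmpty()) {
            return 0.0;
        }
        long n = relations.stream().filter(r -> covered(r, semnetEdges)).count();
        return (double) n / relations.size();
    }

    public static void main(String[] args) {
        List<String> doctor = List.of("doctor", "doc", "physician", "MD", "Dr.", "medico");
        List<SynsetRelation> relations = List.of(
                new SynsetRelation(doctor, List.of("medical practitioner", "medical man")), // hypernym
                new SynsetRelation(doctor, List.of("surgeon")),                             // hyponym
                new SynsetRelation(doctor, List.of("allergist")));                          // hyponym
        Map<String, Set<String>> semnetEdges = Map.of( // toy data
                "doctor", Set.of("surgeon", "nurse"),
                "physician", Set.of("medical practitioner"));
        System.out.println(coverage(relations, semnetEdges)); // 2 of 3 relations covered
    }
}
```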
  • 23. Conclusions and Future Work
    ■ Summary
      □ Input: 710 million 5-grams and 20 part-of-speech patterns
      □ Hierarchical pattern matching, distributional semantics
      □ Output: 2.7M terms (one to three words) and 37.5M weighted relations
      □ Limitation: relations can only be detected within a window of 5 words
      □ Applications: domain-specific modeling, keyword expansion, background knowledge for NLP tasks
    ■ Current and future work
      □ Support additional languages
      □ Improve ranking functions (pointwise mutual information)
      □ Relax the 3-word limitation; derive our own n-gram datasets
      □ Combine probabilistic information with specific relations
      □ Domain clustering in the semantic network
      □ Additional modeling support: relations/associations, attributes
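Pointwise mutual information (PMI), listed above as a candidate ranking function, relates how often two terms a and b co-occur to how often they would co-occur if they were independent. The standard definition is shown below with maximum-likelihood estimates from counts; c(a,b), c(a), c(b) and N are generic symbols (co-occurrence count, individual term counts, total number of observations), not quantities defined in the slides.

```latex
$$
\mathrm{PMI}(a,b) \;=\; \log \frac{p(a,b)}{p(a)\,p(b)}
\;\approx\; \log \frac{c(a,b)\cdot N}{c(a)\,c(b)},
\qquad
p(a,b) \approx \frac{c(a,b)}{N},\;\; p(a) \approx \frac{c(a)}{N},\;\; p(b) \approx \frac{c(b)}{N}
$$
```

Because the individual term counts appear in the denominator, very frequent, generic terms are not automatically ranked first; PMI is, however, known to overrate pairs of rare terms.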
  • 24. References
    [5] Agt, H.: Supporting Software Language Engineering by Automated Domain Knowledge Acquisition. In: MODELS 2011 Workshops, LNCS 7167, Springer (2012)
    [8] Toutanova, K., Klein, D., Manning, C.D., Singer, Y.: Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network. In: Proceedings of NAACL 2003, pp. 173–180 (2003)
    [9] Marcus, M.P., Marcinkiewicz, M.A., Santorini, B.: Building a Large Annotated Corpus of English: The Penn Treebank. Computational Linguistics 19(2), 313–330 (1993)
    [11] Michel, J.B., Shen, Y.K., Aiden, A.P., Veres, A., Gray, M.K., The Google Books Team, Pickett, J.P., Hoiberg, D., Clancy, D., Norvig, P., Orwant, J., Pinker, S., Nowak, M.A., Aiden, E.L.: Quantitative Analysis of Culture Using Millions of Digitized Books. Science 331(6014), 176–182 (2011)
    [12] Hearst, M.A.: Automatic Acquisition of Hyponyms from Large Text Corpora. In: Proceedings of the 14th Conference on Computational Linguistics (COLING 1992), vol. 2 (1992)
    [13] Harris, Z.: Distributional Structure. Word 10(2–3), 146–162 (1954)
    [15] Agt, H.: SemAcom: A System for Modeling with Semantic Autocompletion. In: Model Driven Engineering Languages and Systems, 15th International Conference, MODELS 2012, Demo Track, Innsbruck, Austria (2012)
    [16] Fellbaum, C.: WordNet: An Electronic Lexical Database. The MIT Press, Cambridge (1998)
    [17] Speer, R., Havasi, C.: Representing General Relational Knowledge in ConceptNet 5. In: LREC 2012 (2012)
    [19] Agt, H., Kutsche, R.D., Wegeler, T.: Guidance for Domain Specific Modeling in Small and Medium Enterprises. In: SPLASH 2011 Workshops, DSM 2011, Portland, OR, USA (2011)
    [22] Turney, P.D., Pantel, P.: From Frequency to Meaning: Vector Space Models of Semantics. Journal of Artificial Intelligence Research 37(1), 141–188 (2010)
    Thank you for your attention!
    MODELS? Try out SemNet: http://www.bizware.tu-berlin.de/semnet/
    Contact: henning.agt@tu-berlin.de