ANNIS: Search and Visualization in Multilayer Linguistic Corpora
Carolin Odebrecht & Florian Zipser
Humboldt-Universität zu Berlin
ANNIS workshop
2014-08-26
A brief introduction
● Search and Visualization in Multilayer Linguistic
Corpora
– Imports existing corpora
● Corpora must already be annotated; ANNIS only uses what is there
● No NLP!
– Makes corpora searchable
● One query language for all corpora (AQL)
● Abstraction over linguistic data necessary
● But: Corpora have different annotations → query has to
match the annotations
– Displays corpora
● Many visualizations available
● Corresponding to type of annotation (syntactic trees,
phrase trees (RST), grids, coreferences ...)
● What ANNIS cannot do
– Does not know how to speak natural language
→ so you have to learn AQL
– ANNIS does not know any semantics
→ "NN", "NP", "sentence", "word", "my favorite annotation" … are just sequences of characters
– You need to be exact
→ e.g. "POS" != "pos" and "NN" != "NN " (note the trailing blank)
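The exactness rule can be illustrated outside ANNIS. The following is a minimal Python sketch, not ANNIS code: the token list and the helper `exact_hits` are hypothetical and only mimic how an exact value search compares character sequences.

```python
# Hypothetical mini-corpus: (word form, part-of-speech tag) pairs.
TOKENS = [("Die", "ART"), ("Jugendlichen", "NN"), ("in", "APPR"), ("Zossen", "NE")]


def exact_hits(tokens, value):
    """Count tokens whose tag equals `value` character by character."""
    return sum(1 for _, tag in tokens if tag == value)


print(exact_hits(TOKENS, "NN"))   # the tag is matched exactly
print(exact_hits(TOKENS, "nn"))   # different case, so nothing matches
print(exact_hits(TOKENS, "NN "))  # trailing blank, so nothing matches
```

Only the first call finds a token; the other two differ from the stored value by case or by a blank.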
ANNIS basics
[Screenshots of the ANNIS user interface, with these labels:]
● Enter query, corpus list, previous queries, virtual keyboard (e.g. Arabic)
● Sample queries (corresponding to the corpus)
● Query result, visualizations
● Corpus metadata window
● Document metadata window
ANNIS basics
● Basic principles of AQL (ANNIS Query
Language)
– Attributes and values
● Searching for exact character sequences
● Searching for patterns
– Combinatory search
Demo corpus
● Corpus for demonstration: pcc2 (a subcorpus of pcc)
https://korpling.german.hu-berlin.de/annis3/#_c=cGNjMg
● Potsdam Commentary Corpus
– German newspaper commentaries from the 'Märkische Allgemeine Zeitung'
https://www.ling.uni-potsdam.de/acl-lab/Forsch/pcc/pcc.html
– Multiple annotations
ANNIS basics
● Different types of annotations
– Token annotation
– Span annotation
– Pointing relation
– Hierarchy annotation
(trees)
[Figure: schematic of the annotation types, with elements labeled Token, Span, Node, Edge, and Key]
Exact word forms
● Token annotation
– Exact sequence
searching for a word form
"Jugendlichen" 3 hits
"jugendlichen" 0 hits
→ tok="jugendlichen"
Exact token annotation
● Token annotation
– Exact sequence
searching for an exact part of speech tag
pos = "NN" (attribute = "value")
– Attributes can have more than one value
– Searching for all values of an attribute
pos="NN" 62 hits
pos="ADJA" 18 hits
searching for all values of an attribute
pos 399 hits
Exact span annotation
● Span annotation
– Exact sequence
searching for sentences
Sent="s" 28 hits
Metadata
● Sent="s" 28 hits
– You need to know which annotations a corpus contains
Pattern
● Token annotation
– Patterns
. matches any single character
* zero or more of the preceding element
searching for the beginning of a word
/Jugend.*/ 5 hits ("Jugendlichen" 3 hits), e.g. Jugendlichen, Jugendliche
/jugend.*/ 0 hits ("jugendlichen" 0 hits)
searching for all nouns
pos=/N./ 73 hits, includes NN & NE (pos="NN" 62 hits)
searching for all adjectives
pos=/ADJ./ 32 hits, includes ADJA & ADJD (pos="ADJA" 18 hits)
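How such patterns select values can be tried out with Python's `re` module. This is a sketch under assumptions: the word list is invented, `pattern_hits` is a hypothetical helper, and `re.fullmatch` is used on the assumption that an AQL pattern has to match the whole value and not just a part of it.

```python
import re

# Invented word forms, not taken from pcc2.
WORDS = ["Jugendlichen", "Jugendliche", "jugendfrei", "Die", "Tugend"]


def pattern_hits(values, pattern):
    """Return the values that the pattern matches completely."""
    regex = re.compile(pattern)
    return [v for v in values if regex.fullmatch(v)]


print(pattern_hits(WORDS, r"Jugend.*"))  # ['Jugendlichen', 'Jugendliche']
print(pattern_hits(WORDS, r"jugend.*"))  # ['jugendfrei']
print(pattern_hits(["NN", "NE", "ADJA", "N"], r"N."))  # ['NN', 'NE']
```

The last call shows why `N.` covers NN and NE: the dot stands for exactly one further character, so neither "N" nor "ADJA" qualifies.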
Relations between annotations
● Span annotation
searching for all NPs
cat="NP" 41 hits (pos="NN" 62 hits)
e.g. Die Jugendlichen in Zossen
● Relations between attributes
searching for all NPs which contain a preposition
cat="NP" 41 hits
pos="APPR" 19 hits
e.g. Die Jugendlichen in Zossen
→ no relation between the two pieces of information yet!
Each search term is numbered by its position in the query:
cat="NP" #1
pos="APPR" #2
e.g. Die Jugendlichen in Zossen
→ NP (#1) includes APPR (#2)
cat="NP" &
pos="APPR" &
#1_i_#2
e.g. Die Jugendlichen in Zossen
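What `_i_` checks can be pictured with token positions. The sketch below is an illustration under assumptions, not ANNIS internals: each annotation is modeled as a hypothetical `Span` holding the index of its first and last token, and `includes` is an invented helper name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    label: str
    first: int  # index of the first covered token
    last: int   # index of the last covered token


def includes(a: Span, b: Span) -> bool:
    """True if every token covered by b is also covered by a."""
    return a.first <= b.first and b.last <= a.last


# Tokens: 0=Die 1=Jugendlichen 2=in 3=Zossen
np = Span("NP", 0, 3)
appr = Span("APPR", 2, 2)

print(includes(np, appr))  # True
print(includes(appr, np))  # False
```

The relation is directed: the NP covers the preposition, but the preposition does not cover the NP.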
Hierarchy relations
● Relations between attributes
searching for all NPs which are objects
cat="NP"
e.g. Die Jugendlichen in Zossen → but this NP is a subject!
– NP → node annotation
– OA → edge annotation
[Figure: schematic with elements labeled Token, Span, Node, and Edge]
cat="NP"
the syntactic function in the tree
func="OA"
→ Note: a relation always needs at least two elements that relate to each other!
node & cat="NP" & #1 >[func="OA"] #2
e.g. ein Musikcafé → object!
Used relations
● Relations we used:
A _i_ B            A includes B
A > B              A dominates B
A >[func="OA"] B   A dominates B and B is an object
The full list of relations can be found in ANNIS
What's new in ANNIS version 3.1.7
● Simplified syntax (AQL)
● Frequency analysis (visualization)
● Expand match context (visualization)
● Equality and inequality (AQL)
● Variables (AQL)
● Complex OR expression (AQL)
● Document browser (visualization)
● CSV export (visualization)
● Tooltip for corpus names (visualization)
● Report problem (visualization)
Simplified syntax
● Question:
"Die" followed by "Jugendlichen", both dominated by a noun phrase (NP) which is dominated by a sentence
So far:
cat="S" & cat="NP" & "Die" & "Jugendlichen" & #1 > #2 & #2 > #3 & #2 > #4 & #3 . #4
Simplified:
cat="S" > cat="NP" > "Die" . "Jugendlichen" & #2 > #4
Frequency analysis
● Questions:
– How many words tagged as „NN“, „ADJA“ or „ADV“
does a corpus contain?
– What are the most frequent part-of-speech tags
followed by a noun?
– What are the most frequent part-of-speech tags in a
prepositional phrase, which is in a sentence?
– ...
Attention:
A frequency analysis has to be bound to a query!
● What are the most frequent part-of-speech tags followed by a noun?
pos . pos="NN"
● What are the most frequent part-of-speech tags in a prepositional phrase, which is in a sentence?
cat="S" > cat="PP" > pos
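The first question amounts to counting the tag on the left side of each match of `pos . pos="NN"`. A rough Python sketch of that counting step, using an invented tag sequence and a hypothetical helper `tags_before` (this is not how ANNIS computes frequencies internally):

```python
from collections import Counter

# Invented tag sequence for one short text, not real pcc2 data.
TAGS = ["ART", "NN", "APPR", "NE", "ART", "ADJA", "NN", "VVFIN", "ART", "NN"]


def tags_before(tags, target):
    """Count the tags that directly precede `target`."""
    pairs = zip(tags, tags[1:])  # each tag together with its direct successor
    return Counter(left for left, right in pairs if right == target)


print(tags_before(TAGS, "NN").most_common())  # [('ART', 2), ('ADJA', 1)]
```

The query defines which matches exist; the frequency table then only tallies the values of one position in those matches.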
Expand match context
● Even more than 25 is possible; it is a free text field
● Sometimes the context is too small
Equality and inequality
● Equality "==" and inequality "!=" for attributes
● Question (inequality):
two different part-of-speech tags, one directly following the other
pos . pos & #1 != #2
● Question (equality):
two identical part-of-speech tags, one directly following the other
pos . pos & #1 == #2
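Both comparisons work on pairs of directly adjacent annotations. As a sketch with invented tags (the helper `adjacent_pairs` is hypothetical and compares the tag values, as the slides describe for attributes):

```python
# Invented tag sequence, not real corpus data.
TAGS = ["ART", "ADJA", "ADJA", "NN", "NN", "VVFIN"]


def adjacent_pairs(tags, same):
    """Return directly adjacent tag pairs that are equal (same=True) or different."""
    return [(a, b) for a, b in zip(tags, tags[1:]) if (a == b) == same]


print(adjacent_pairs(TAGS, same=True))   # [('ADJA', 'ADJA'), ('NN', 'NN')]
print(adjacent_pairs(TAGS, same=False))  # [('ART', 'ADJA'), ('ADJA', 'NN'), ('NN', 'VVFIN')]
```

Every adjacent pair ends up in exactly one of the two result lists, which mirrors how "==" and "!=" split the matches of `pos . pos`.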
Variables
● Question:
"Die" followed by "Jugendlichen", both dominated by a noun phrase (NP) which is dominated by a sentence
Simplified:
cat="S" > cat="NP" > "Die" . "Jugendlichen" & #2 > #4
With variables:
cat="S" > np#cat="NP" > "Die" . jug#"Jugendlichen" & #np > #jug
Variables and numbers can be mixed:
cat="S" > np#cat="NP" > "Die" . "Jugendlichen" & #np > #4
Complex OR expression
● Question (simple OR):
A part-of-speech tag which is a noun, an attributive adjective or an article
pos=/(NN)|(ADJA)|(ART)/ (in pattern search)
● OR for expressions
pos="NN" | pos="ADJA" | pos="ART"
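For a single attribute, the pattern with alternatives and the OR of exact values should select the same tags. A small Python check of that idea, using invented tag values and assuming that the pattern has to match the whole value:

```python
import re

TAGS = ["NN", "ADJA", "ART", "NE", "ADJD", "ARTX"]  # invented values

# Variant 1: one pattern with alternatives, matched against the whole value.
by_pattern = [t for t in TAGS if re.fullmatch(r"(NN)|(ADJA)|(ART)", t)]

# Variant 2: OR over exact comparisons.
by_or = [t for t in TAGS if t == "NN" or t == "ADJA" or t == "ART"]

print(by_pattern == by_or)  # True
print(by_pattern)           # ['NN', 'ADJA', 'ART']
```

The made-up value "ARTX" is rejected by both variants, because neither a whole-value pattern nor an exact comparison accepts extra characters.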
● Question (complex OR):
A prepositional phrase, which is dominated by a sentence, or just a nominal phrase
(cat="S" > cat="PP") | cat="NP"
● Question (nested OR):
A prepositional phrase, which dominates a noun, an attributive adjective or an article
a#cat="PP" &
(b#pos="NN" | b#pos="ADJA" | b#pos="ART") &
#a > #b
Attention:
All expressions in brackets have to use the same variable
… & (b#pos="NN" | b#pos="ADJA" | b#pos="ART") & ...
Document browser
● Displays the entire text of a document
CSV export
● Export data for further processing
Tooltips for corpus names
● Sometimes corpus names can get very long
Report problem
Get ANNIS
● ANNIS comes in two flavors
– A server version
– A desktop version (ANNIS kickstarter)
– Both are downloadable at:
http://www.sfb632.uni-potsdam.de/annis/
● ANNIS is open source (Apache License 2.0) and hosted on GitHub
– https://github.com/korpling/ANNIS
Thanks for your attention!
Any questions?
carolin.odebrecht@hu-berlin.de,
f.zipser@gmx.de

Storytelling For The Web: Integrate Storytelling in your Design Process
 
Artificial Intelligence, Data and Competition – SCHREPEL – June 2024 OECD dis...
Artificial Intelligence, Data and Competition – SCHREPEL – June 2024 OECD dis...Artificial Intelligence, Data and Competition – SCHREPEL – June 2024 OECD dis...
Artificial Intelligence, Data and Competition – SCHREPEL – June 2024 OECD dis...
 
How to Leverage AI to Boost Employee Wellness - Lydia Di Francesco - SocialHR...
How to Leverage AI to Boost Employee Wellness - Lydia Di Francesco - SocialHR...How to Leverage AI to Boost Employee Wellness - Lydia Di Francesco - SocialHR...
How to Leverage AI to Boost Employee Wellness - Lydia Di Francesco - SocialHR...
 
2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPT
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage Engineerings
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
 
Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 

ANNIS workshop sfb 2014

  • 1. ANNIS: Search and Visualization in Multilayer Linguistic Corpora. Carolin Odebrecht & Florian Zipser, Humboldt-Universität zu Berlin. ANNIS workshop, 2014-08-26
  • 2. A brief introduction ● Search and Visualization in Multilayer Linguistic Corpora – Imports existing corpora ● Corpora already have to be annotated; ANNIS only uses what's there ● No NLP!
  • 3. A brief introduction ● Search and Visualization in Multilayer Linguistic Corpora – Makes corpora searchable ● One query language for all corpora (AQL) ● Abstraction over linguistic data necessary ● But: corpora have different annotations → the query has to match the annotations
  • 4. A brief introduction ● Search and Visualization in Multilayer Linguistic Corpora – Displays corpora ● Many visualizations available ● Corresponding to the type of annotation (syntactic trees, phrase trees (RST), grids, coreferences ...)
  • 5. A brief introduction ● What ANNIS cannot do – Does not know how to speak natural language → so you have to learn AQL
  • 6. A brief introduction ● What ANNIS cannot do – Does not know how to speak natural language → so you have to learn AQL – ANNIS does not know any semantics → "NN", "NP", "sentence", "word", "my favorite annotation" … are just sequences of characters
  • 7. A brief introduction ● What ANNIS cannot do – Does not know how to speak natural language → so you have to learn AQL – ANNIS does not know any semantics → "NN", "NP", "sentence", "word", "my favorite annotation" … are just sequences of characters – You need to be exact → e.g. "POS" != "pos" and "NN" != "NN " (note the trailing blank)
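The exactness requirement can be illustrated outside ANNIS with plain string comparison. The following Python sketch is only an analogy and says nothing about how ANNIS is implemented; the helper name `exact_match` is hypothetical.

```python
def exact_match(query_value: str, annotation_value: str) -> bool:
    """Compare two strings character by character, without any normalisation."""
    return query_value == annotation_value


# Case matters: "POS" and "pos" are different annotation names.
print(exact_match("POS", "pos"))   # False
# Whitespace matters: a trailing blank yields a different value.
print(exact_match("NN", "NN "))    # False
print(exact_match("NN", "NN"))     # True
```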
  • 8. ANNIS basics
  • 9. Enter query; Corpus list; Previous queries; Virtual keyboard (e.g. Arabic)
  • 10. Sample queries (corresponding to corpus)
  • 11. Query result; Visualizations
  • 12. Corpus metadata: corpus metadata window
  • 13. Document metadata: document metadata window
  • 14. ANNIS basics ● Basic principles of AQL (ANNIS Query Language) – Attributes and values ● Searching for exact character sequences ● Searching for patterns – Combinatory search
  • 15. Demo corpus ● Corpus for demonstration: pcc2 (a subcorpus of pcc) https://korpling.german.hu-berlin.de/annis3/#_c=cGNjMg ● Potsdam Commentary Corpus – German newspaper commentaries from the 'Märkische Allgemeine Zeitung' https://www.ling.uni-potsdam.de/acl-lab/Forsch/pcc/pcc.html – Multiple annotations
  • 16. ANNIS basics ● Different types of annotations – Token annotation – Span annotation – Pointing relation – Hierarchy annotation (trees) [Figure: tokens, spans, a node, an edge, and keys]
  • 17. ANNIS basics ● Different types of annotations – Token annotation – Span annotation – Pointing relation – Hierarchy annotation (trees) [Figure: tokens, spans, a node, an edge, and keys]
  • 18. Exact word forms ● Token annotation – Exact sequence: searching for a word form "Jugendlichen" "jugendlichen"
  • 19. Exact word forms ● Token annotation – Exact sequence: searching for a word form "Jugendlichen" 3 hits; "jugendlichen" 0 hits → tok="jugendlichen"
  • 20. Exact token annotation ● Token annotation – Exact sequence: searching for an exact part-of-speech tag pos="NN" (attribute: pos, value: "NN") – Attributes can have more than one value – Searching for all values of an attribute
  • 21. Exact token annotation ● Token annotation – Exact sequence: searching for an exact part-of-speech tag pos="NN" pos="ADJA"
  • 22. Exact token annotation ● Token annotation – Exact sequence: searching for an exact part-of-speech tag pos="NN" 62 hits; pos="ADJA" 18 hits; searching for all values of an attribute: pos 399 hits
  • 23. Exact span annotation ● Span annotation – Exact sequence: searching for sentences Sent="s"
  • 24. Exact span annotation ● Span annotation – Exact sequence: searching for sentences Sent="s" 28 hits
  • 25. Metadata ● Sent="s" 28 hits – It is necessary to know which annotations a corpus contains
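The difference between searching for an attribute with a value (`pos="NN"`) and searching for the attribute alone (`pos`) can be sketched in Python. This is a toy model under stated assumptions: the token list is made up for illustration (it is not taken from pcc2, so the counts differ from the hit counts on the slides), and the function `search` is hypothetical.

```python
# Toy data: each token is a dict of annotation name -> value (made up, not from pcc2).
tokens = [
    {"tok": "Die", "pos": "ART"},
    {"tok": "Jugendlichen", "pos": "NN"},
    {"tok": "in", "pos": "APPR"},
    {"tok": "Zossen", "pos": "NE"},
    {"tok": "wollen", "pos": "VMFIN"},
    {"tok": "ein", "pos": "ART"},
    {"tok": "Musikcafé", "pos": "NN"},
]


def search(tokens, attr, value=None):
    """Return tokens carrying `attr`; if `value` is given, it must match exactly."""
    return [t for t in tokens if attr in t and (value is None or t[attr] == value)]


print(len(search(tokens, "pos", "NN")))            # 2: attribute and value
print(len(search(tokens, "pos")))                  # 7: any value of the attribute
print(len(search(tokens, "tok", "jugendlichen")))  # 0: the comparison is case-sensitive
```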
  • 26. Pattern ● Token annotation – Patterns: "." matches any single character; "*" matches zero or more of the preceding element. Searching for the beginning of a word: /Jugend.*/ /jugend.*/
  • 27. Pattern ● Token annotation – Patterns: "." matches any single character; "*" matches zero or more of the preceding element. Searching for the beginning of a word: /Jugend.*/ 5 hits ("Jugendlichen" 3 hits), e.g. Jugendlichen, Jugendliche; /jugend.*/ 0 hits ("jugendlichen" 0 hits)
  • 28. Pattern ● Token annotation – Patterns: searching for all nouns pos=/N./ (includes NN & NE); searching for all adjectives pos=/ADJ./ (includes ADJA & ADJD)
  • 29. Pattern ● Token annotation – Patterns: searching for all nouns pos=/N./ 73 hits (pos="NN" 62 hits); searching for all adjectives pos=/ADJ./ 32 hits (pos="ADJA" 18 hits)
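The pattern searches above can be mimicked with Python's `re` module. The sketch assumes that a pattern has to match the whole annotation value, which is why `re.fullmatch` is used; this fits the use of `.*` on the slides but is stated here as an assumption. The word and tag lists are made up, and `pattern_search` is a hypothetical helper.

```python
import re


def pattern_search(values, pattern):
    """Keep the values that the pattern matches completely (case-sensitive)."""
    return [v for v in values if re.fullmatch(pattern, v)]


words = ["Jugendlichen", "Jugendliche", "jugendlich", "Jugend"]
tags = ["NN", "NE", "ADJA", "ADJD", "ART", "APPR"]

print(pattern_search(words, r"Jugend.*"))  # ['Jugendlichen', 'Jugendliche', 'Jugend']
print(pattern_search(words, r"jugend.*"))  # ['jugendlich']
print(pattern_search(tags, r"N."))         # ['NN', 'NE']
print(pattern_search(tags, r"ADJ."))       # ['ADJA', 'ADJD']
```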
  • 30. Relations between annotations ● Span annotation: searching for all NPs cat="NP" 41 hits (pos="NN" 62 hits), e.g. Die Jugendlichen in Zossen
  • 31. Relations between annotations ● Relations between attributes: searching for all NPs which contain a preposition cat="NP" 41 hits; pos="APPR" 19 hits, e.g. Die Jugendlichen in Zossen → no relation between the two pieces of information!
  • 32. Relations between annotations ● Relations between attributes: searching for all NPs which contain a preposition cat="NP" (#1) pos="APPR" (#2), e.g. Die Jugendlichen in Zossen → NP includes APPR
  • 33. Relations between annotations ● Relations between attributes: searching for all NPs which contain a preposition cat="NP" & pos="APPR" & #1 _i_ #2, e.g. Die Jugendlichen in Zossen
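The inclusion relation (`#1 _i_ #2`) can be pictured as a comparison of token ranges. The following Python sketch is a simplified model, assuming every annotation covers a contiguous range of token indices; `Span` and `includes` are hypothetical names, not part of ANNIS.

```python
from typing import NamedTuple


class Span(NamedTuple):
    start: int  # index of the first covered token
    end: int    # index of the last covered token (inclusive)


def includes(a: Span, b: Span) -> bool:
    """True if every token covered by b is also covered by a."""
    return a.start <= b.start and b.end <= a.end


tokens = ["Die", "Jugendlichen", "in", "Zossen"]
np_span = Span(0, 3)     # cat="NP": "Die Jugendlichen in Zossen"
appr_token = Span(2, 2)  # pos="APPR": "in"

print(includes(np_span, appr_token))  # True: the NP includes the preposition
print(includes(appr_token, np_span))  # False: inclusion is not symmetric
```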
  • 34. Hierarchy relations ● Relations between attributes: searching for all NPs which are objects cat="NP", e.g. Die Jugendlichen in Zossen → subject!
  • 35. Hierarchy relations ● Relations between attributes: searching for all NPs which are objects – NP → node annotation – OA → edge annotation [Figure: tokens, a span, a node, and an edge]
  • 36. Hierarchy relations ● Relations between attributes: searching for all NPs which are objects cat="NP"; the syntactic function in the tree: func="OA" → Note: there are at least two elements which relate to each other in some way!
  • 37. Hierarchy relations ● Relations between attributes: searching for all NPs which are objects node & cat="NP" & #1 >[func="OA"] #2, e.g. ein Musikcafé → object!
  • 38. Used relations ● Relations we used: A _i_ B: A includes B; A > B: A dominates B; A >[func="OA"] B: A dominates B and B is an object. The full list of relations can be found in ANNIS
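Dominance with an edge annotation (`#1 >[func="OA"] #2`) can be sketched as a lookup over labelled tree edges. The Python model below is an illustration under assumptions: the tiny tree is made up, the subject label "SB" is not taken from the slides, and only direct parent-child edges are modelled. `Edge` and `dominated_with_func` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    parent: str  # id of the dominating node
    child: str   # id of the dominated node
    func: str    # edge annotation (syntactic function)


# Node annotations (cat) and the text each NP covers; toy data.
cats = {"s1": "S", "np1": "NP", "np2": "NP"}
text = {"np1": "Die Jugendlichen in Zossen", "np2": "ein Musikcafé"}
edges = [Edge("s1", "np1", "SB"), Edge("s1", "np2", "OA")]


def dominated_with_func(cat: str, func: str) -> list:
    """Ids of nodes with category `cat` reached through an edge labelled `func`."""
    return [e.child for e in edges if e.func == func and cats.get(e.child) == cat]


objects = dominated_with_func("NP", "OA")
print([text[n] for n in objects])  # ['ein Musikcafé']
```

Searching for `cat="NP"` alone would return both noun phrases; the edge label is what separates the object from the subject.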
  • 39. What's new in ANNIS version 3.1.7
  • 40. What's new in ANNIS ● Simplified syntax (AQL) ● Frequency analysis (visualization) ● Expand match context (visualization) ● Equality and inequality (AQL) ● Variables (AQL) ● Complex OR expression (AQL) ● Document browser (visualization) ● CSV export (visualization) ● Tooltip for corpus names (visualization) ● Report problem (visualization)
  • 41. Simplified syntax ● Question: "Die" followed by "Jugendlichen", both being dominated by a noun phrase (NP) which is dominated by a sentence. So far: cat="S" & cat="NP" & "Die" & "Jugendlichen" & #1 > #2 & #2 > #3 & #2 > #4 & #3 . #4
  • 42. Simplified syntax ● Question: "Die" followed by "Jugendlichen", both being dominated by a noun phrase (NP) which is dominated by a sentence. So far: cat="S" & cat="NP" & "Die" & "Jugendlichen" & #1 > #2 & #2 > #3 & #2 > #4 & #3 . #4 Simplified: cat="S" > cat="NP" > "Die" . "Jugendlichen" & #2 > #4
  • 43. Frequency analysis ● Question: – How many words tagged as "NN", "ADJA" or "ADV" does a corpus contain? – What are the most frequent part-of-speech tags followed by a noun? – What are the most frequent part-of-speech tags in a prepositional phrase, which is in a sentence? – ...
  • 44. Frequency analysis
  • 45. Frequency analysis
  • 46. Frequency analysis. Attention: a frequency analysis has to be bound to a query!
  • 47. Frequency analysis ● What are the most frequent part-of-speech tags followed by a noun? pos . pos="NN" ● What are the most frequent part-of-speech tags in a prepositional phrase, which is in a sentence? cat="S" > cat="PP" > pos
  • 48. Expand match context ● Sometimes the context is too small ● Even more than 25 is possible; it is a free-text field
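What a frequency analysis bound to the query `pos . pos="NN"` counts can be sketched in Python: for every token directly followed by a noun, tally that token's part-of-speech tag. The tag sequence is made up for illustration and `tags_before_noun` is a hypothetical helper; this is not how ANNIS computes its frequency table.

```python
from collections import Counter


def tags_before_noun(tags):
    """Count the tags of tokens that are directly followed by an "NN" token."""
    return Counter(left for left, right in zip(tags, tags[1:]) if right == "NN")


tags = ["ART", "NN", "APPR", "ART", "NN", "VVFIN", "ADJA", "NN"]
freq = tags_before_noun(tags)
print(freq.most_common())  # [('ART', 2), ('ADJA', 1)]
```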
  • 49. Equality and inequality ● Equality "==" and inequality "!=" for attributes ● Question (inequality): two different part-of-speech tags, one directly following the other: pos . pos & #1 != #2
  • 50. Equality and inequality ● Equality "==" and inequality "!=" for attributes ● Question (equality): two identical part-of-speech tags, one directly following the other: pos . pos & #1 == #2
  • 51. Equality and inequality ● Equality "==" and inequality "!=" for attributes ● Question (inequality): two different part-of-speech tags, one directly following the other: pos . pos & #1 != #2
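The two queries compare the annotation values of directly adjacent tokens. A minimal Python sketch of the same idea, using a made-up tag sequence and the hypothetical helper `adjacent_pairs`:

```python
def adjacent_pairs(tags):
    """All pairs of tags where the second directly follows the first."""
    return list(zip(tags, tags[1:]))


tags = ["ART", "ADJA", "ADJA", "NN", "VVFIN"]
pairs = adjacent_pairs(tags)

same = [p for p in pairs if p[0] == p[1]]       # analogous to #1 == #2
different = [p for p in pairs if p[0] != p[1]]  # analogous to #1 != #2

print(same)       # [('ADJA', 'ADJA')]
print(different)  # [('ART', 'ADJA'), ('ADJA', 'NN'), ('NN', 'VVFIN')]
```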
  • 52. Variables ● Question: "Die" followed by "Jugendlichen", both being dominated by a noun phrase (NP) which is dominated by a sentence. Simplified: cat="S" > cat="NP" > "Die" . "Jugendlichen" & #2 > #4
  • 53. Variables ● Question: "Die" followed by "Jugendlichen", both being dominated by a noun phrase (NP) which is dominated by a sentence. Simplified: cat="S" > np#cat="NP" > "Die" . jug#"Jugendlichen" & #np > #jug
  • 54. Variables ● Question: "Die" followed by "Jugendlichen", both being dominated by a noun phrase (NP) which is dominated by a sentence. Simplified: cat="S" > np#cat="NP" > "Die" . jug#"Jugendlichen" & #np > #jug Variables and numbers can be mixed: cat="S" > np#cat="NP" > "Die" . "Jugendlichen" & #np > #4
  • 55. Complex OR expression ● Question (simple OR): a part-of-speech tag which is a noun, an attributive adjective or an article: pos=/(NN)|(ADJA)|(ART)/ (in pattern search)
  • 56. Complex OR expression ● Question (simple OR): a part-of-speech tag which is a noun, an attributive adjective or an article: pos=/(NN)|(ADJA)|(ART)/ (in pattern search) ● OR for expressions: pos="NN" | pos="ADJA" | pos="ART"
  • 57. Complex OR expression ● Question (complex OR): a prepositional phrase, which is dominated by a sentence, or just a nominal phrase: (cat="S" > cat="PP") | cat="NP"
  • 58. Complex OR expression ● Question (nested OR): a prepositional phrase, which dominates a noun, an attributive adjective or an article: a#cat="PP" & (b#pos="NN" | b#pos="ADJA" | b#pos="ART") & #a > #b
  • 59. Complex OR expression ● Question (nested OR): a prepositional phrase, which dominates a noun, an attributive adjective or an article: a#cat="PP" & (b#pos="NN" | b#pos="ADJA" | b#pos="ART") & #a > #b Attention: all expressions in brackets have to use the same variable: … & (b#pos="NN" | b#pos="ADJA" | b#pos="ART") & ...
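For the simple case, the OR inside a pattern and the OR between expressions select the same tags. The Python sketch below checks this on a made-up tag list; it assumes that a pattern must match the whole value (hence `re.fullmatch`), and the helper names are hypothetical.

```python
import re

PATTERN = re.compile(r"(NN)|(ADJA)|(ART)")  # OR inside one pattern
ALTERNATIVES = {"NN", "ADJA", "ART"}        # OR between exact expressions


def matches_by_pattern(tag: str) -> bool:
    """True if the whole tag matches one alternative of the pattern."""
    return PATTERN.fullmatch(tag) is not None


def matches_by_or(tag: str) -> bool:
    """True if the tag equals one of the alternatives exactly."""
    return tag in ALTERNATIVES


tags = ["NN", "NE", "ADJA", "ADJD", "ART", "APPR"]
print([t for t in tags if matches_by_pattern(t)])  # ['NN', 'ADJA', 'ART']
print([t for t in tags if matches_by_or(t)])       # ['NN', 'ADJA', 'ART']
```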
  • 60. Document browser ● Displays the entire text of a document
  • 61. Document browser
  • 62. CSV export ● Export data for further processing
  • 63. Tooltips for corpus names ● Sometimes corpus names can get very long
  • 64. Report problem
  • 65. Get ANNIS ● ANNIS comes in two flavors – A server version – A desktop version (ANNIS kickstarter) – Both are downloadable at: http://www.sfb632.uni-potsdam.de/annis/ ● ANNIS is open source (Apache License 2.0) and hosted on GitHub – https://github.com/korpling/ANNIS
  • 66. Thanks for your attention! Any questions? carolin.odebrecht@hu-berlin.de, f.zipser@gmx.de