2. What is LAF?
Linguistic Annotation Framework
ISO standard 24612:2012
Nancy Ide, Laurent Romary
A data model for stand-off markup
plus a recommended serialization: GrAF
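The stand-off idea can be illustrated with a minimal sketch: the primary text is never modified, and annotations point at it through nodes and regions of character offsets. The class names `Region`, `Node` and `Annotation` and the helper `text_of` are illustrative assumptions, not part of the standard or of any library.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    start: int  # offset of the first character of the region
    end: int    # offset just past the last character of the region


@dataclass
class Node:
    regions: list  # the regions of primary text this node links to


@dataclass
class Annotation:
    label: str
    node: Node
    features: dict = field(default_factory=dict)


def text_of(node, primary):
    """Return the primary text covered by a node (hypothetical helper)."""
    return "".join(primary[r.start:r.end] for r in node.regions)


# The primary text stays untouched; all markup lives beside it.
primary = "Semantics enters with purpose."
word = Node(regions=[Region(0, 9)])
ann = Annotation(label="pos", node=word, features={"part_of_speech": "noun"})
covered = text_of(word, primary)
```

Because annotations only hold references, several independent annotation layers can target the same text without interfering with each other.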
4. OANC
17K 8 feb 2013 ch5-logical.xml
217K 8 feb 2013 ch5-mpqa.xml
265K 8 feb 2013 ch5-nc.xml
16K 8 feb 2013 ch5-ne.xml
1,3M 8 feb 2013 ch5-penn.xml
1,4M 8 feb 2013 ch5-ptb.xml
990K 8 feb 2013 ch5-ptbtok.xml
48K 8 feb 2013 ch5-s.xml
274K 8 feb 2013 ch5-seg.xml
177K 8 feb 2013 ch5-vc.xml
2,6K 3 jun 09:05 ch5.hdr
31K 8 feb 2013 ch5.txt
19K 8 feb 2013 resource-header.xml
5. text
Semantics enters with purpose.
10
1-21
210
For this to be true, it is not necessary that the carriers of purpose,
say, the same bacterium heading upstream in the glucose gradient, be
conscious.
I hope my definition of an autonomous agent is useful, an
autocatalytic system carrying out a work cycle, now rather broadened by the
realization that autonomous agents also do often detect and measure and
record displacements of external systems from equilibrium that can be used to
extract work, then do extract work, propagating work and constraint
construction, from their environment.
12. Performance
Problems
POIO:
load time 60+ min
RAM 20+ GB
nodes/edges/features directly modeled as objects in Python
ExistDb:
load time 30+ min (initial)
count features 60+ min
XQuery not a handy tool for relevant queries
need extensive index building
identifier chasing
14. What is it?
a compiler
LAF-XML ==> Python arrays
2,270 MB ==> 485 MB binary data
60 min load time ==> 1 s
a task execution environment
runs custom Python scripts
offering them a LAF API
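The compile step can be pictured roughly as follows: node-to-region information is packed into flat numeric arrays that serialize to compact binary data and load back quickly. This is a sketch using the standard `array` module; the function names and the layout of three parallel arrays are assumptions, and the workbench's actual binary format is not described on the slide.

```python
from array import array


def compile_regions(regions):
    """Pack {node_id: (start, end)} into three parallel arrays (assumed layout)."""
    node_ids, starts, ends = array("L"), array("L"), array("L")
    for node_id in sorted(regions):
        start, end = regions[node_id]
        node_ids.append(node_id)
        starts.append(start)
        ends.append(end)
    return node_ids, starts, ends


def to_binary(arr):
    """Serialize an array to raw bytes."""
    return arr.tobytes()


def from_binary(typecode, data):
    """Rebuild an array from raw bytes without any parsing."""
    arr = array(typecode)
    arr.frombytes(data)
    return arr


# First entry taken from the book example; the second is made up.
regions = {1254379: (4609273, 4664382), 7: (0, 9)}
node_ids, starts, ends = compile_regions(regions)
restored = from_binary("L", to_binary(starts))
```

Loading is then a matter of reading bytes back into an array, which is why it can be much faster than parsing XML again.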
15. Where is it?
clone it from GitHub
https://github.com/dirkroorda/laffabric
run it locally
share your custom tasks
share your own annotations
16. Example: Esther
linguistic variation among the books of the Bible
count the common nouns of Esther
compare their frequencies in Esther with those in other books of the Bible
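The comparison step could look like the sketch below, which turns raw lexeme counts into relative frequencies and pairs each lexeme of the target book with its frequency elsewhere. The function names and the toy counts are made up for illustration; they are not results from the actual data.

```python
from collections import Counter


def relative_freqs(counts):
    """Convert raw counts into relative frequencies."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {lexeme: n / total for lexeme, n in counts.items()}


def compare(target_counts, other_counts):
    """Map each target lexeme to (frequency in target, frequency elsewhere)."""
    target = relative_freqs(target_counts)
    other = relative_freqs(other_counts)
    return {lexeme: (freq, other.get(lexeme, 0.0)) for lexeme, freq in target.items()}


# Toy counts, invented for the example.
esther = Counter({"king": 3, "queen": 1})
other_books = Counter({"king": 1, "house": 3})
comparison = compare(esther, other_books)
```

Relative rather than absolute frequencies are used here because the books differ in length.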
19. Given (2)
Information about the books (where they start and end)
<region xml:id="s_1254379" anchors="4609273 4664382"/>
<node xml:id="n1254379"><link targets="s_1254379"/></node>
<a xml:id="as1254379" label="db" ref="n1254379"><fs>
<f name="otype" value="book"/>
<f name="oid" value="1254379"/>
<f name="monads" value="368500-373120"/>
<f name="minmonad" value="368500"/>
<f name="maxmonad" value="373120"/>
</fs></a>
<a xml:id="asf34" label="sft" ref="n1254379"><fs>
<f name="book" value="Esther"/>
</fs></a>
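To see what this fragment encodes, it can be read with the standard `xml.etree.ElementTree` module. The sketch wraps the fragment in a hypothetical `<graph>` root so that it parses as one document, and it assumes no default namespace; real GrAF files may declare one, in which case the tag names would need a namespace prefix.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predefined xml: prefix to this namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

fragment = """
<region xml:id="s_1254379" anchors="4609273 4664382"/>
<node xml:id="n1254379"><link targets="s_1254379"/></node>
<a xml:id="as1254379" label="db" ref="n1254379"><fs>
  <f name="otype" value="book"/>
  <f name="minmonad" value="368500"/>
  <f name="maxmonad" value="373120"/>
</fs></a>
<a xml:id="asf34" label="sft" ref="n1254379"><fs>
  <f name="book" value="Esther"/>
</fs></a>
"""

# The wrapper element is an assumption made only to get a single root.
root = ET.fromstring("<graph>" + fragment + "</graph>")

# Region id -> (start anchor, end anchor)
regions = {
    r.get(XML_ID): tuple(int(x) for x in r.get("anchors").split())
    for r in root.iter("region")
}


def features_by_node(root):
    """Collect 'label.name' -> value for every annotated node."""
    result = {}
    for a in root.iter("a"):
        feats = result.setdefault(a.get("ref"), {})
        for f in a.iter("f"):
            feats[a.get("label") + "." + f.get("name")] = f.get("value")
    return result


features = features_by_node(root)
```

Both annotations refer to the same node, so the book name and its monad range end up side by side in `features["n1254379"]`.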
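20. A small Python script
Source code to the workbench!
import collections

# Assumed initialisation, not shown on the slide.
# NN and F are supplied by the task object.
books = []
words = collections.defaultdict(int)
lexemes = collections.defaultdict(lambda: collections.defaultdict(int))

target_book = "Esther"
for node in NN():
    this_type = F.shebanq_db_otype.v(node)
    if this_type == "word":
        p_o_s = F.shebanq_ft_part_of_speech.v(node)
        if p_o_s == "noun":
            noun_type = F.shebanq_ft_noun_type.v(node)
            if noun_type == "common":
                # count common nouns per book and per lexeme
                words[book_name] += 1
                lexeme = F.shebanq_ft_lexeme_utf8.v(node)
                lexemes[book_name][lexeme] += 1
    elif this_type == "book":
        # a book node precedes its words, so it sets the current book
        book_name = F.shebanq_sft_book.v(node)
        books.append(book_name)
        ontarget = F.shebanq_sft_book.v(node) == target_book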
21. Declare features
"features": {
    "shebanq": {
        "node": [
            "db.otype",
            "ft.part_of_speech,noun_type,lexeme_utf8",
            "sft.book",
        ],
        "edge": [
        ],
    },
},
The workbench will load the selected features and unload the others.
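The declarations appear to relate to the names used in the script (`F.shebanq_db_otype`, `F.shebanq_ft_part_of_speech`, and so on) by joining namespace, annotation label and feature name with underscores. The sketch below spells out that reading; `expand_features` is a hypothetical helper inferred from the slides, not the workbench's own code.

```python
def expand_features(namespace, declarations):
    """Expand 'label.name1,name2' declarations into full feature names."""
    names = []
    for declaration in declarations:
        # "ft.part_of_speech,noun_type" -> label "ft", two feature names
        label, _, feature_list = declaration.partition(".")
        for name in feature_list.split(","):
            names.append("_".join((namespace, label, name.strip())))
    return names


node_features = expand_features(
    "shebanq",
    [
        "db.otype",
        "ft.part_of_speech,noun_type,lexeme_utf8",
        "sft.book",
    ],
)
```

Under this assumption the comma-separated form is shorthand for several features that share one annotation label.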
22. Receive task object
def task(graftask):
    (msg, NN, F, X) = graftask.get_mappings()
And use the supplied methods for rapid data access.
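A task written in this shape can be dry-run without the workbench by handing it a stand-in object. Everything named `Fake...` below is hypothetical and exists only for this sketch. The slide does not explain `msg` and `X`, so the stand-in uses `print` and `None` as placeholders, and the node data is invented.

```python
from collections import defaultdict
from types import SimpleNamespace


class FakeFeature:
    """Stand-in for a feature object offering a .v(node) lookup."""

    def __init__(self, values):
        self._values = values

    def v(self, node):
        return self._values.get(node)


class FakeTask:
    """Stand-in for the task object; not the real workbench class."""

    def __init__(self, nodes, features):
        self._nodes = nodes
        self._features = features

    def get_mappings(self):
        msg = print                      # placeholder, meaning assumed
        NN = lambda: iter(self._nodes)   # yields nodes in order
        F = SimpleNamespace(
            **{name: FakeFeature(vals) for name, vals in self._features.items()}
        )
        X = None                         # placeholder, not described on the slide
        return (msg, NN, F, X)


def task(graftask):
    (msg, NN, F, X) = graftask.get_mappings()
    nouns_per_book = defaultdict(int)
    book_name = None
    for node in NN():
        otype = F.shebanq_db_otype.v(node)
        if otype == "book":
            book_name = F.shebanq_sft_book.v(node)
        elif otype == "word" and F.shebanq_ft_part_of_speech.v(node) == "noun":
            nouns_per_book[book_name] += 1
    return dict(nouns_per_book)


# Invented toy data: one book node followed by three word nodes.
fake = FakeTask(
    nodes=[1, 2, 3, 4],
    features={
        "shebanq_db_otype": {1: "book", 2: "word", 3: "word", 4: "word"},
        "shebanq_sft_book": {1: "Esther"},
        "shebanq_ft_part_of_speech": {2: "noun", 3: "verb", 4: "noun"},
    },
)
result = task(fake)
```

Keeping all data access behind `NN` and `F` is what makes the task body independent of how the data is stored.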
25. Next steps
Usage by ETCBC: workflow for adding annotations (Wido van Peursen, Janet Dyk, whoever needs new kinds of data in and out of the database)
Wider Digital Humanities: pattern seeking (Rens Bod)
Incorporate in NLPLAB VU (Wouter van Atteveldt)
Combine with POIO (Peter Bouda)
NEO4J backend?
Discuss at workshops: TLA Nijmegen (done), CLIN (accepted), DH2014?