SlideShare a Scribd company logo
Poio API: a CLARIN-D curation project for
language documentation and language typology
Peter Bouda
Centro Interdisciplinar de Documentação Linguística e Social
pbouda@cidles.eu
Overview
● Existing infrastructure and workflows
● Poio API and CLASS within CLARIN
● GrAF and TEI
● Poio API
● GrAF as pivot structures (IGT)
● GrAF for retro-digitization (Dictionary)
Fieldwork
Fotos
Existing Infrastructure
LD tools and standards
● Elan: EAF, MPEG, WAV
● Toolbox: TXT, XML, WAV
● Arbil: IMDI/CIMDI („Component MetaData
Infrastructure“)
● Praat: XML, WAV
● ...
● No standards for tier hierarchies, tier names or
annotation schemes
● Efforts in ISOcat
Interlinear Glossed Text
CLARIN
GrAF
● GrAF: Graph Annotation Framework
● ISO 24612: Language resource management - Linguistic
annotation framework (LAF)
● Started as stand-off version of XCES
● API and representation as data structures, not a file format
● GrAF/XML as XML representation
● Used for the MASC of the ANC
● Nodes, edges, regions, annotations, feature structures
GrAF entities
GrAF structure
GrAF-XML
<node xml:id="words..W-Words..na23">
<link targets="words..W-Words..ra23"/>
</node>
<region anchors="780 1340" xml:id="words..W-Words..ra23"/>
<edge from="utterance..W-Spch..n8" to="words..W-Words..na23"
xml:id="ea23"/>
<a as="words" label="words" ref="words..W-Words..na23"
xml:id="a23">
<fs>
<f name="annotation_value">so</f>
</fs>
</a>
Why we use GrAF
● No inline markup
● Radical stand-off approach
– Easier to share and manage data
– Preferred solution to archive cultural heritage
– Ideal for sparse annotations
● Existing code: Java and Python
● API vs. XQuery
● The beauty of annotation graphs
Poio API
●
Think of GrAF as an assembly language for linguistic annotation; then
Poio API is a libray to map from and to higher-level languages
● Subset of GrAF to represent tier based annotation
– Interlinear glossed text (IGT)
● Filters and filter chains for search
● Plugin mechanism for file formats
– Mapping semantics: tiers and annotations to nodes and edges
● Meta-data for additional information (tier types etc.)
● Efforts to map between TEI and GrAF
– Poio API supports IGT, next step is dictionaries and lexica
– Retro-digitized dictionary data at University of Marburg are published as GrAF files
– We want to publish as TEI
A basic converter in Poio API
parser = poioapi.io.wikipedia_extractor.Parser("Wikipedia.xml")
writer = poioapi.io.graf.Writer()
converter = poioapi.io.graf.GrAFConverter(parser, writer)
converter.parse()
converter.write("Wikipedia.hdr")
A parser for CSV files
class CsvParser(poioapi.io.graf.BaseParser):
def get_root_tiers(self):
pass
def get_child_tiers_for_tier(self, tier):
pass
def get_annotations_for_tier(self, tier, annotation_parent=None):
pass
def tier_has_regions(self, tier):
pass
def region_for_annotation(self, annotation):
pass
def get_primary_data(self):
pass
Example: Analysis of CSV data
Example: Analysis of CSV data
http://nbviewer.ipython.org/urls/raw.github.com/pbouda/notebooks/master/Diana%2520Hinuq%25
Retro-digitization of dictionaries
● From scan to .doc to XML to DB to GrAF
● Radical stand-off approach for unsupervised
collaboration
● Dictionaries as cultural heritage texts
● GrAF as primary publication format
● Connectors to brat and TEI
Analysis of the data
● Spanish as pivot language, subset of bodypart terms
● Converting GrAF to networkx graph
● Nodes are heads, translations, etc.
● Head and translation connected via edges if they appear
in one entry
● Merge of graphs
● Count of paths of length 2 between spanish heads
● Python writes JSON graph, visualized with D3.js
D3 visualization
http://www.peterbouda.eu/bodyparts/index_bodyparts.html
Thank you for your attention!
pbouda@cidles.eu
Links
Clarin curation project:
http://de.clarin.eu/en/discipline-specific-working-groups/wg-3-linguistic-fieldwork-anthr
Poio:
http://media.cidles.eu/poio/
GrAF:
http://www.xces.org/ns/GrAF/1.0/

More Related Content

What's hot

R programming
R programmingR programming
R programming
Nandhini G
 
A Context-Based Semantics for SPARQL Property Paths over the Web
A Context-Based Semantics for SPARQL Property Paths over the WebA Context-Based Semantics for SPARQL Property Paths over the Web
A Context-Based Semantics for SPARQL Property Paths over the Web
Olaf Hartig
 
Introduction to R software, by Leire ibaibarriaga
Introduction to R software, by Leire ibaibarriaga Introduction to R software, by Leire ibaibarriaga
Introduction to R software, by Leire ibaibarriaga
DTU - Technical University of Denmark
 
8th TUC Meeting - Peter Boncz (CWI). Query Language Task Force status
8th TUC Meeting - Peter Boncz (CWI). Query Language Task Force status8th TUC Meeting - Peter Boncz (CWI). Query Language Task Force status
8th TUC Meeting - Peter Boncz (CWI). Query Language Task Force status
LDBC council
 
R Programming
R ProgrammingR Programming
R Programming
Abhishek Pratap Singh
 
R Introduction
R IntroductionR Introduction
R Introductionschamber
 
1 R Tutorial Introduction
1 R Tutorial Introduction1 R Tutorial Introduction
1 R Tutorial Introduction
Sakthi Dasans
 
Introducing The R Software
Introducing The R Software  Introducing The R Software
Introducing The R Software
Kamarul Imran
 
Graph databases & data integration v2
Graph databases & data integration v2Graph databases & data integration v2
Graph databases & data integration v2
Dimitris Kontokostas
 
What_do_Knowledge_Graph_Embeddings_Learn.pdf
What_do_Knowledge_Graph_Embeddings_Learn.pdfWhat_do_Knowledge_Graph_Embeddings_Learn.pdf
What_do_Knowledge_Graph_Embeddings_Learn.pdf
Heiko Paulheim
 
R programming language: conceptual overview
R programming language: conceptual overviewR programming language: conceptual overview
R programming language: conceptual overview
Maxim Litvak
 
WISE2017 - Factorization Machines Leveraging Lightweight Linked Open Data-ena...
WISE2017 - Factorization Machines Leveraging Lightweight Linked Open Data-ena...WISE2017 - Factorization Machines Leveraging Lightweight Linked Open Data-ena...
WISE2017 - Factorization Machines Leveraging Lightweight Linked Open Data-ena...
GUANGYUAN PIAO
 
R programming for data science
R programming for data scienceR programming for data science
R programming for data science
Sovello Hildebrand
 
R programming
R programmingR programming
R programming
Dr. Vaibhav Kumar
 
Workshop presentation hands on r programming
Workshop presentation hands on r programmingWorkshop presentation hands on r programming
Workshop presentation hands on r programming
Nimrita Koul
 
A short tutorial on r
A short tutorial on rA short tutorial on r
A short tutorial on r
Ashraf Uddin
 
Exposing relational database as rdf
Exposing relational database as rdfExposing relational database as rdf
Exposing relational database as rdf
Shakil Ahmed
 
R crash course
R crash courseR crash course
R crash course
Tomislav Hengl
 
R programming groundup-basic-section-i
R programming groundup-basic-section-iR programming groundup-basic-section-i
R programming groundup-basic-section-i
Dr. Awase Khirni Syed
 

What's hot (20)

R programming
R programmingR programming
R programming
 
A Context-Based Semantics for SPARQL Property Paths over the Web
A Context-Based Semantics for SPARQL Property Paths over the WebA Context-Based Semantics for SPARQL Property Paths over the Web
A Context-Based Semantics for SPARQL Property Paths over the Web
 
Introduction to R software, by Leire ibaibarriaga
Introduction to R software, by Leire ibaibarriaga Introduction to R software, by Leire ibaibarriaga
Introduction to R software, by Leire ibaibarriaga
 
8th TUC Meeting - Peter Boncz (CWI). Query Language Task Force status
8th TUC Meeting - Peter Boncz (CWI). Query Language Task Force status8th TUC Meeting - Peter Boncz (CWI). Query Language Task Force status
8th TUC Meeting - Peter Boncz (CWI). Query Language Task Force status
 
R Programming
R ProgrammingR Programming
R Programming
 
R Introduction
R IntroductionR Introduction
R Introduction
 
1 R Tutorial Introduction
1 R Tutorial Introduction1 R Tutorial Introduction
1 R Tutorial Introduction
 
Introducing The R Software
Introducing The R Software  Introducing The R Software
Introducing The R Software
 
Graph databases & data integration v2
Graph databases & data integration v2Graph databases & data integration v2
Graph databases & data integration v2
 
What_do_Knowledge_Graph_Embeddings_Learn.pdf
What_do_Knowledge_Graph_Embeddings_Learn.pdfWhat_do_Knowledge_Graph_Embeddings_Learn.pdf
What_do_Knowledge_Graph_Embeddings_Learn.pdf
 
R programming language: conceptual overview
R programming language: conceptual overviewR programming language: conceptual overview
R programming language: conceptual overview
 
WISE2017 - Factorization Machines Leveraging Lightweight Linked Open Data-ena...
WISE2017 - Factorization Machines Leveraging Lightweight Linked Open Data-ena...WISE2017 - Factorization Machines Leveraging Lightweight Linked Open Data-ena...
WISE2017 - Factorization Machines Leveraging Lightweight Linked Open Data-ena...
 
R programming for data science
R programming for data scienceR programming for data science
R programming for data science
 
R programming
R programmingR programming
R programming
 
Poster
PosterPoster
Poster
 
Workshop presentation hands on r programming
Workshop presentation hands on r programmingWorkshop presentation hands on r programming
Workshop presentation hands on r programming
 
A short tutorial on r
A short tutorial on rA short tutorial on r
A short tutorial on r
 
Exposing relational database as rdf
Exposing relational database as rdfExposing relational database as rdf
Exposing relational database as rdf
 
R crash course
R crash courseR crash course
R crash course
 
R programming groundup-basic-section-i
R programming groundup-basic-section-iR programming groundup-basic-section-i
R programming groundup-basic-section-i
 

Viewers also liked

Best episode ever: Angular 2 from the perspective of an Angular 1 developer
Best episode ever: Angular 2 from the perspective of an Angular 1 developerBest episode ever: Angular 2 from the perspective of an Angular 1 developer
Best episode ever: Angular 2 from the perspective of an Angular 1 developer
Peter Bouda
 
Smart Pen Presentation
Smart Pen PresentationSmart Pen Presentation
Smart Pen Presentation
susanvo_lavc
 
How community software supports language documentation and data analysis
How community software supports language documentation and data analysisHow community software supports language documentation and data analysis
How community software supports language documentation and data analysis
Peter Bouda
 
My Presentation
My PresentationMy Presentation
My Presentation
selamnutting
 
Querying GrAF data in linguistic analysis
Querying GrAF data in linguistic analysisQuerying GrAF data in linguistic analysis
Querying GrAF data in linguistic analysis
Peter Bouda
 
Poio API - An annotation framework to bridge Language Documentation and Natur...
Poio API - An annotation framework to bridge Language Documentation and Natur...Poio API - An annotation framework to bridge Language Documentation and Natur...
Poio API - An annotation framework to bridge Language Documentation and Natur...
Peter Bouda
 
Transmision
TransmisionTransmision
Transmisionpailooot
 
Sci cafe humangenome&health
Sci cafe humangenome&healthSci cafe humangenome&health
Sci cafe humangenome&health
Toby Rossman
 
Parker catalogue 2012
Parker catalogue 2012Parker catalogue 2012
Parker catalogue 2012PeterRamy
 
Product in theory and practice
Product in theory and practiceProduct in theory and practice
Product in theory and practiceRavi Chandegara
 
RxJS - The Reactive extensions for JavaScript
RxJS - The Reactive extensions for JavaScriptRxJS - The Reactive extensions for JavaScript
RxJS - The Reactive extensions for JavaScript
Viliam Elischer
 
Data models in Angular 1 & 2
Data models in Angular 1 & 2Data models in Angular 1 & 2
Data models in Angular 1 & 2
Adam Klein
 
Top Secret: Large-Scale SPA
Top Secret: Large-Scale SPATop Secret: Large-Scale SPA
Top Secret: Large-Scale SPA
Anderson Braz
 
Cycling for noobs
Cycling for noobsCycling for noobs
Cycling for noobs
Steve Lee
 
01 - Git vs SVN
01 - Git vs SVN01 - Git vs SVN
01 - Git vs SVN
Edward Goikhman
 
Development By The Numbers - ConFoo Edition
Development By The Numbers - ConFoo EditionDevelopment By The Numbers - ConFoo Edition
Development By The Numbers - ConFoo Edition
Anthony Ferrara
 

Viewers also liked (20)

Best episode ever: Angular 2 from the perspective of an Angular 1 developer
Best episode ever: Angular 2 from the perspective of an Angular 1 developerBest episode ever: Angular 2 from the perspective of an Angular 1 developer
Best episode ever: Angular 2 from the perspective of an Angular 1 developer
 
Smart Pen Presentation
Smart Pen PresentationSmart Pen Presentation
Smart Pen Presentation
 
Noord januari 2013
Noord januari 2013Noord januari 2013
Noord januari 2013
 
Multimiedia project
Multimiedia projectMultimiedia project
Multimiedia project
 
How community software supports language documentation and data analysis
How community software supports language documentation and data analysisHow community software supports language documentation and data analysis
How community software supports language documentation and data analysis
 
My Presentation
My PresentationMy Presentation
My Presentation
 
Querying GrAF data in linguistic analysis
Querying GrAF data in linguistic analysisQuerying GrAF data in linguistic analysis
Querying GrAF data in linguistic analysis
 
Poio API - An annotation framework to bridge Language Documentation and Natur...
Poio API - An annotation framework to bridge Language Documentation and Natur...Poio API - An annotation framework to bridge Language Documentation and Natur...
Poio API - An annotation framework to bridge Language Documentation and Natur...
 
Transmision
TransmisionTransmision
Transmision
 
Sci cafe humangenome&health
Sci cafe humangenome&healthSci cafe humangenome&health
Sci cafe humangenome&health
 
Parker catalogue 2012
Parker catalogue 2012Parker catalogue 2012
Parker catalogue 2012
 
Product in theory and practice
Product in theory and practiceProduct in theory and practice
Product in theory and practice
 
Pompa sentrifugal
Pompa sentrifugalPompa sentrifugal
Pompa sentrifugal
 
RxJS - The Reactive extensions for JavaScript
RxJS - The Reactive extensions for JavaScriptRxJS - The Reactive extensions for JavaScript
RxJS - The Reactive extensions for JavaScript
 
Data models in Angular 1 & 2
Data models in Angular 1 & 2Data models in Angular 1 & 2
Data models in Angular 1 & 2
 
Top Secret: Large-Scale SPA
Top Secret: Large-Scale SPATop Secret: Large-Scale SPA
Top Secret: Large-Scale SPA
 
Cycling for noobs
Cycling for noobsCycling for noobs
Cycling for noobs
 
01 - Git vs SVN
01 - Git vs SVN01 - Git vs SVN
01 - Git vs SVN
 
Simple testable code
Simple testable codeSimple testable code
Simple testable code
 
Development By The Numbers - ConFoo Edition
Development By The Numbers - ConFoo EditionDevelopment By The Numbers - ConFoo Edition
Development By The Numbers - ConFoo Edition
 

Similar to Poio API: a CLARIN-D curation project for language documentation and language typology

Apache Arrow: Cross-language Development Platform for In-memory Data
Apache Arrow: Cross-language Development Platform for In-memory DataApache Arrow: Cross-language Development Platform for In-memory Data
Apache Arrow: Cross-language Development Platform for In-memory Data
Wes McKinney
 
Map Reduce data types and formats
Map Reduce data types and formatsMap Reduce data types and formats
Map Reduce data types and formats
Vigen Sahakyan
 
Apache Arrow at DataEngConf Barcelona 2018
Apache Arrow at DataEngConf Barcelona 2018Apache Arrow at DataEngConf Barcelona 2018
Apache Arrow at DataEngConf Barcelona 2018
Wes McKinney
 
Ursa Labs and Apache Arrow in 2019
Ursa Labs and Apache Arrow in 2019Ursa Labs and Apache Arrow in 2019
Ursa Labs and Apache Arrow in 2019
Wes McKinney
 
Visual, scalable, and manageable data loading to and from Neo4j with Apache Hop
Visual, scalable, and manageable data loading to and from Neo4j with Apache Hop Visual, scalable, and manageable data loading to and from Neo4j with Apache Hop
Visual, scalable, and manageable data loading to and from Neo4j with Apache Hop
Neo4j
 
How to integrate python into a scala stack
How to integrate python into a scala stackHow to integrate python into a scala stack
How to integrate python into a scala stackFliptop
 
.NET 4 Demystified - Sandeep Joshi
.NET 4 Demystified - Sandeep Joshi.NET 4 Demystified - Sandeep Joshi
.NET 4 Demystified - Sandeep JoshiSpiffy
 
Apache Arrow Workshop at VLDB 2019 / BOSS Session
Apache Arrow Workshop at VLDB 2019 / BOSS SessionApache Arrow Workshop at VLDB 2019 / BOSS Session
Apache Arrow Workshop at VLDB 2019 / BOSS Session
Wes McKinney
 
DPFManager workshop
DPFManager workshopDPFManager workshop
Getting Started with PHP Extensions
Getting Started with PHP ExtensionsGetting Started with PHP Extensions
Getting Started with PHP Extensions
MichaelBrunoLochemem
 
Building scalable and language-independent Java services using Apache Thrift ...
Building scalable and language-independent Java services using Apache Thrift ...Building scalable and language-independent Java services using Apache Thrift ...
Building scalable and language-independent Java services using Apache Thrift ...
IndicThreads
 
Standardizing on a single N-dimensional array API for Python
Standardizing on a single N-dimensional array API for PythonStandardizing on a single N-dimensional array API for Python
Standardizing on a single N-dimensional array API for Python
Ralf Gommers
 
Building scalable and language independent java services using apache thrift
Building scalable and language independent java services using apache thriftBuilding scalable and language independent java services using apache thrift
Building scalable and language independent java services using apache thrift
Talentica Software
 
Using Aspects for Language Portability (SCAM 2010)
Using Aspects for Language Portability (SCAM 2010)Using Aspects for Language Portability (SCAM 2010)
Using Aspects for Language Portability (SCAM 2010)
lennartkats
 
Introduction to Analytics with Azure Notebooks and Python
Introduction to Analytics with Azure Notebooks and PythonIntroduction to Analytics with Azure Notebooks and Python
Introduction to Analytics with Azure Notebooks and Python
Jen Stirrup
 
Enforcing API Design Rules for High Quality Code Generation
Enforcing API Design Rules for High Quality Code GenerationEnforcing API Design Rules for High Quality Code Generation
Enforcing API Design Rules for High Quality Code Generation
Tim Burks
 

Similar to Poio API: a CLARIN-D curation project for language documentation and language typology (20)

Apache Arrow: Cross-language Development Platform for In-memory Data
Apache Arrow: Cross-language Development Platform for In-memory DataApache Arrow: Cross-language Development Platform for In-memory Data
Apache Arrow: Cross-language Development Platform for In-memory Data
 
Map Reduce data types and formats
Map Reduce data types and formatsMap Reduce data types and formats
Map Reduce data types and formats
 
Apache Arrow at DataEngConf Barcelona 2018
Apache Arrow at DataEngConf Barcelona 2018Apache Arrow at DataEngConf Barcelona 2018
Apache Arrow at DataEngConf Barcelona 2018
 
Ursa Labs and Apache Arrow in 2019
Ursa Labs and Apache Arrow in 2019Ursa Labs and Apache Arrow in 2019
Ursa Labs and Apache Arrow in 2019
 
Visual, scalable, and manageable data loading to and from Neo4j with Apache Hop
Visual, scalable, and manageable data loading to and from Neo4j with Apache Hop Visual, scalable, and manageable data loading to and from Neo4j with Apache Hop
Visual, scalable, and manageable data loading to and from Neo4j with Apache Hop
 
How to integrate python into a scala stack
How to integrate python into a scala stackHow to integrate python into a scala stack
How to integrate python into a scala stack
 
.NET 4 Demystified - Sandeep Joshi
.NET 4 Demystified - Sandeep Joshi.NET 4 Demystified - Sandeep Joshi
.NET 4 Demystified - Sandeep Joshi
 
Apache Arrow Workshop at VLDB 2019 / BOSS Session
Apache Arrow Workshop at VLDB 2019 / BOSS SessionApache Arrow Workshop at VLDB 2019 / BOSS Session
Apache Arrow Workshop at VLDB 2019 / BOSS Session
 
DPFManager workshop
DPFManager workshopDPFManager workshop
DPFManager workshop
 
Getting Started with PHP Extensions
Getting Started with PHP ExtensionsGetting Started with PHP Extensions
Getting Started with PHP Extensions
 
Unit V.pdf
Unit V.pdfUnit V.pdf
Unit V.pdf
 
Building scalable and language-independent Java services using Apache Thrift ...
Building scalable and language-independent Java services using Apache Thrift ...Building scalable and language-independent Java services using Apache Thrift ...
Building scalable and language-independent Java services using Apache Thrift ...
 
Standardizing on a single N-dimensional array API for Python
Standardizing on a single N-dimensional array API for PythonStandardizing on a single N-dimensional array API for Python
Standardizing on a single N-dimensional array API for Python
 
Php
PhpPhp
Php
 
Php
PhpPhp
Php
 
Php
PhpPhp
Php
 
Building scalable and language independent java services using apache thrift
Building scalable and language independent java services using apache thriftBuilding scalable and language independent java services using apache thrift
Building scalable and language independent java services using apache thrift
 
Using Aspects for Language Portability (SCAM 2010)
Using Aspects for Language Portability (SCAM 2010)Using Aspects for Language Portability (SCAM 2010)
Using Aspects for Language Portability (SCAM 2010)
 
Introduction to Analytics with Azure Notebooks and Python
Introduction to Analytics with Azure Notebooks and PythonIntroduction to Analytics with Azure Notebooks and Python
Introduction to Analytics with Azure Notebooks and Python
 
Enforcing API Design Rules for High Quality Code Generation
Enforcing API Design Rules for High Quality Code GenerationEnforcing API Design Rules for High Quality Code Generation
Enforcing API Design Rules for High Quality Code Generation
 

Recently uploaded

When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
g2nightmarescribd
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 

Recently uploaded (20)

When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 

Poio API: a CLARIN-D curation project for language documentation and language typology

  • 1. Poio API: a CLARIN-D curation project for language documentation and language typology Peter Bouda Centro Interdisciplinar de Documentação Linguística e Social pbouda@cidles.eu
  • 2. Overview ● Existing infrastructure and workflows ● Poio API and CLASS within CLARIN ● GrAF and TEI ● Poio API ● GrAF as pivot structures (IGT) ● GrAF for retro-digitization (Dictionary)
  • 5. LD tools and standards ● Elan: EAF, MPEG, WAV ● Toolbox: TXT, XML, WAV ● Arbil: IMDI/CIMDI („Component MetaData Infrastructure“) ● Praat: XML, WAV ● ... ● No standards for tier hierarchies, tier names or annotation schemes ● Efforts in ISOcat
  • 8. GrAF ● GrAF: Graph Annotation Framework ● ISO 24612: Language resource management - Linguistic annotation framework (LAF) ● Started as stand-off version of XCES ● API and representation as data structures, not a file format ● GrAF/XML as XML representation ● Used for the MASC of the ANC ● Nodes, edges, regions, annotations, feature structures
  • 11. GrAF-XML <node xml:id="words..W-Words..na23"> <link targets="words..W-Words..ra23"/> </node> <region anchors="780 1340" xml:id="words..W-Words..ra23"/> <edge from="utterance..W-Spch..n8" to="words..W-Words..na23" xml:id="ea23"/> <a as="words" label="words" ref="words..W-Words..na23" xml:id="a23"> <fs> <f name="annotation_value">so</f> </fs> </a>
  • 12. Why we use GrAF ● No inline markup ● Radical stand-off approach – Easier to share and manage data – Preferred solution to archive cultural heritage – Ideal for sparse annotations ● Existing code: Java and Python ● API vs. XQuery ● The beauty of annotation graphs
  • 13. Poio API ● Think of GrAF as an assembly language for linguistic annotation; then Poio API is a libray to map from and to higher-level languages ● Subset of GrAF to represent tier based annotation – Interlinear glossed text (IGT) ● Filters and filter chains for search ● Plugin mechanism for file formats – Mapping semantics: tiers and annotations to nodes and edges ● Meta-data for additional information (tier types etc.) ● Efforts to map between TEI and GrAF – Poio API supports IGT, next step is dictionaries and lexica – Retro-digitized dictionary data at University of Marburg are published as GrAF files – We want to publish as TEI
  • 14. A basic converter in Poio API parser = poioapi.io.wikipedia_extractor.Parser("Wikipedia.xml") writer = poioapi.io.graf.Writer() converter = poioapi.io.graf.GrAFConverter(parser, writer) converter.parse() converter.write("Wikipedia.hdr")
  • 15. A parser for CSV files class CsvParser(poioapi.io.graf.BaseParser): def get_root_tiers(self): pass def get_child_tiers_for_tier(self, tier): pass def get_annotations_for_tier(self, tier, annotation_parent=None): pass def tier_has_regions(self, tier): pass def region_for_annotation(self, annotation): pass def get_primary_data(self): pass
  • 17. Example: Analysis of CSV data http://nbviewer.ipython.org/urls/raw.github.com/pbouda/notebooks/master/Diana%2520Hinuq%25
  • 18. Retro-digitization of dictionaries ● From scan to .doc to XML to DB to GrAF ● Radical stand-off approach for unsupervised collaboration ● Dictionaries as cultural heritage texts ● GrAF as primary publication format ● Connectors to brat and TEI
  • 19. Analysis of the data ● Spanish as pivot language, subset of bodypart terms ● Converting GrAF to networkx graph ● Nodes are heads, translations, etc. ● Head and translation connected via edges if they appear in one entry ● Merge of graphs ● Count of paths of length 2 between spanish heads ● Python writes JSON graph, visualized with D3.js
  • 21. Thank you for your attention! pbouda@cidles.eu