Poio API is an open-source software library written in Python, developed as part of a curation project within the CLARIN-D working group “Linguistic Fieldwork, Anthropology, Language Typology”. The goal of Poio API is to provide unified access to pivot data structures parsed from the different file formats that researchers use in language documentation projects. As the unified data structure we chose an implementation of the “Graph Annotation Framework” (GrAF), which was standardized as ISO 24612 in 2012. In our presentation, we will discuss the connections between GrAF and TEI and present two use cases that demonstrate the innovation and advantages of our approach in comparison to existing methods.
Language documentation projects all over the world have accumulated a large and heterogeneous corpus of linguistic material. Because of its diversity, access to and analysis of the components is difficult, particularly for multimedia instances. The "Graph Annotation Framework" (GrAF), a stand-off annotation method, is applied to utterance examples in time-aligned annotations of video samples. An easy-to-use programming interface defined in Poio API, a project within the CLARIN framework ("Common Language Resources and Technology Infrastructure"), then greatly simplifies access without the need to deal with the multiple input formats of the source material. GrAF-XML provides a basis for exchanging results among the various projects that analyze the corpus.
Poio API: a CLARIN-D curation project for language documentation and language typology
1. Poio API: a CLARIN-D curation project for language documentation and language typology
Peter Bouda
Centro Interdisciplinar de Documentação Linguística e Social
pbouda@cidles.eu
2. Overview
● Existing infrastructure and workflows
● Poio API and CLASS within CLARIN
● GrAF and TEI
● Poio API
● GrAF as pivot structures (IGT)
● GrAF for retro-digitization (Dictionary)
8. GrAF
● GrAF: Graph Annotation Framework
● ISO 24612: Language resource management - Linguistic
annotation framework (LAF)
● Started as stand-off version of XCES
● API and representation as data structures, not a file format
● GrAF/XML as XML representation
● Used for the MASC (Manually Annotated Sub-Corpus) of the ANC (American National Corpus)
● Nodes, edges, regions, annotations, feature structures
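The stand-off building blocks listed above can be illustrated with a small GrAF/XML-style fragment. The element names (node, link, region, edge, a, fs, f) follow published GrAF examples, but the ids, anchors, and the omitted namespaces below are simplifications for illustration, not a schema-valid GrAF document:

```python
import xml.etree.ElementTree as ET

# Minimal stand-off fragment: regions anchor into the primary data,
# nodes link to regions, and annotations with feature structures
# refer back to nodes by id.
graf_xml = """
<graph>
  <region xml:id="r1" anchors="0 6"/>
  <region xml:id="r2" anchors="7 13"/>
  <node xml:id="n1"><link targets="r1"/></node>
  <node xml:id="n2"><link targets="r2"/></node>
  <edge xml:id="e1" from="n1" to="n2"/>
  <a label="word" ref="n1"><fs><f name="text" value="besuro"/></fs></a>
  <a label="word" ref="n2"><fs><f name="text" value="bik'is"/></fs></a>
</graph>
"""

root = ET.fromstring(graf_xml)
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Character anchors per region, and the feature value attached to each node
regions = {r.get(XML_ID): r.get("anchors") for r in root.iter("region")}
texts = {a.get("ref"): a.find("fs/f").get("value") for a in root.iter("a")}
```

Because annotations only refer to nodes and regions, the primary data (a text or a video timeline) stays untouched, which is what makes the stand-off approach easy to share and archive.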
12. Why we use GrAF
● No inline markup
● Radical stand-off approach
– Easier to share and manage data
– Preferred solution to archive cultural heritage
– Ideal for sparse annotations
● Existing code: Java and Python
● API vs. XQuery
● The beauty of annotation graphs
13. Poio API
● Think of GrAF as an assembly language for linguistic annotation; Poio API is then a library to map from and to higher-level languages
● Subset of GrAF to represent tier-based annotation
– Interlinear glossed text (IGT)
● Filters and filter chains for search
● Plugin mechanism for file formats
– Mapping semantics: tiers and annotations to nodes and edges
● Meta-data for additional information (tier types etc.)
● Efforts to map between TEI and GrAF
– Poio API supports IGT, next step is dictionaries and lexica
– Retro-digitized dictionary data at University of Marburg are published as GrAF files
– We want to publish as TEI
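The mapping semantics above can be sketched in a few lines: every annotation on a tier becomes a node, and the parent-child relations between tiers become edges. The tier names and the IGT-style data below are invented for illustration and do not come from a real corpus:

```python
# Each annotation is (id, value, parent annotation id); None marks a root tier.
igt = {
    "utterance": [("u1", "besuro bik'is", None)],
    "word":      [("w1", "besuro", "u1"), ("w2", "bik'is", "u1")],
    "gloss":     [("g1", "fish", "w1"), ("g2", "III-go-PST", "w2")],
}

nodes, edges = {}, []
for tier, annotations in igt.items():
    for ann_id, value, parent_id in annotations:
        nodes[ann_id] = {"tier": tier, "value": value}
        if parent_id is not None:
            # the edge encodes "annotation ann_id belongs to parent_id"
            edges.append((parent_id, ann_id))
```

A filter chain for search can then be expressed as successive passes over these nodes, e.g. keeping only the annotations on a given tier whose value matches a search term.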
14. A basic converter in Poio API
import poioapi.io.graf
import poioapi.io.wikipedia_extractor

parser = poioapi.io.wikipedia_extractor.Parser("Wikipedia.xml")
writer = poioapi.io.graf.Writer()
converter = poioapi.io.graf.GrAFConverter(parser, writer)
converter.parse()
converter.write("Wikipedia.hdr")
17. Example: Analysis of CSV data
http://nbviewer.ipython.org/urls/raw.github.com/pbouda/notebooks/master/Diana%2520Hinuq%25
18. Retro-digitization of dictionaries
● From scan to .doc to XML to DB to GrAF
● Radical stand-off approach for unsupervised
collaboration
● Dictionaries as cultural heritage texts
● GrAF as primary publication format
● Connectors to brat and TEI
19. Analysis of the data
● Spanish as pivot language, subset of bodypart terms
● Converting GrAF to networkx graph
● Nodes are heads, translations, etc.
● Head and translation connected via edges if they appear
in one entry
● Merge of graphs
● Count of paths of length 2 between Spanish heads
● Python writes JSON graph, visualized with D3.js
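The counting step above can be sketched without the networkx and D3.js machinery; the entries below are invented placeholders, not data from the retro-digitized dictionaries. Two Spanish heads are connected by a path of length 2 whenever they share a translation in the merged graph:

```python
from collections import defaultdict
from itertools import combinations

# Toy entries from two merged dictionaries: (Spanish head, translation).
dict_a = [("mano", "qema"), ("brazo", "qema"), ("pierna", "ruk")]
dict_b = [("mano", "bol"), ("brazo", "bol"), ("dedo", "tiq")]

# Merge both graphs into one adjacency map: head -> set of translations.
adj = defaultdict(set)
for head, translation in dict_a + dict_b:
    adj[head].add(translation)

# Count length-2 paths (head - shared translation - head) per head pair.
paths = {}
for h1, h2 in combinations(sorted(adj), 2):
    shared = adj[h1] & adj[h2]
    if shared:
        paths[(h1, h2)] = len(shared)
```

In this toy data "mano" and "brazo" share two translations; in the real analysis a high count hints that the documented languages frequently translate the two Spanish body-part terms with the same word.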