1. The document discusses layers of annotation for analyzing biblical Hebrew text, including the text itself, linguistic features, manually or automatically generated analyses, and queries for exegetical search.
2. It provides an overview of the Linguistic Annotation Framework (LAF) for representing annotated text and statistics on the annotation of one Hebrew text, with over 800,000 regions and 1.4 million nodes.
3. The document describes tools for querying the annotated text, including the SHEBANQ system and LAF-Fabric API, and the ability to work with the data in various formats like XML, binary files, and R.
Slides of the Knowledge and Media lecture about Linked Data and Linked Open Data. Presented 19 November 2012. Slides were based on presentations by Victor de Boer and Christophe Guéret.
Apertium was launched in 2005 as a platform for RBMT. After a decade of creating MT systems (more than 40 stable language pairs currently available), it is no longer just a platform for MT. Its open-source nature has turned Apertium into an extensive language resource base, shared under free licenses, downloaded hundreds of times a week, and receiving thousands of contributions a year. In this presentation we will list and give numbers about what can currently be found in Apertium and what benefits the open-source model for data sharing has brought. We will also discuss what we lack and what is on the near-term roadmap of the platform.
The SHEBANQ project (at its half-way point) as a use case in querying language resources. The corpus is the text of the Hebrew Bible with linguistic features, packaged in a special text database and converted to LAF.
Text as Data: processing the Hebrew Bible (Dirk Roorda)
The merits of stand-off markup (LAF) versus inline markup (TEI) for processing text as data. Ideas applied to work with the Hebrew Bible, resulting in tools for researchers and end-users.
LAF-Fabric: a tool to process the ETCBC Hebrew Text Database in Linguistic Annotation Framework.
How researchers in theology and linguistics can create workflows to analyse the text of the Hebrew Bible and extract data for visualization. Those workflows can be written in Python, and run conveniently in the IPython Notebook.
Joint work with Martijn Naaijer (VU University).
With the Hebrew Bible encoded in the Linguistic Annotation Framework (LAF, ISO 24612), and with a new LAF processing tool, we demonstrate how you can do practical data analysis. The tool, LAF-Fabric, integrates with the IPython Notebook approach. Our example here is lexeme cooccurrence analysis of Bible books. For now, the road from data to visualization is more important than the exact visualization.
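As a schematic illustration of what such a cooccurrence analysis boils down to (toy data and transliterations standing in for what the notebooks extract from the ETCBC corpus via LAF-Fabric; not the actual notebook code):

from collections import Counter
from itertools import combinations
from math import sqrt

# count lexemes per book, then compare books by cosine similarity
books = {
    "Genesis": ["BR>", ">LHJM", ">MR", ">MR", "NTN"],
    "Exodus":  [">MR", "NTN", "MCH", ">LHJM"],
}
counts = {book: Counter(lexemes) for book, lexemes in books.items()}

def cosine(c1, c2):
    dot = sum(c1[lex] * c2[lex] for lex in set(c1) & set(c2))
    norm = sqrt(sum(v * v for v in c1.values())) * sqrt(sum(v * v for v in c2.values()))
    return dot / norm

for b1, b2 in combinations(counts, 2):
    print(b1, b2, round(cosine(counts[b1], counts[b2]), 3))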
Hebrew Bible as Data: Laboratory, Sharing, Lessons (Dirk Roorda)
Recently, the Hebrew Bible has been published online as a database. We show what you can do with it, and how to share your results with others. Work by the Amsterdam scholars of the Eep Talstra Centre for Bible and Computer, supported by CLARIN-NL.
2009 PLANETS Vienna - MIXED migration to XML (Dirk Roorda)
Snapshot of how we thought about migration infrastructure then: PLANETS for the infrastructure, MIXED as a plugin for the tabular data conversion functionality.
Developing a tool for handling text with linguistic annotations. Text-Fabric is meant to support researchers who want to contribute portions of the data, and it weaves the contributions into a meaningful whole. Currently, it is primarily meant for working with the Hebrew Bible, based on the ETCBC (Amsterdam) linguistic database.
Conference presentation for 2016 annual meeting of the Society of Biblical Literature, San Antonio. (https://www.sbl-site.org).
Authors: Janet Dyk (linguistic ideas) and Dirk Roorda (computational implementation).
A verb organizes the elements in a sentence. Different patterns of constituents affect the meaning of a verb in a given context. The potential of a verb to combine with patterns of elements is known as its valence. A single set of questions, organized as a flow chart, selects the relevant building blocks within the context of a verb. The resulting pattern provides a particular significance for the verb in question. Because all contexts are submitted to the same flow chart, similarities and differences between verbs come to light. For example, verbs of movement in their causative formation manifest the same patterns as transitive verbs with an object that gets moved. We apply this approach to the whole Hebrew Bible, using the database of the Eep Talstra Centre for Bible and Computer (ETCBC), which contains the relevant linguistic annotations. This allows us to have a complete listing of all patterns for all verbs. It provides the basis for consistent proposals for the significance of specific patterns occurring with a particular verb. The valence results are made available in SHEBANQ, an online research tool based on the ETCBC database. It presents the basic data, text and linguistic features, together with annotations by researchers. The valence results consist of a set of algorithmically generated annotations which show up between the lines of the text. The algorithm itself and its documentation can be found at https://shebanq.ancient-data.org/tools?goto=valence. By using SHEBANQ we achieve several goals with respect to the scholarly workflow: (1) all our results are openly accessible online, and other researchers may comment on them; (2) all resources needed to reproduce this research are available online and can be downloaded (Open Access).
Sarah Rees Jones (York) and Helen Petrie: 'ChartEx overview and next steps'
Digital History seminar and Archives and Society seminar
Institute of Historical Research
23 June 2015
http://ihrdighist.blogs.sas.ac.uk/2015/06/15/23-june-2015-exploring-big-and-small-historical-datasets-reflections-on-two-recent-projects/
Natural Language Processing (NLP) is often taught at the academic level from the perspective of computational linguists. However, as data scientists, we have a richer view of the world of natural language - unstructured data that by its very nature has important latent information for humans. NLP practitioners have benefitted from machine learning techniques to unlock meaning from large corpora, and in this class we’ll explore how to do that particularly with Python, the Natural Language Toolkit (NLTK), and to a lesser extent, the Gensim Library.
NLTK is an excellent library for machine learning-based NLP, written in Python by experts from both academia and industry. Python allows you to create rich data applications rapidly, iterating on hypotheses. Gensim provides vector-based topic modeling, which is currently absent in both NLTK and Scikit-Learn. The combination of Python + NLTK means that you can easily add language-aware data products to your larger analytical workflows and applications.
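By way of illustration, a minimal NLTK session of the kind this class builds on might look like this (a sketch; the text line is a stand-in for a real corpus):

import nltk
from nltk import FreqDist, word_tokenize

# tokenizer models; newer NLTK versions may ask for "punkt_tab" instead
nltk.download("punkt", quiet=True)

text = "In the beginning God created the heaven and the earth."
tokens = word_tokenize(text.lower())
print(FreqDist(tokens).most_common(3))   # e.g. [('the', 3), ('in', 1), ('beginning', 1)]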
Would you like to know more? Find presentations, reports, conference videos, photos and much more in the NTK institutional repository at: http://repozitar.techlib.cz/?ln=en
Professor John Coleman, Phonetics Department, Oxford University: talk "Voices from the Past" (On The Wight)
Professor John Coleman from the Phonetics Department at Oxford University presenting his talk "Voices from the Past" to the Isle of Wight Cafe Scientifique.
He discusses how present-day languages sound compared to those spoken by our ancestors: an audio journey into the spoken words of the past.
Discover the deep cultural connections we share with our linguistic cousins across Europe and Asia and hear reconstructions of ancient words, last spoken over 6,000 years ago.
Getting started on your natural language processing project? First you'll need to extract some features from your corpus. Frequency counts, syntax parses, and word vectors are good ones to start with.
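For the word-vector part, a minimal sketch with Gensim (toy sentences; the parameter names are those of Gensim 4.x):

from gensim.models import Word2Vec

# toy corpus: a few tokenized sentences, standing in for a real one
sentences = [
    ["in", "the", "beginning", "god", "created", "the", "heaven"],
    ["the", "earth", "was", "without", "form"],
    ["god", "said", "let", "there", "be", "light"],
]
model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, epochs=50)
print(model.wv.most_similar("god", topn=3))   # nearest neighbours in vector space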
A brief literature review on language-agnostic tokenization, covering state-of-the-art algorithms: BPE and the Unigram model. This slide deck is part of a weekly sharing activity.
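The core of BPE fits in a few lines: repeatedly merge the most frequent adjacent symbol pair. A minimal sketch (toy vocabulary; </w> marks word ends):

import re
from collections import Counter

def pair_stats(vocab):
    # count adjacent symbol pairs, weighted by word frequency
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, vocab):
    # replace every occurrence of the pair by its concatenation
    pattern = re.compile(r'(?<!\S)' + re.escape(' '.join(pair)) + r'(?!\S)')
    return {pattern.sub(''.join(pair), word): freq for word, freq in vocab.items()}

vocab = {'l o w </w>': 5, 'l o w e r </w>': 2, 'n e w e s t </w>': 6, 'w i d e s t </w>': 3}
for step in range(6):
    stats = pair_stats(vocab)
    if not stats:
        break
    best = max(stats, key=stats.get)
    vocab = merge_pair(best, vocab)
    print(step, best)   # the learned merge rules, most frequent first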
Text-Fabric: how to do text research in a FAIR way.
Text is one of the simplest and most common data types in computer science.
But there is a lot in text that does not meet the eye, and so people have been annotating texts, century-by-century.
When you research texts, you consume and produce such annotations.
Suddenly you find yourself in the midst of a big fabric of thoughts, contributed by many authors.
Text-Fabric is a tool that helps you to follow the threads that came before you and to weave a few of your own and add them to the scholarly record.
I'll show you how that looks for clay tablets of the Uruk period (the oldest writing on earth), the much more recent Hebrew Bible, and the ultramodern General Missives of the VOC time.
Towards TextPy, a module for processing text.
If we define annotated text as a graph with additional structure, we can make text processing more efficient, in the same way that Pandas makes processing dataframes more efficient.
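A sketch of the idea (names are illustrative, not an actual TextPy API): store each feature as one array indexed by node number, so that feature lookups vectorize the way Pandas columns do.

import numpy as np

# nodes are integers 0..N-1; one array per feature, indexed by node
otype = np.array(["word", "word", "word", "phrase", "clause"])
pos = np.array(["prep", "subs", "verb", "", ""])
parent = {0: 3, 1: 3, 2: 4, 3: 4}   # edges: node -> embedding node

words = np.nonzero(otype == "word")[0]   # all word nodes at once
verbs = words[pos[words] == "verb"]      # vectorized feature filter
print(verbs, parent[int(verbs[0])])      # [2] 4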
We demonstrate how Text-Fabric can handle the display of text and annotations, even when chunks of text are not properly embedded in each other. This demo contains examples from the Hebrew Bible and the Old Babylonian Letters (cuneiform clay tablets).
Researchers in ancient text corpora can take control over their data. We show a way to do so by means of Text-Fabric.
Co-production of Cody Kingham and Dirk Roorda
Biblia Hebraica Stuttgartensia Amstelodamensis. Coding the Hebrew Bible with an Open Science ethos: Text-Fabric.
Text-Fabric is several things: (1) a browser for ancient text corpora; (2) a Python3 package for processing ancient corpora
A corpus of ancient texts and linguistic annotations represents a large body of knowledge. Text-Fabric makes that knowledge accessible to non-programmers by means of a built-in search interface that runs in your browser.
From there, the step to programming your own analytics is not so big anymore, because you can call the Text-Fabric API from your Python programs, and it works really well in Jupyter notebooks.
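A notebook session along these lines might look as follows (a sketch following Text-Fabric conventions for the BHSA corpus; feature names such as sp are the ETCBC ones, and details vary per Text-Fabric version):

from tf.app import use

# download and load the BHSA (ETCBC) Hebrew Bible corpus;
# hoist puts the API handles F, T, L, ... into the global namespace
A = use("etcbc/bhsa", hoist=globals())

# the same question twice: through the API and through a search template
verbs = [w for w in F.otype.s("word") if F.sp.v(w) == "verb"]
print(len(verbs))

results = A.search("""
clause
  word sp=verb
""")
A.show(results, end=3)   # render the first few results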
Data Management for Research: A Case Study (Dirk Roorda)
How practices of data sharing can help researchers to produce more science.
Session in the data management course organized by RDNL (Research Data Netherlands).
Save queries as annotations. A method for the digital preservation of queries on a Hebrew Text database with linguistic information in it. These queries form the data for interpretations by biblical scholars. Sharing those queries as Open Annotation enables researchers to communicate their (intermediate) results.
2007 iPres Beijing - MIXED: Preservation by migration to XML (Dirk Roorda)
File formats for tabular data are often proprietary. By creating conversions to and from XML we can preserve the tabular information over time, even when the proprietary formats become obsolete.
The increased availability of biomedical data, particularly in the public domain, offers the opportunity to better understand human health and to develop effective therapeutics for a wide range of unmet medical needs. However, data scientists remain stymied by the fact that data remain hard to find and to productively reuse, because data and their metadata i) are wholly inaccessible, ii) are in non-standard or incompatible representations, iii) do not conform to community standards, and iv) have unclear or highly restricted terms and conditions that preclude legitimate reuse. These limitations require a rethink of how data can be made machine- and AI-ready - the key motivation behind the FAIR Guiding Principles. Concurrently, while recent efforts have explored the use of deep learning to fuse disparate data into predictive models for a wide range of biomedical applications, these models often fail even when the correct answer is already known, and fail to explain individual predictions in terms that data scientists can appreciate. These limitations suggest that new methods to produce practical artificial intelligence are still needed.
In this talk, I will discuss our work in (1) building an integrative knowledge infrastructure to prepare FAIR and "AI-ready" data and services, along with (2) neurosymbolic AI methods to improve the quality of predictions and to generate plausible explanations. Attention is given to standards, platforms, and methods to wrangle knowledge into simple but effective semantic and latent representations, and to make these available through standards-compliant and discoverable interfaces that can be used in model building, validation, and explanation. Our work, and that of others in the field, creates a baseline for building trustworthy and easy-to-deploy AI models in biomedicine.
Bio
Dr. Michel Dumontier is the Distinguished Professor of Data Science at Maastricht University, founder and executive director of the Institute of Data Science, and co-founder of the FAIR (Findable, Accessible, Interoperable and Reusable) data principles. His research explores socio-technological approaches for responsible discovery science, which includes collaborative multi-modal knowledge graphs, privacy-preserving distributed data mining, and AI methods for drug discovery and personalized medicine. His work is supported through the Dutch National Research Agenda, the Netherlands Organisation for Scientific Research, Horizon Europe, the European Open Science Cloud, the US National Institutes of Health, and a Marie-Curie Innovative Training Network. He is the editor-in-chief for the journal Data Science and is internationally recognized for his contributions in bioinformatics, biomedical informatics, and semantic technologies including ontologies and linked data.
This PDF is about schizophrenia.
For more details, see the YouTube channel @SELF-EXPLANATORY:
https://www.youtube.com/channel/UCAiarMZDNhe1A3Rnpr_WkzA/videos
Richard's adventures in two entangled wonderlands (Richard Gill)
Since the loophole-free Bell experiments of 2020 and the Nobel prizes in physics of 2022, critics of Bell's work have retreated to the fortress of super-determinism. Now, super-determinism is a derogatory word - it just means "determinism". Palmer, Hance and Hossenfelder argue that quantum mechanics and determinism are not incompatible, using a sophisticated mathematical construction based on a subtle thinning of allowed states and measurements in quantum mechanics, such that what is left appears to make Bell's argument fail, without altering the empirical predictions of quantum mechanics. I think however that it is a smoke screen, and the slogan "lost in math" comes to my mind. I will discuss some other recent disproofs of Bell's theorem using the language of causality based on causal graphs. Causal thinking is also central to law and justice. I will mention surprising connections to my work on serial killer nurse cases, in particular the Dutch case of Lucia de Berk and the current UK case of Lucy Letby.
Brief information about the SCOP protein database used in bioinformatics.
The Structural Classification of Proteins (SCOP) database is a comprehensive and authoritative resource for the structural and evolutionary relationships of proteins. It provides a detailed and curated classification of protein structures, grouping them into families, superfamilies, and folds based on their structural and sequence similarities.
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ... (Sérgio Sacani)
We characterize the earliest galaxy population in the JADES Origins Field (JOF), the deepest imaging field observed with JWST. We make use of the ancillary Hubble optical images (5 filters spanning 0.4−0.9 µm) and novel JWST images with 14 filters spanning 0.8−5 µm, including 7 medium-band filters, and reaching total exposure times of up to 46 hours per filter. We combine all our data at >2.3 µm to construct an ultradeep image, reaching as deep as ≈31.4 AB mag in the stack and 30.3−31.0 AB mag (5σ, r = 0.1″ circular aperture) in individual filters. We measure photometric redshifts and use robust selection criteria to identify a sample of eight galaxy candidates at redshifts z = 11.5−15. These objects show compact half-light radii of R1/2 ∼ 50−200 pc, stellar masses of M⋆ ∼ 10⁷−10⁸ M⊙, and star-formation rates of SFR ∼ 0.1−1 M⊙ yr⁻¹. Our search finds no candidates at 15 < z < 20, placing upper limits at these redshifts. We develop a forward-modeling approach to infer the properties of the evolving luminosity function without binning in redshift or luminosity that marginalizes over the photometric redshift uncertainty of our candidate galaxies and incorporates the impact of non-detections. We find a z = 12 luminosity function in good agreement with prior results, and that the luminosity function normalization and UV luminosity density decline by a factor of ∼2.5 from z = 12 to z = 14. We discuss the possible implications of our results in the context of theoretical models for evolution of the dark matter halo mass function.
Multi-source connectivity as the driver of solar wind variability in the heli... (Sérgio Sacani)
The ambient solar wind that fills the heliosphere originates from multiple sources in the solar corona and is highly structured. It is often described as high-speed, relatively homogeneous plasma streams from coronal holes and slow-speed, highly variable streams whose source regions are under debate. A key goal of ESA/NASA's Solar Orbiter mission is to identify solar wind sources and understand what drives the complexity seen in the heliosphere. By combining magnetic field modelling and spectroscopic techniques with high-resolution observations and measurements, we show that the solar wind variability detected in situ by Solar Orbiter in March 2022 is driven by spatio-temporal changes in the magnetic connectivity to multiple sources in the solar atmosphere. The magnetic field footpoints connected to the spacecraft moved from the boundaries of a coronal hole to one active region (12961) and then across to another region (12957). This is reflected in the in situ measurements, which show the transition from fast to highly Alfvénic then to slow solar wind that is disrupted by the arrival of a coronal mass ejection. Our results describe solar wind variability at 0.5 au but are applicable to near-Earth observatories.
Cancer Cell Metabolism: Special Reference to the Lactate Pathway (AADYARAJPANDEY1)
Normal Cell Metabolism:
Cellular respiration describes the series of steps that cells use to break down sugar and other chemicals to get the energy we need to function.
Energy is stored in the bonds of glucose and when glucose is broken down, much of that energy is released.
Cells utilize energy in the form of ATP.
The first step of respiration is called glycolysis. In a series of steps, glycolysis breaks glucose into two smaller molecules of a chemical called pyruvate. A small amount of ATP is formed during this process.
Most healthy cells continue the breakdown in a second process, called the Krebs cycle. The Krebs cycle allows cells to “burn” the pyruvate made in glycolysis to get more ATP.
The last step in the breakdown of glucose is called oxidative phosphorylation (Ox-Phos).
It takes place in specialized cell structures called mitochondria. This process produces a large amount of ATP. Importantly, cells need oxygen to complete oxidative phosphorylation.
If a cell completes only glycolysis, only 2 molecules of ATP are made per glucose. However, if the cell completes the entire respiration process (glycolysis - Krebs cycle - oxidative phosphorylation), about 36 molecules of ATP are created, giving it much more energy to use.
IN CANCER CELLS:
Unlike healthy cells that "burn" the entire molecule of sugar to capture a large amount of energy as ATP, cancer cells are wasteful.
Cancer cells only partially break down sugar molecules. They overuse the first step of respiration, glycolysis. They frequently do not complete the second step, oxidative phosphorylation.
This results in only 2 molecules of ATP per each glucose molecule instead of the 36 or so ATPs healthy cells gain. As a result, cancer cells need to use a lot more sugar molecules to get enough energy to survive.
Introduction to the WARBURG PHENOMENON:
WARBURG EFFECT: Usually, cancer cells are highly glycolytic ("glucose addiction") and take up more glucose from outside than normal cells do.
Otto Heinrich Warburg (8 October 1883 – 1 August 1970) was awarded the 1931 Nobel Prize in Physiology or Medicine for his "discovery of the nature and mode of action of the respiratory enzyme".
WARBURG EFFECT: The tendency of cancer cells under aerobic (well-oxygenated) conditions to metabolize glucose to lactate (aerobic glycolysis) is known as the Warburg effect. Warburg made the observation that tumor slices consume glucose and secrete lactate at a higher rate than normal tissues.
10. 1. The Text itself (representations)
2. Linguistics (feature structures)
3. "Manual" (really manual or software-generated)
4. Queries (exegetical search)
layers of annotation
11. A text is ... in abstracto:
a sequence of objects with a notion of embedding
1994 Crist-Jan Doedens. Text Databases. One Database Model and Several Retrieval Languages. Ph.D. thesis. In Language and Computers, Amsterdam. See Google Books.
12. ... in concreto:
all objects are sets of monads (the smallest elements)
all objects participate in spatial relationships
sequence - embedding - overlap - gap
all objects can carry unlimited features
a representation of a word is just a feature
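A minimal sketch of this object model in Python (class and method names are illustrative, not the Emdros API):

class TextObject:
    def __init__(self, otype, monads, **features):
        self.otype = otype                # "word", "phrase", "clause", ...
        self.monads = frozenset(monads)   # the smallest elements it occupies
        self.features = features          # unlimited key-value features

    def embeds(self, other):
        return self.monads > other.monads          # proper superset of monads

    def overlaps(self, other):
        return bool(self.monads & other.monads)    # shared monads

    def has_gap(self):
        ms = sorted(self.monads)
        return ms[-1] - ms[0] + 1 > len(ms)        # monads not contiguous

word = TextObject("word", {1}, text="B.:R;>CIJT")  # the representation is just a feature
clause = TextObject("clause", {1, 2, 3, 5})        # gapped: monad 4 is missing
print(clause.embeds(word), clause.has_gap())       # True True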
13. ... in practice:
this model has been implemented in a mature system
2002-2014 Ulrik Petersen. Emdros. Text database engine for storage and retrieval of analyzed or annotated text. Open Source Software. See COLING paper 2004.
14. ... for convenience:
an ISO standard captures a lot of this model
2012 Nancy Ide and Laurent Romary. Linguistic Annotation Framework (LAF). ISO Standard 24612.
15. 1. The Text itself (representations)
2. Linguistics (feature structures)
3. "Manual" (really manual or software-generated)
4. Queries (exegetical search)
layers of annotation
17. LAF from the outside
dirk:~/SURFdrive/laf-fabric-data/etcbc4b/laf > ls -lh
total 3195648
-rw-r--r-- 1 dirk staff 14K May 4 15:20 etcbc4b.hdr
-rw-r--r-- 1 dirk staff 12M May 4 15:08 etcbc4b.lst
-rw-r--r-- 1 dirk staff 5.1M May 4 15:08 etcbc4b.txt
-rw-r--r-- 1 dirk staff 1.6K May 4 15:20 etcbc4b.txt.hdr
-rw-r--r-- 1 dirk staff 106M May 4 15:09 etcbc4b_lingo.c.xml
-rw-r--r-- 1 dirk staff 107M May 4 15:09 etcbc4b_lingo.p.xml
-rw-r--r-- 1 dirk staff 148M May 4 15:09 etcbc4b_lingo.pa.xml
-rw-r--r-- 1 dirk staff 21M May 4 15:09 etcbc4b_lingo.s.xml
-rw-r--r-- 1 dirk staff 23M May 4 15:09 etcbc4b_lingo.sp.xml
-rw-r--r-- 1 dirk staff 298M May 4 15:09 etcbc4b_lingo.xml
-rw-r--r-- 1 dirk staff 642M May 4 15:08 etcbc4b_monads.lex.xml
-rw-r--r-- 1 dirk staff 125M May 4 15:08 etcbc4b_monads.xml
-rw-r--r-- 1 dirk staff 37M May 4 15:08 etcbc4b_regions.xml
-rw-r--r-- 1 dirk staff 36M May 4 15:08 etcbc4b_sections.xml
dirk:~/SURFdrive/laf-fabric-data/etcbc4b/laf > du -d1 -h
1.5G .
dirk:~/SURFdrive/laf-fabric-data/etcbc4b/laf >
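Loading this resource with LAF-Fabric then looks roughly as follows (a sketch in the style of the LAF-Fabric notebooks; the exact load spec and feature names vary per task and version):

from laf.fabric import LafFabric

fabric = LafFabric()
# declare which features the task needs; LAF-Fabric compiles the LAF XML
# once into binary form and afterwards loads only the requested features
API = fabric.load('etcbc4b', '--', 'demo', {
    "xmlids": {"node": False, "edge": False},
    "features": ("otype g_word_utf8", ""),
})
F = API['F']

words = list(F.otype.s('word'))   # all word nodes in corpus order
print(len(words), F.g_word_utf8.v(words[0]))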
18. LAF statistics
OPENED AT:2015-06-29T05-20-29
0.00s PARSING ANNOTATION FILES
8m 24s INFO: END PARSING
800,607 regions
1,437,355 nodes
2,223,873 edges
5,029,354 annots
30,757,007 features
9,491,189 distinct xml identifiers
8m 24s MODELING RESULT FILES
9m 36s WRITING RESULT FILES for m
CLOSED AT:2015-06-29T05-30-49
28. The ETCBC database
Eep Talstra 197?-2015
Wido van Peursen 2013-
Constantijn Sikkel
Janet Dyk
Reinoud Oosting
Oliver Glanz
...
2012 Eep Talstra, Constantijn Sikkel, Oliver Glanz, Reinoud Oosting, and Janet Dyk: Text database of the Hebrew Bible. DOI 10.17026/dans-x8h-y2bv. Restricted Access.
2014 Wido van Peursen, Eep Talstra, Constantijn Sikkel, Janet Dyk, Oliver Glanz, Reinoud Oosting, Gino Kalkman and Dirk Roorda: Hebrew Text Database ETCBC4. DOI 10.17026/dans-2z3-arxf. Open Access (CC-BY NC).
2015 Wido van Peursen, Constantijn Sikkel and Dirk Roorda: Hebrew Text Database ETCBC4b. DOI 10.17026/dans-z6y-skyh. Open Access (CC-BY NC).
archived! Versions: 3, 4, 4b; 4s to come.
29. 1. The Text itself (representations)
2. Linguistics (feature structures)
3. "Manual" (really manual or software-generated)
4. Queries (exegetical search)
layers of annotation
30. Parallel Passages
Work with Martijn Naaijer (ETCBC) on historical linguistic variation.
See parallel on shebanq.
32. Verbal Valency
Work with Janet Dyk (ETCBC) on verbal semantics.
See valence on shebanq.
[Flow chart for the verb נתן: tests such as "clause has complement", "has direct object", "has another direct object", "complement is indirect object", "complement is locative", and "determine primary object and secondary objects (pdo, sdos)" lead to sense labels "(act of) producing; yielding; giving (in itself)", "produce; yield; give", "produce, yield, give", "give", "give ?or? place", "place", and "make {pdo} (to be (as)/to become/to do) {sdos}", with pattern codes 00, 0c, 10, 1i, 1c, 1l, 2.]
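How such a flow chart might be coded (a hypothetical sketch: the pairing of codes with glosses follows their order on the slide and is an assumption; the real algorithm is documented at https://shebanq.ancient-data.org/tools?goto=valence):

def ntn_pattern(has_direct_object, has_second_object, has_complement,
                complement_is_indirect_object, complement_is_locative):
    # map the constituent pattern of a clause with NTN to a (code, gloss) pair;
    # the code-gloss pairing here is an assumption based on the slide's ordering
    if not has_direct_object:
        if has_complement:
            return "0c", "produce; yield; give"
        return "00", "(act of) producing; yielding; giving (in itself)"
    if has_second_object:
        return "2", "make {pdo} (to be (as)/to become/to do) {sdos}"
    if not has_complement:
        return "10", "produce, yield, give"
    if complement_is_indirect_object:
        return "1i", "give"
    if complement_is_locative:
        return "1l", "place"
    return "1c", "give ?or? place"

print(ntn_pattern(True, False, True, True, False))   # ('1i', 'give')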
34. 1. The Text itself (representations)
2. Linguistics (feature structures)
3. "Manual" (really manual or software-generated)
4. Queries (exegetical search)
layers of annotation
36. SHEBANQ
System for HEBrew text:
ANnotations for Queries and Markup
Giving it to the users:
readers
language scholars
historical linguists
computational linguists
exegetes-hermeneutes
bible translators
publishers
computer scientists