This document discusses applying data analysis techniques used for ancient corpora to the Quran. It presents Text-Fabric (TF), a graph model for storing textual data in plain text files, without XML or SQL. TF models a text as nodes (words, phrases, chapters, verses) connected by edges, with every component uniquely identified. As a toy example, it walks through a TF dataset containing quotations from Iain M. Banks' novel "Consider Phlebas".
Researchers working on ancient text corpora can take control of their own data. We show a way to do so by means of Text-Fabric.
A co-production of Cody Kingham and Dirk Roorda.
1. Data Analysis for Ancient Corpora
applied to the Quran
Dirk Roorda
and
Cornelis van Lit
Filosofie en Religiewetenschap, Utrecht, 2019-03-28
[Bar chart: Parts of Speech after Atnach in ETCBC Phrase; counts from 0 to 250 for conj, nmpr, subs, adjv, prep, art]
2. A. reasons
B. a solution
C. toy example of a TF datasource
D. ministudy: rings and sentiments
C'. an easter egg
B'. new ways
A'. new horizons
3.
4. • researchers in control of their own data
• researchers empowered to fully harness the data available to them
• researchers encouraged to DIY computing
5. A. reasons
B. a solution
C. toy example of a TF datasource
D. ministudy: rings and sentiments
C'. an easter egg
B'. new ways
A'. new horizons
6. Data model
• Graph model: words, phrases, etc. are “nodes”; the relationships between them are “edges”.
• Graphs model complex data structures better than other methods (e.g. XML).
• All stored in easy-to-understand, plain-text files. No messy XML, SQL, etc.
• ... and we call it Text-Fabric (TF)
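What this graph model looks like in practice: a minimal sketch, assuming the text-fabric Python package and a local copy of the banks/tf toy dataset introduced later in this deck (the path is illustrative).

# Load two word features from the toy dataset and walk the graph:
# F gives feature lookups per node, L follows containment edges.
from tf.fabric import Fabric

TF = Fabric(locations="banks/tf")   # the directory with the .tf files
api = TF.load("letters punc")       # feature names, space-separated

F, L = api.F, api.L

for line in F.otype.s("line")[:2]:      # the first two line nodes
    words = L.d(line, otype="word")     # down the edges to the word slots
    print(line, " ".join(F.letters.v(w) for w in words))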
7. Data structure of TF - the IKEA spirit
[Diagram: a node points to stacks of uniquely identified components, kept in order: words, phrases, chapters, verses]
8. A. reasons
B. a solution
C. toy example of a TF datasource
D. ministudy: rings and sentiments
C'. an easter egg
B'. new ways
A'. new horizons
9. # Consider Phlebas
$ author=Iain M. Banks
## 1
Everything about us,
everything around us,
everything we know [and can know of] is composed ultimately of patterns of nothing;
that’s the bottom line, the final truth.
So where we find we have any control over those patterns,
why not make the most elegant ones, the most enjoyable and good ones,
in our own terms?
## 2
Besides,
it left the humans in the Culture free to take care of the things that really mattered in life,
such as [sports, games, romance,] studying dead languages,
barbarian societies and impossible problems,
and climbing high mountains without the aid of a safety harness.
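The deck does not show the conversion step itself; the sketch below (not the deck's actual converter) indicates how a fragment of this source could be turned into the TF files of the next slides, using TF's walker API. The feature names match the files shown there; the text fragment and output directory are illustrative.

# Build a tiny TF dataset from a fragment of the source above.
from tf.fabric import Fabric
from tf.convert.walker import CV

SOURCE = {  # chapter number -> its lines (shortened for the sketch)
    "1": [
        "Everything about us,",
        "everything around us,",
    ],
}

def director(cv):
    book = cv.node("book")
    cv.feature(book, title="Consider Phlebas")
    for chapter_number, chapter_lines in SOURCE.items():
        chapter = cv.node("chapter")
        cv.feature(chapter, number=int(chapter_number))
        for n, line_text in enumerate(chapter_lines, 1):
            line = cv.node("line")
            cv.feature(line, number=n)
            for token in line_text.split():
                word = cv.slot()                 # words are the slot nodes
                letters = token.rstrip(",;.?")   # split off trailing punctuation
                cv.feature(word, letters=letters, punc=token[len(letters):])
            cv.terminate(line)
        cv.terminate(chapter)
    cv.terminate(book)

TF = Fabric(locations="banks/tf")
cv = CV(TF)
good = cv.walk(
    director,
    slotType="word",
    otext={
        "fmt:text-orig-full": "{letters}{punc} ",
        "sectionTypes": "book,chapter,line",
        "sectionFeatures": "title,number,number",
    },
    generic={"compiler": "Dirk Roorda"},
    intFeatures={"number"},
    featureMeta={
        "letters": {"description": "the letters of a word"},
        "punc": {"description": "the punctuation after a word"},
        "title": {"description": "the title of a book"},
        "number": {"description": "number of a chapter or a line"},
    },
)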
10. letters.tf:
@node
@compiler=Dirk Roorda
@description=the letters of a word
@name=Culture quotes from Iain Banks
@source=Good Reads
@url=https://www.goodreads.com/work/quotes/14366-consider-phlebas
@valueType=str
@writtenBy=Text-Fabric
@dateWritten=2019-01-30T22:20:19Z
Everything
about
us
everything
around
us
everything
we
know
and
can
know
of
is
composed
ultimately
of
patterns
of
nothing
that’s
the
bottom
line
the
final
truth
So

punc.tf:
@node
@compiler=Dirk Roorda
@description=the punctuation after a word
@name=Culture quotes from Iain Banks
@source=Good Reads
@url=https://www.goodreads.com/work/quotes/14366-consider-phlebas
@valueType=str
@writtenBy=Text-Fabric
@dateWritten=2019-01-30T22:20:19Z
3 ,
6 ,
20 ;
24 ,
27 .
38 ,
45 ,
51 ,
55 ?
,
75 ,
78 ,
,
,
83 ,
88 ,
99 .

The TF dataset (banks/tf/):
author.tf
gap.tf
letters.tf
number.tf
oslots.tf
otext.tf
otype.tf
punc.tf
terminator.tf
title.tf
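The layout of these files is positional: in a @node feature, a bare line is the value for the node after the previous one, and a line that starts with a node number jumps to that node explicitly (which is how punc.tf skips the unpunctuated words). A toy reader, as a sketch of the format rather than TF's actual loader:

# Parse a @node feature file into a {node: value} dict.
# (Real .tf files separate node and value with a tab; the slides show
# spaces, so we accept both here.)
def parse_node_feature(path):
    values = {}
    node = 0
    with open(path, encoding="utf-8") as fh:
        lines = fh.read().splitlines()
    i = 0
    while i < len(lines) and lines[i].startswith("@"):
        i += 1                      # skip the metadata header
    for line in lines[i:]:
        if not line:
            continue                # ignore blank separator lines
        head, sep, rest = line.replace("\t", " ").partition(" ")
        if sep and head.isdigit():
            node, value = int(head), rest   # explicit node number
        else:
            node, value = node + 1, line    # implicit: previous node + 1
        values[node] = value
    return values

# e.g. parse_node_feature("banks/tf/punc.tf")[55] == "?"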
11. otype
@node
@compiler=Dirk Roorda
@name=Culture quotes from Iain Banks
@source=Good Reads
@url=https://www.goodreads.com/work/quotes/14366-consider-phlebas
@valueType=str
@writtenBy=Text-Fabric
@dateWritten=2019-01-30T22:20:19Z
1-99 word
100 book
101-102 chapter
103-114 line
115-117 sentence
12. oslots.tf:
@edge
@compiler=Dirk Roorda
@name=Culture quotes from Iain Banks
@source=Good Reads
@url=https://www.goodreads.com/work/quotes/14366-consider-phlebas
@valueType=str
@writtenBy=Text-Fabric
@dateWritten=2019-01-30T22:20:19Z
100 1-99
1-55
56-99
1-3
4-6
7-9,14-20
21-27
28-38
39-51
52-55
56
57-75
76-77,81-83
84-88
89-99
1-27
28-55
56-99

node numbering (otype):
1-99 word
100 book
101-102 chapter
103-114 line
115-117 sentence

## 1
Everything about us,
everything around us,
everything we know [and can know of] is composed ultimately of patterns of nothing;
that’s the bottom line, the final truth.
So where we find we have any control over those patterns,
why not make the most elegant ones, the most enjoyable and good ones,
in our own terms?
## 2
Besides,
it left the humans in the Culture free to take care of the things that really mattered in life,
such as [sports, games, romance,] studying dead languages,
barbarian societies and impossible problems,
and climbing high mountains without the aid of a safety harness.
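Together, otype and oslots make the corpus queryable as one graph. A sketch using TF's search templates, assuming the api object from the loading sketch earlier:

# Find each line that contains a word whose letters are "patterns";
# indentation in the template expresses embedding via the oslots edges.
query = """
line
  word letters=patterns
"""
for line, word in api.S.search(query):
    print(line, word)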
33. A. reasons
B. a solution
C. toy example of a TF datasource
D. ministudy: rings and sentiments
C'. an easter egg
B'. new ways
A'. new horizons
34. Sharing and re-using data
Text-Fabric has been developed by a DANS employee; as a consequence:
Data export is built in ✅
Provenance tracking is built in ✅
Redistribution of newly created data is built in ✅
35. sharing #1: GitHub & NBviewer
work done in a Jupyter Notebook inside a GitHub repository is very shareable
39. sharing #4: Create new features
https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/quran/share.ipynb
• etcbc/valence/tf: the results of the verbal valence work of Janet Dyk in the SYNVAR project;
• etcbc/lingo/heads/tf: head words for phrases, work done by Cody Kingham;
• ch-jensen/Semantic-mapping-of-participants/actor/tf: participant analysis in progress by Christian Høygaard-Jensen;
• cmerwich/bh-reference-system/tf: participant analysis in progress by Christiaan Erwich;
• nino-cunei/oldbabylonian/parallels/tf: similar lines, by Dirk Roorda;
• q-ran/quran/parallels/tf: similar lines, by Dirk Roorda;
• q-ran/exercises/mining/tf: sentiments (crude), by Dirk Roorda;
• you/quran/sentiments/tf: sentiments (refined), by You;
• cvlit/quran/semantics/tf: semantic fields, by cvlit.
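Modules like these plug into a session via the mod parameter of TF's use function. A sketch (module path taken from the list above; the exact app identifier depends on the TF version):

# Start the Quran app and add the parallels feature module on top;
# mod accepts {org}/{repo}/{path}, comma-separated for several modules.
from tf.app import use

A = use("quran", mod="q-ran/quran/parallels/tf", hoist=globals())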
40. The Text-Fabric Ethos
• Open source tool for corpus annotation and analysis.
• Corpus data in a repository, with a standard license, as free as possible.
• Researchers: step out of your technological comfort zones and pave the way for the ones after you.
• Find computational inspiration across disciplines.