A survey on parallel corpora alignment (andrefsantos)
This document provides a survey of methods for aligning parallel text corpora. It discusses the historical background of using parallel texts in language processing from the 1950s onward. Key early methods are described, including ones based on sentence length, lexical mapping between words, and identifying cognates. The document also evaluates major efforts to create benchmark datasets and evaluate system performance against gold standard alignments. It surveys the evolution of various alignment techniques and lists some relevant tools and projects in the field.
The pluralization of 'haber' in Puerto Rican Spanish (Jeroen Claes)
This study investigates the pluralization of impersonal haber in a recent sample (March-April 2011) of 24 native speakers of the Spanish of San Juan, Puerto Rico, focusing on three research questions: (i) What is the linguistic distribution of the pluralization of presentational haber in the Spanish of San Juan, Puerto Rico? (ii) What is the sociolinguistic distribution of the pluralization of presentational haber in the Spanish of San Juan, Puerto Rico? and (iii) How can these distributions be explained in a psychologically and sociolinguistically adequate manner? To answer these questions, against the background of Cognitive Construction Grammar, we propose the hypothesis that the phenomenon corresponds to an advanced ongoing language change from below that consists in the substitution of the canonical argument-structure construction, in which the NP functions as a direct object, by an innovative schema (identical in meaning, but different in sociolinguistic and stylistic significance) in which the NP functions as a subject. The results obtained do not all support this model, but the data from the variables ‘Entrenchment of the verb-form in PRES-1’, ‘Degrees of conceptual complexity’, ‘Priming’, ‘Gender’, ‘Age’, and ‘Social prestige’ do argue in favor of it.
Biased learning of long-distance assimilation and dissimilation (Kevin McMullin)
Gunnar Ólafur Hansson and Kevin McMullin. Poster presented at the Workshop on Learning Biases in Natural and Artificial Language Acquisition, LAGB Annual Meeting 2014. September 1-5, 2014 in Oxford, UK.
Characterizing, Modelling and Simulating Naturally Fractured Reservoirs - Stu... (Total Campus)
This document summarizes challenges in modeling naturally fractured reservoirs (NFRs). NFRs are characterized by a coexistence of a fracture network and rock matrix with different properties. Modeling NFRs is complex due to heterogeneous fracture distributions that impact fluid flow at multiple scales. Key challenges include: (1) determining the important fracture scales that drive flow, (2) characterizing fractures near wells to define the flow network, and (3) extrapolating static and dynamic fracture parameters across fields while representing multiscale flow networks with available data. Overcoming these challenges requires integrating well data, geology, geomechanics, and production data to build a full-field fracturing concept and flow model.
Presentation of a paper by LMC & OKW. Devil in the details: Analysis of a coevolutionary model of language evolution via relaxation of selection. Advances in Artificial Life, ECAL 2011. Proceedings of the Eleventh European Conference on the Synthesis and Simulation of Living Systems.
Take and Took, Gaggle and Goose, Book and Read: Evaluating the Utility of Vec... (Katerina Vylomova)
ACL'2016 presentation. Recent work on word embeddings has shown that simple vector subtraction over pre-trained embeddings is surprisingly effective at capturing different lexical relations, despite lacking explicit supervision. Prior work has evaluated this intriguing result using a word analogy prediction formulation and hand-selected relations, but the generality of the finding over a broader range of lexical relation types and different learning settings has not been evaluated. In this paper, we carry out such an evaluation in two learning settings: (1) spectral clustering to induce word relations, and (2) supervised learning to classify vector differences into relation types. We find that word embeddings capture a surprising amount of information, and that, under suitable supervised training, vector subtraction generalises well to a broad range of relations, including over unseen lexical items.
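The vector-subtraction idea summarized above can be illustrated with a minimal sketch. The three-dimensional "embeddings" below are invented toy values, not pre-trained vectors, and the helper names `offset` and `cosine` are hypothetical; the point is only that word pairs sharing a relation should yield similar difference vectors, while pairs in a different relation should not.

```python
import math

# Toy, hand-made vectors (assumption: real work would use pre-trained embeddings).
EMB = {
    "take": [1.0, 0.0, 1.0], "took": [1.0, 1.0, 1.0],
    "make": [0.0, 0.0, 2.0], "made": [0.0, 1.0, 2.0],
    "goose": [3.0, 0.0, 0.0], "gaggle": [3.0, 0.0, 1.0],
}

def offset(a, b):
    """Difference vector EMB[b] - EMB[a] for the word pair (a, b)."""
    return [y - x for x, y in zip(EMB[a], EMB[b])]

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    return dot / (norm_u * norm_v)

# Two past-tense pairs share an offset direction; the collective-noun pair does not.
same_relation = cosine(offset("take", "took"), offset("make", "made"))
other_relation = cosine(offset("take", "took"), offset("goose", "gaggle"))
```

In the supervised setting the summary mentions, such difference vectors would serve as input features to a relation classifier; that step is omitted here.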
The sequential stages culminating in the publication of a morphological cladistic analysis of weevils in the Exophthalmus genus complex (Coleoptera: Curculionidae: Entiminae) are reviewed, with an emphasis on how early-stage homology assessments were gradually evaluated and refined in light of intermittent phylogenetic insights. In all, 60 incremental versions of the evolving character matrix were congealed and analysed, starting with an assembly of 52 taxa and ten traditionally deployed diagnostic characters, and ending with 90 taxa and 143 characters that reflect significantly more narrow assessments of phylogenetic similarity and scope. Standard matrix properties and analytical tree statistics were traced throughout the analytical process, and series of incongruence length indifference tests were used to identify critical points of topology change among succeeding matrix versions. This kind of parsimony-contingent rescoping is generally representative of the inferential process of character individuation within individual and across multiple cladistic analyses. The expected long-term outcome is a maturing observational terminology in which precise inferences of homology are parsimony-contingent, and the notions of homology and parsimony are inextricably linked. This contingent view of cladistic character individuation is contrasted with current approaches to developing phenotype ontologies based on homology-neutral structural equivalence expressions. Recommendations are made to transparently embrace the parsimony-contingent nature of cladistic homology.
Michael Farina presented on establishing new upper bounds for the k-distance domination numbers of grid graphs by generalizing an existing construction of dominating sets to k-distance dominating sets. Armando Grez examined a method for constructing fullerene patches with 4 pentagonal faces and produced an exact process for drawing them. Darleen Perez-Lavin partitioned the set of permutations with a peak set into subsets ending with an ascent or descent and provided formulas to enumerate these subsets for Coxeter groups of types B and D.
Prosodic Control of Unit-Selection Speech Synthesis: A Probabilistic Approach (Christophe Veaux)
The document summarizes a study that compares a baseline unit selection speech synthesis system to a proposed multi-level system. The proposed system uses both phone and syllable models with a Generalized Viterbi Algorithm (GVA) search to maximize joint likelihood at both segmental and prosodic levels. A subjective test with 25 listeners found the proposed system improved overall naturalness compared to the baseline system, which only considers concatenation cost.
Experiment I found that consonant length contrasts were more difficult to discriminate when the consonants were spectrally continuous with surrounding segments, replicating previous research. Experiment II then directly compared conditions with different amplitude drops and found that length contrasts were more difficult to discriminate for consonant intervals with a smaller amplitude drop. This suggests that both spectral discontinuity and greater amplitude drop help listeners perceive segmental boundaries and discriminate consonant length. Amplitude drop may account for the effects of both spectral discontinuity and drop magnitude, as spectral discontinuity represents an extreme case of high amplitude change. Future research could examine consonants involving spectral discontinuity but little amplitude change, like fricatives.
The scarcity of crossing dependencies: a direct outcome of a specific constra... (Graph-TA)
This document summarizes a study on the scarcity of crossing dependencies in syntactic structures across languages. It presents two major hypotheses for why crossings are scarce: 1) an underlying rule or principle prohibits crossings, or 2) crossings are indirectly limited by dependency length minimization, which constrains dependency lengths. The study evaluates these hypotheses by analyzing dependency structures from 30 languages and finding that accounting for dependency lengths reduces errors in predicting crossings compared to random arrangements, supporting the second hypothesis.
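The two quantities this summary relates, the number of crossings and the dependency lengths, can be computed for a single sentence with a short sketch. The representation (a list of 1-based head positions with 0 for the root) and the function names are assumptions made for illustration; the study's own procedure for predicting crossings is not reproduced here.

```python
from itertools import combinations

def dependency_arcs(heads):
    """Turn a head list into (left, right) arcs; heads[i] is the head of word i+1, 0 = root."""
    return [tuple(sorted((dep, head)))
            for dep, head in enumerate(heads, start=1) if head != 0]

def count_crossings(heads):
    """Count pairs of arcs whose endpoints strictly interleave in linear order."""
    crossings = 0
    for (a, b), (c, d) in combinations(dependency_arcs(heads), 2):
        if a < c < b < d or c < a < d < b:
            crossings += 1
    return crossings

def total_dependency_length(heads):
    """Sum of linear distances between each word and its head."""
    return sum(right - left for left, right in dependency_arcs(heads))

projective = [2, 0, 2]      # arcs (1,2) and (2,3): no crossing
crossing = [3, 4, 0, 3]     # arcs (1,3), (2,4), (3,4): (1,3) crosses (2,4)

n_proj = count_crossings(projective)
n_cross = count_crossings(crossing)
length_cross = total_dependency_length(crossing)
```

Arcs that merely share an endpoint are not counted as crossing, which is why the strict inequalities matter.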
Landmark Detection in Hindustani Music Melodies (Sankalp Gulati)
More info: http://mtg.upf.edu/node/2998
Abstract: Musical melodies contain hierarchically organized events, where some events are more salient than others, acting as melodic landmarks. In Hindustani music melodies, an important landmark is the occurrence of a nyas. Occurrence of nyas is crucial to build and sustain the format of a rag and mark the boundaries of melodic motifs. Detection of nyas segments is relevant to tasks such as melody segmentation, motif discovery and rag recognition. However, detection of nyas segments is challenging as these segments do not follow an explicit set of rules in terms of segment length, contour characteristics, and melodic context. In this paper we propose a method for the automatic detection of nyas segments in Hindustani music melodies. It consists of two main steps: a segmentation step that incorporates domain knowledge in order to facilitate the placement of nyas boundaries, and a segment classification step that is based on a series of musically motivated pitch contour features. The proposed method obtains significant accuracies for a heterogeneous data set of 20 audio music recordings containing 1257 nyas svar occurrences and a total duration of 1.5 hours. Further, we show that the proposed segmentation strategy significantly improves over a classical piece-wise linear segmentation approach.
Application of FT-IR to Studies of Surfactant Behavior (David Scheuing)
Talk from the 2011 American Oil Chemists' Society meeting (Surfactants and Detergents Division). Reviews the basics of FT-IR spectroscopy and how it can be used in a wide range of applications to surfactant science.
How can FT-IR deal with aqueous solutions? How can shifts in wavenumber be interpreted? What is a significant shift in wavenumber?
Lecture slides on Estimating Species Divergence Times in RevBayes (https://github.com/revbayes/revbayes).
By Tracy Heath and Tanja Stadler
** this version was taught at the 2014 NESCent Academy Course: Phylogenetic analysis using RevBayes, 8/27/2014 (https://www.nescent.org/sites/academy/Phylogenetic_analysis_using_RevBayes) **
Cambridge 2014 Complexity, tails and trends (Nick Watkins)
This document discusses two types of complexity that can affect trend detection in time series data: long range dependence and heavy tails.
Long range dependence, if present in a system, implies the presence of low frequency "slow" fluctuations that can complicate trend detection. Heavy tails in a probability distribution are a source of "wild" fluctuations due to more frequent extreme events.
The document reviews several examples of long range dependence and heavy tails observed in real-world datasets like financial data and space weather data. Statistical models like linear fractional stable motion (LFSM) and autoregressive fractionally integrated moving average (ARFIMA) processes are discussed for modeling systems with both properties. Better statistical inference methods are also needed to distinguish true
This study investigated Japanese EFL learners' explicit and implicit knowledge of sentence-level discourse constraints regarding assertive predicates in English. 18 Japanese graduate students completed untimed and speeded grammatical judgment tests of sentences containing assertive and non-assertive predicates. Results showed no differences between timed and untimed conditions, suggesting learners lacked both explicit and implicit knowledge of these constraints. The study concludes such features may be difficult to acquire naturally and require explicit instruction. Further research is needed on sentence-level discourse constraints.
A Two-Speed Language Evolution - Protolang Torun - September 2011 (Olaf Witkowski)
1) The document discusses a two-speed model of language evolution based on r/K selection strategies from biology. r-strategist words spread widely and are useful in unpredictable environments, while K-strategist words are specialized for stable contexts.
2) It proposes the concept of a linguistic carrying capacity, determined by limits of individual memory and the transmission channel. Above this capacity, a K-strategy becomes more efficient than an r-strategy.
3) Agent-based simulations are suggested to model language transmission between generations of learners and observe the emergence of r/K tendencies, helping to validate hypotheses about factors influencing carrying capacity.
This document discusses lexical and semantic selection in constraint-based grammars. It outlines types of selection such as syntactic, lexical and semantic selection used in the LKB/ERG grammar. It then examines the problem of collocation, discussing the distribution of magnitude adjectives like heavy, high, big and large with nouns. Finally, it considers possible accounts of this distribution including differences in denotation, selectional restrictions, lexical selection and a collocational account.
The document discusses formal language theory and its applications in natural language processing (NLP). It covers two main goals in computational linguistics - theoretical interest in formally characterizing natural language and practical interest in using well-understood frameworks like finite state models to solve NLP problems. Finite state devices are widely used in NLP tasks due to their efficiency and ability to model linguistic phenomena like words through dictionaries and rules. While finite state models provide a useful approximation of language, natural languages pose challenges like ambiguity, long distance dependencies and non-regular features that require extensions to basic finite state models.
DIALECTAL VARIABILITY IN SPOKEN LANGUAGE: A COMPREHENSIVE SURVEY OF MODERN TE... (indexPub)
A fundamental challenge for current research in speech science and technology is to understand and model speaker variation in spoken language. Speakers each have their own style of speaking, which depends on many factors: the dialect and accent of the speaker, the speaker's social and economic background, and contextual attributes such as the degree of familiarity between speaker and listener and the register of the speaking situation, from very informal to formal. The past few decades have seen extensive progress in automatically identifying the language of a speaker from a speech sample. The purpose of dialect recognition (DR) is to recognize a speaker's regional dialect, within a predetermined language, from the acoustic signal alone. DR is particularly difficult because speaker variation may occur even within the same dialect, accent, or register. For example, in spontaneous speech, some speakers tend to exhibit more optimization and alteration of function words than others. Dialect recognition is regarded as more challenging than language recognition because of the high similarity among dialects of the same language. Although dialects may differ along any dimension of the linguistic spectrum (syntactic, lexical, morphological, phonological), these differences are likely to be more subtle across dialects than across languages such as Hindi, Punjabi, and English.
A stage-structured delayed advection reaction-diffusion model for single spec... (IJECEIAES)
The document summarizes a stage-structured delayed advection reaction-diffusion model for a single species population. The model derives a delay advection reaction-diffusion equation with a linear advection term from an age-structured population model. It then studies the derived equation under homogeneous Dirichlet boundary conditions and an initial condition to find the minimum domain length L that prevents species extinction under the effect of advection and reaction-diffusion. Finally, it incorporates time delays to measure the time lengths from birth to population development stages.
- Phylogenetic analysis involves constructing phylogenetic trees to represent evolutionary relationships between taxa, and using the trees to study character and rate evolution.
- There are two main components - phylogeny inference to build the tree topology, and character and rate analysis using the trees as a framework.
- Phylogenetic trees can be used to address questions about evolutionary history, such as which species are closest living relatives to humans or tracing the origin of transposable elements. Precise tree construction and rooting is important for drawing accurate conclusions.
This document proposes three methods for generating reliable and valid distractors for fill-in-the-blank language learning quizzes: 1) A confusion matrix method using an ESL corpus, 2) A discriminative ESL method using classifiers trained on an ESL corpus, and 3) A discriminative simulated-ESL method using classifiers trained on pseudo-ESL data. An experiment compares the three proposed methods to existing thesaurus- and roundtrip translation-based methods. The discriminative simulated-ESL method performed best in terms of distractor appropriateness and ability to discriminate learner proficiency levels.
PPT on Direct Seeded Rice presented at the three-day 'Training and Validation Workshop on Modules of Climate Smart Agriculture (CSA) Technologies in South Asia' on April 22, 2024.
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste... (Sérgio Sacani)
Context. With a mass exceeding several 10⁴ M⊙ and a rich and dense population of massive stars, supermassive young star clusters represent the most massive star-forming environment that is dominated by the feedback from massive stars and gravitational interactions among stars.
Aims. In this paper we present the Extended Westerlund 1 and 2 Open Clusters Survey (EWOCS) project, which aims to investigate the influence of the starburst environment on the formation of stars and planets, and on the evolution of both low and high mass stars. The primary targets of this project are Westerlund 1 and 2, the closest supermassive star clusters to the Sun.
Methods. The project is based primarily on recent observations conducted with the Chandra and JWST observatories. Specifically, the Chandra survey of Westerlund 1 consists of 36 new ACIS-I observations, nearly co-pointed, for a total exposure time of 1 Msec. Additionally, we included 8 archival Chandra/ACIS-S observations. This paper presents the resulting catalog of X-ray sources within and around Westerlund 1. Sources were detected by combining various existing methods, and photon extraction and source validation were carried out using the ACIS-Extract software.
Results. The EWOCS X-ray catalog comprises 5963 validated sources out of the 9420 initially provided to ACIS-Extract, reaching a photon flux threshold of approximately 2 × 10⁻⁸ photons cm⁻² s⁻¹. The X-ray sources exhibit a highly concentrated spatial distribution, with 1075 sources located within the central 1 arcmin. We have successfully detected X-ray emissions from 126 out of the 166 known massive stars of the cluster, and we have collected over 71 000 photons from the magnetar CXO J164710.20-455217.
Similar to Inductive learning of long-distance dissimilation as a problem for phonology
Take and Took, Gaggle and Goose, Book and Read: Evaluating the Utility of Vec...Katerina Vylomova
ACL'2016 presentation. Recent work on word embeddings has shown that simple vector subtraction over pre-trained embeddings is surprisingly effective at capturing different lexical relations, despite lacking explicit supervision. Prior work has evaluated this intriguing result using a word analogy prediction formulation and hand-selected relations, but the generality of the finding over a broader range of lexical relation types and different learning settings has not been evaluated. In this paper, we carry out such an evaluation in two learning settings: (1) spectral clustering to induce word relations, and (2) supervised learning to classify vector differences into relation types. We find that word embeddings capture a surprising amount of information, and that, under suitable supervised training, vector subtraction generalises well to a broad range of relations, including over unseen lexical items.
The sequential stages culminating in the publication of a morphological cladistic analysis of weevils in the Exophthalmus genus complex (Coleoptera: Curculionidae: Entiminae) are reviewed, with an emphasis on how early- stage homology assessments were gradually evaluated and refined in light of intermittent phylogenetic insights. In all, 60 incremental versions of the evolving character matrix were congealed and analysed, starting with an assembly of 52 taxa and ten traditionally deployed diagnostic characters, and ending with 90 taxa and 143 characters that reflect significantly more narrow assessments of phylogenetic similarity and scope. Standard matrix properties and analytical tree statistics were traced throughout the analytical process, and series of incongruence length indifference tests were used to identify critical points of topology change among succeeding matrix versions. This kind of parsimony-contingent rescoping is generally representative of the inferential process of character individuation within individual and across multiple cladistic analyses. The expected long-term outcome is a maturing observational terminology in which precise inferences of homology are parsimony-contingent, and the notions of homology and parsimony are inextricably linked. This contingent view of cladistic character individuation is contrasted with current approaches to developing phenotype ontologies based on homology-neutral structural equivalence expressions. Recommendations are made to transparently embrace the parsimony-contingent nature of cladistic homology.
Michael Farina presented on establishing new upper bounds for the k-distance domination numbers of grid graphs by generalizing an existing construction of dominating sets to k-distance dominating sets. Armando Grez examined a method for constructing fullerene patches with 4 pentagonal faces and produced an exact process for drawing them. Darleen Perez-Lavin partitioned the set of permutations with a peak set into subsets ending with an ascent or descent and provided formulas to enumerate these subsets for Coxeter groups of types B and D.
Prosodic Control of Unit-Selection Speech Synthesis: A Probabilistic ApproachChristophe Veaux
The document summarizes a study that compares a baseline unit selection speech synthesis system to a proposed multi-level system. The proposed system uses both phone and syllable models with a Generalized Viterbi Algorithm (GVA) search to maximize joint likelihood at both segmental and prosodic levels. A subjective test with 25 listeners found the proposed system improved overall naturalness compared to the baseline system, which only considers concatenation cost.
Experiment I found that consonant length contrasts were more difficult to discriminate when the consonants were spectrally continuous with surrounding segments, replicating previous research. Experiment II then directly compared conditions with different amplitude drops and found that length contrasts were more difficult to discriminate for consonant intervals with a smaller amplitude drop. This suggests that both spectral discontinuity and greater amplitude drop help listeners perceive segmental boundaries and discriminate consonant length. Amplitude drop may account for the effects of both spectral discontinuity and drop magnitude, as spectral discontinuity represents an extreme case of high amplitude change. Future research could examine consonants involving spectral discontinuity but little amplitude change, like fricatives.
The scarcity of crossing dependencies: a direct outcome of a specific constra...Graph-TA
This document summarizes a study on the scarcity of crossing dependencies in syntactic structures across languages. It presents two major hypotheses for why crossings are scarce: 1) an underlying rule or principle prohibits crossings, or 2) crossings are indirectly limited by dependency length minimization, which constrains dependency lengths. The study evaluates these hypotheses by analyzing dependency structures from 30 languages and finding that accounting for dependency lengths reduces errors in predicting crossings compared to random arrangements, supporting the second hypothesis.
Landmark Detection in Hindustani Music MelodiesSankalp Gulati
More info: http://mtg.upf.edu/node/2998
Abstract: Musical melodies contain hierarchically organized events, where some events are more salient than others, acting as melodic landmarks. In Hindustani music melodies, an important landmark is the occurrence of a nyas. Occurrence of nyas is crucial to build and sustain the format of a rag and mark the boundaries of melodic motifs. Detection of nyas segments is relevant to tasks such as melody segmentation, motif discovery and rag recognition. However, detection of nyas segments is challenging as these segments do not follow explicit set of rules in terms of segment length, contour characteristics, and melodic context. In this paper we propose a method for the automatic detection of nyas segments in Hindustani music melodies. It consists of two main steps: a segmentation step that incorporates domain knowledge in order to facilitate the placement of nyas boundaries, and a segment classification step that is based on a series of musically motivated pitch contour features. The proposed method obtains significant accuracies for a heterogeneous data set of 20 audio music recordings containing 1257 nyas svar occurrences and total duration of 1.5 hours. Further, we show that the proposed segmentation strategy significantly improves over a classical piece-wise linear segmentation approach.
Application of FT-IR to Studies of Surfactant BehaviorDavid Scheuing
Talk from the 2011 American Oil Chemist's Society meeting (Surfactants and Detergents Division). Reviews the basics of FT-IR spectroscopy and how it can be used in a wide range of applications to surfactant science.
How can FT-IR deal with aqueous solutions? How can shifts in wavenumber be interpreted? What is a significant shift in wavenumber?
Lecture slides on Estimating Species Divergence Times in RevBayes (https://github.com/revbayes/revbayes).
By Tracy Heath and Tanja Stadler
** this version was taught at the 2014 NESCent Academy Course: Phylogenetic analysis using RevBayes, 8/27/2014 (https://www.nescent.org/sites/academy/Phylogenetic_analysis_using_RevBayes) **
Cambridge 2014 Complexity, tails and trendsNick Watkins
This document discusses two types of complexity that can affect trend detection in time series data: long range dependence and heavy tails.
Long range dependence, if present in a system, implies the presence of low frequency "slow" fluctuations that can complicate trend detection. Heavy tails in a probability distribution are a source of "wild" fluctuations due to more frequent extreme events.
The document reviews several examples of long range dependence and heavy tails observed in real-world datasets like financial data and space weather data. Statistical models like linear fractional stable motion (LFSM) and autoregressive fractionally integrated moving average (ARFIMA) processes are discussed for modeling systems with both properties. Better statistical inference methods are also needed to distinguish true
This study investigated Japanese EFL learners' explicit and implicit knowledge of sentence-level discourse constraints regarding assertive predicates in English. 18 Japanese graduate students completed untimed and speeded grammatical judgment tests of sentences containing assertive and non-assertive predicates. Results showed no differences between timed and untimed conditions, suggesting learners lacked both explicit and implicit knowledge of these constraints. The study concludes such features may be difficult to acquire naturally and require explicit instruction. Further research is needed on sentence-level discourse constraints.
A Two-Speed Language Evolution - Protolang Torun - September 2011
Olaf Witkowski
1) The document discusses a two-speed model of language evolution based on r/K selection strategies from biology. r-strategist words spread widely and are useful in unpredictable environments, while K-strategist words are specialized for stable contexts.
2) It proposes the concept of a linguistic carrying capacity, determined by limits of individual memory and the transmission channel. Above this capacity, a K-strategy becomes more efficient than an r-strategy.
3) Agent-based simulations are suggested to model language transmission between generations of learners and observe the emergence of r/K tendencies, helping to validate hypotheses about factors influencing carrying capacity.
This document discusses lexical and semantic selection in constraint-based grammars. It outlines types of selection such as syntactic, lexical and semantic selection used in the LKB/ERG grammar. It then examines the problem of collocation, discussing the distribution of magnitude adjectives like heavy, high, big and large with nouns. Finally, it considers possible accounts of this distribution including differences in denotation, selectional restrictions, lexical selection and a collocational account.
The document discusses formal language theory and its applications in natural language processing (NLP). It covers two main goals in computational linguistics - theoretical interest in formally characterizing natural language and practical interest in using well-understood frameworks like finite state models to solve NLP problems. Finite state devices are widely used in NLP tasks due to their efficiency and ability to model linguistic phenomena like words through dictionaries and rules. While finite state models provide a useful approximation of language, natural languages pose challenges like ambiguity, long distance dependencies and non-regular features that require extensions to basic finite state models.
DIALECTAL VARIABILITY IN SPOKEN LANGUAGE: A COMPREHENSIVE SURVEY OF MODERN TE...
indexPub
A fundamental challenge for current research in speech science and technology is to understand and model variation among speakers of spoken languages. Speakers have their own speaking styles, which depend on factors such as dialect and accent, social and economic background, and contextual attributes such as the degree of familiarity between speaker and listener and the formality of the speaking situation, from very casual to formal. The past few decades have seen extensive progress in automatically identifying a speaker's language from a speech sample. The main purpose of dialect identification is to recognize a speaker's regional dialect, within a predetermined language, from the acoustic signal alone. Dialect Recognition (DR) is a particularly hard problem because variation occurs even among speakers who share a dialect, accent, or register. For example, in spontaneous speech some speakers exhibit more reduction and alteration of function words than others. Dialect recognition is generally viewed as more challenging than language recognition because dialects of the same language are highly similar. Although dialects may differ along any dimension of the linguistic spectrum (syntactic, lexical, morphological, phonological), these differences are likely to be subtler across dialects than across languages such as Hindi, Punjabi, and English.
A stage-structured delayed advection reaction-diffusion model for single spec...
IJECEIAES
The document summarizes a stage-structured delayed advection reaction-diffusion model for a single species population. The model derives a delay advection reaction-diffusion equation with a linear advection term from an age-structured population model. It then studies the derived equation under homogeneous Dirichlet boundary conditions and an initial condition to find the minimum domain length L that prevents species extinction under the effect of advection and reaction-diffusion. Finally, it incorporates time delays to measure the time lengths from birth to population development stages.
- Phylogenetic analysis involves constructing phylogenetic trees to represent evolutionary relationships between taxa, and using the trees to study character and rate evolution.
- There are two main components - phylogeny inference to build the tree topology, and character and rate analysis using the trees as a framework.
- Phylogenetic trees can be used to address questions about evolutionary history, such as which species are closest living relatives to humans or tracing the origin of transposable elements. Precise tree construction and rooting is important for drawing accurate conclusions.
This document proposes three methods for generating reliable and valid distractors for fill-in-the-blank language learning quizzes: 1) A confusion matrix method using an ESL corpus, 2) A discriminative ESL method using classifiers trained on an ESL corpus, and 3) A discriminative simulated-ESL method using classifiers trained on pseudo-ESL data. An experiment compares the three proposed methods to existing thesaurus- and roundtrip translation-based methods. The discriminative simulated-ESL method performed best in terms of distractor appropriateness and ability to discriminate learner proficiency levels.
PPT on Direct Seeded Rice presented at the three-day 'Training and Validation Workshop on Modules of Climate Smart Agriculture (CSA) Technologies in South Asia' on April 22, 2024.
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
Sérgio Sacani
Context. With a mass exceeding several 10⁴ M⊙ and a rich and dense population of massive stars, supermassive young star clusters
represent the most massive star-forming environment that is dominated by the feedback from massive stars and gravitational interactions
among stars.
Aims. In this paper we present the Extended Westerlund 1 and 2 Open Clusters Survey (EWOCS) project, which aims to investigate
the influence of the starburst environment on the formation of stars and planets, and on the evolution of both low and high mass stars.
The primary targets of this project are Westerlund 1 and 2, the closest supermassive star clusters to the Sun.
Methods. The project is based primarily on recent observations conducted with the Chandra and JWST observatories. Specifically,
the Chandra survey of Westerlund 1 consists of 36 new ACIS-I observations, nearly co-pointed, for a total exposure time of 1 Msec.
Additionally, we included 8 archival Chandra/ACIS-S observations. This paper presents the resulting catalog of X-ray sources within
and around Westerlund 1. Sources were detected by combining various existing methods, and photon extraction and source validation
were carried out using the ACIS-Extract software.
Results. The EWOCS X-ray catalog comprises 5963 validated sources out of the 9420 initially provided to ACIS-Extract, reaching a
photon flux threshold of approximately 2 × 10⁻⁸ photons cm⁻² s⁻¹. The X-ray sources exhibit a highly concentrated spatial distribution,
with 1075 sources located within the central 1 arcmin. We have successfully detected X-ray emissions from 126 out of the 166 known
massive stars of the cluster, and we have collected over 71 000 photons from the magnetar CXO J164710.20-455217.
Travis Hills of MN is Making Clean Water Accessible to All Through High Flux ...
Travis Hills MN
By harnessing the power of High Flux Vacuum Membrane Distillation, Travis Hills from MN envisions a future where clean and safe drinking water is accessible to all, regardless of geographical location or economic status.
The technology uses reclaimed CO₂ as the dyeing medium in a closed loop process. When pressurized, CO₂ becomes supercritical (SC-CO₂). In this state CO₂ has a very high solvent power, allowing the dye to dissolve easily.
The cost of acquiring information by natural selection
Carl Bergstrom
This is a short talk that I gave at the Banff International Research Station workshop on Modeling and Theory in Population Biology. The idea is to try to understand how the burden of natural selection relates to the amount of information that selection puts into the genome.
It's based on the first part of this research paper:
The cost of information acquisition by natural selection
Ryan Seamus McGee, Olivia Kosterlitz, Artem Kaznatcheev, Benjamin Kerr, Carl T. Bergstrom
bioRxiv 2022.07.02.498577; doi: https://doi.org/10.1101/2022.07.02.498577
The debris of the ‘last major merger’ is dynamically young
Sérgio Sacani
The Milky Way’s (MW) inner stellar halo contains an [Fe/H]-rich component with highly eccentric orbits, often referred to as the
‘last major merger.’ Hypotheses for the origin of this component include Gaia-Sausage/Enceladus (GSE), where the progenitor
collided with the MW proto-disc 8–11 Gyr ago, and the Virgo Radial Merger (VRM), where the progenitor collided with the
MW disc within the last 3 Gyr. These two scenarios make different predictions about observable structure in local phase space,
because the morphology of debris depends on how long it has had to phase mix. The recently identified phase-space folds in Gaia
DR3 have positive caustic velocities, making them fundamentally different than the phase-mixed chevrons found in simulations
at late times. Roughly 20 per cent of the stars in the prograde local stellar halo are associated with the observed caustics. Based
on a simple phase-mixing model, the observed number of caustics are consistent with a merger that occurred 1–2 Gyr ago.
We also compare the observed phase-space distribution to FIRE-2 Latte simulations of GSE-like mergers, using a quantitative
measurement of phase mixing (2D causticality). The observed local phase-space distribution best matches the simulated data
1–2 Gyr after collision, and certainly not later than 3 Gyr. This is further evidence that the progenitor of the ‘last major merger’
did not collide with the MW proto-disc at early times, as is thought for the GSE, but instead collided with the MW disc within
the last few Gyr, consistent with the body of work surrounding the VRM.
When I was asked to give a companion lecture in support of ‘The Philosophy of Science’ (https://shorturl.at/4pUXz) I decided not to walk through the detail of the many methodologies in order of use. Instead, I chose to employ a long standing, and ongoing, scientific development as an exemplar. And so, I chose the ever evolving story of Thermodynamics as a scientific investigation at its best.
Conducted over a period of >200 years, Thermodynamics R&D, and application, benefitted from the highest levels of professionalism, collaboration, and technical thoroughness. New layers of application, methodology, and practice were made possible by the progressive advance of technology. In turn, this has seen measurement and modelling accuracy continually improved at a micro and macro level.
Perhaps most importantly, Thermodynamics rapidly became a primary tool in the advance of applied science/engineering/technology, spanning micro-tech, to aerospace and cosmology. I can think of no better a story to illustrate the breadth of scientific methodologies and applications at their best.
This PowerPoint presentation, generated from MS Word, covers the major details of the micronuclei test, including its significance and the assays used to conduct it. The test is used to detect micronuclei formation inside the cells of nearly every multicellular organism. Their formation takes place during chromosomal separation at metaphase.
Inductive learning of long-distance dissimilation as a problem for phonology
Kevin McMullin and Gunnar Ólafur Hansson
University of British Columbia
1. Background
Consonant harmony
Artificial language learning (harmony)
• Two consonants must agree for some feature value
• Two attested variants of locality (Rose & Walker 2004, Hansson 2010)
1. Unbounded harmony holds at any distance within the relevant domain
2. Transvocalic harmony applies across at most one vowel
Illustration of the typological split in two Omotic languages
Unbounded sibilant harmony in Aari (Hayward 1990)
a. /baʔ-s-e/ baʔse ‘he brought’
b. /tʃʼa̤ːq-s-it/ tʃʼa̤ːqʃit ‘I swore’
c. /ʃed-er-s-it/ ʃederʃit ‘I was seen’
Transvocalic sibilant harmony in Koyra (Koorete; Hayward 1982)
a. /tim-d-osːo/ tindosːo ‘he got wet’
b. /patʃ-d-osːo/ patʃːoʃːo ‘it became less’
c. /ʃod-d-osːo/ ʃodːosːo ‘he uprooted’
• The attested split is mirrored in the results of adult phonotactic learning for
sibilants (Finley 2011, 2012) and liquids (McMullin & Hansson in press)
Pattern | …Cv-Cv | …Cvcv-Cv | Cvcvcv-Cv
Unbounded | + | + | +
Transvocalic | + | – | –
unattested | – | + | –
unattested | – | + | +
unattested | + | + | –
Questions
• Do humans learn and generalize long-distance consonant dissimilation in the
same way as harmony?
• How do these biases relate to learnability and formal complexity?
4. Discussion
(Dis)Agreement by (Non)Correspondence
Formal-computational perspective
• CORR constraints induce a surface correspondence relation (C↔C) on co-occurring
segments that are sufficiently similar
• “CC-Limiter” constraints impose conditions on corresponding segments
(e.g. agreement in some additional [F], structural relations)
CORR-[Rhotic] (Bennett 2013)
If two co-occurring consonants are both [Rhotic], they must stand
in C↔C correspondence (indicated by subscript indices).
CC-EDGE(morpheme) (Bennett 2013)
Segments in C↔C correspondence must be tautomorphemic.
CC-SYLLADJ (Bennett 2013; cf. PROXIMITY in Rose & Walker 2004)
Segments in C↔C correspondence must be in the same or adjacent
syllables (slightly simplified definition).
• Inability to enforce CC-Limiter demands may trigger dissimilation as a
repair (avoiding the need for C↔C correspondence)
• Languages can be considered stringsets whose phonotactics can be modeled
with a formal grammar that identifies (un)grammatical strings (words)
• Complexity of a phonotactic pattern can be assessed based on its membership
in certain well-defined classes of formal languages (e.g. subregular languages)
Strictly Local languages (SL)
• Not computationally complex, defined in terms of k-factors (n-grams)
• Learnable in the limit from positive data for any fixed k (Heinz 2010)
• Bounded co-occurrence restrictions are Strictly Local
e.g. Transvocalic liquid dissimilation is SL3: *rVr, but rV…Vr is permitted
• Unbounded co-occurrence restrictions are not SLk (they hold at length k+1)
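The Strictly Local idea above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The function names, the `#` boundary symbol, the five-vowel inventory, and the inclusion of `*lVl` alongside `*rVr` are all assumptions made for the example.

```python
VOWELS = set("aeiou")  # assumed vowel inventory for the toy language

def k_factors(word, k):
    """Return the set of length-k substrings of word, padded with k-1 boundary symbols."""
    padded = "#" * (k - 1) + word + "#" * (k - 1)
    return {padded[i:i + k] for i in range(len(padded) - k + 1)}

# Banned 3-factors for transvocalic liquid dissimilation: *rVr and (by assumption) *lVl.
BANNED_3 = {liquid + vowel + liquid for liquid in "lr" for vowel in VOWELS}

def is_sl_grammatical(word, banned, k):
    """A word is well-formed iff none of its k-factors is banned."""
    return k_factors(word, k).isdisjoint(banned)

print(is_sl_grammatical("pokuriru", BANNED_3, 3))  # False: contains the factor "rir"
print(is_sl_grammatical("rokupiru", BANNED_3, 3))  # True: no 3-factor holds both r's
```

The second word also shows why an unbounded restriction is not Strictly Local for a given k: once the two liquids are more than k segments apart, no single k-factor contains both of them, so no finite set of banned k-factors can rule the word out.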
Tier-based Strictly Local languages (TSL; Heinz et al. 2011)
• Properly include the SL languages
• Defined in terms of k-factors amongst a subset of the inventory (tiers)
• Tiers can be defined in terms of features, natural classes, or arbitrarily
Examples of tier-based substrings for a word pilemoru
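As a rough sketch of what a tier-based substring is, the snippet below projects the word pilemoru onto each of the four tiers used in the poster's example. The function name `project` and the dictionary layout are assumptions introduced for illustration.

```python
def project(word, tier):
    """Keep only the segments of word that belong to the tier, in their original order."""
    return "".join(segment for segment in word if segment in tier)

# Tier inventories taken from the poster's pilemoru example.
TIERS = {
    "vowels": set("ieou"),
    "consonants": set("plmr"),
    "liquids": set("lr"),
    "arbitrary": set("olmp"),
}

for name, tier in TIERS.items():
    print(name, project("pilemoru", tier))
# vowels ieou
# consonants plmr
# liquids lr
# arbitrary plmo
```

A TSL grammar then states its banned k-factors over the projected string rather than over the full word.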
Future studies
• How do learners deal with overt evidence of an unattested locality type (e.g.
beyond-transvocalic-only dissimilation/harmony)?
• Can learners discover (or infer) phonotactic patterns of dissimilation/harmony
with blocking by intervening segments of certain kinds?
• What is the appropriate characterization of the “transvocalic” relation?
Syllable-adjacency? Consonant-tier adjacency? Onset-tier adjacency?
• Are there restrictions on the set of possible tiers, or on the relationship between
a tier T and the set of targeted 2-factors (bigrams) on that tier?
Possible theory-internal solutions
• Add special versions of CORR constraints that are limited to a CVC
window (Hansson 2010, Bennett 2013) – resolves ranking paradox for
transvocalic-only dissimilation
• Abandon CC-SYLLADJ from CC-Limiter constraint class – removes
beyond-transvocalic-only dissimilation from the factorial typology
2. Methodology
Experimental design: Three phases
Example stimuli
1. Practice: Initial exposure to six CVCV-LV stem-suffix pairs in two tenses
2. Training: 192 triplets with suffix-triggered liquid dissimilation
• Each of three groups differed only in the stems encountered in training
Control: No liquids – intended to reveal any underlying biases
Nontransvocalic: 96 CVCVCV stems, 96 CVLVCV
Transvocalic: 96 CVCVCV stems, 96 CVCVLV
3. Testing: Subjects heard a stem followed by two options with the same suffix
• Choice between liquid harmony vs. disharmony (2AFC task)
• 32 trials for stems at each of three trigger-target distances (96 total trials)
• Short- (CVCVLV), Medium- (CVLVCV), and Long-range (LVCVCV)
“Past tense” – toke…toke-li; “Future tense” – mebi…mebi-ru
Stimuli were presented over a set of headphones and repeated aloud
tikemu…tikemu-li…tikemu-ru; bipobe…bipobe-ru…bipobe-li
giluko…giruko-li…giluko-ru; norego…nolego-ru…norego-li
pokuri…pokuri-li…pokuli-ru; depile…depile-ru…depire-li
dotile…dotile-li or dotire-li; tukiri…tukiri-ru or tukili-ru (Short-range)
teriti…teliti-ru or teriti-ru; bilegi…bilegi-ru or biregi-ru (Medium-range)
linode…linode-li or rinode-li; renitu…lenitu-li or renitu-li (Long-range)
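The pattern in the example stimuli can be sketched as follows. This is an illustrative reconstruction from the listed items, not the authors' stimulus-generation script. The function names are hypothetical, and the code assumes that each suffix begins with a liquid.

```python
OTHER_LIQUID = {"l": "r", "r": "l"}

def training_form(stem, suffix):
    """Suffixed form under liquid dissimilation: a stem liquid that is
    identical to the suffix liquid surfaces as the other liquid."""
    suffix_liquid = suffix[0]  # assumes the suffix is -li or -ru
    changed = "".join(
        OTHER_LIQUID[seg] if seg == suffix_liquid else seg for seg in stem
    )
    return f"{changed}-{suffix}"

def two_alternatives(stem, suffix):
    """The two options in a 2AFC test trial: the stem unchanged vs. the stem
    with its liquid switched, both followed by the same suffix."""
    switched = "".join(OTHER_LIQUID.get(seg, seg) for seg in stem)
    return f"{stem}-{suffix}", f"{switched}-{suffix}"

print(training_form("giluko", "li"))     # giruko-li
print(training_form("giluko", "ru"))     # giluko-ru
print(two_alternatives("dotile", "li"))  # ('dotile-li', 'dotire-li')
```

In a test trial one of the two alternatives is harmonic and the other disharmonic. Which of them is the faithful one depends on whether the stem liquid already differs from the suffix liquid.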
3. Results and analysis
Mixed-effects logistic regression
• Dependent variable: Was disharmony chosen on a particular trial?
• Random by-subject intercepts and slopes for disharmony second/faithful
References
Bennett, William. 2013. Dissimilation, consonant harmony, and surface correspondence. Doctoral dissertation, Rutgers University.
Finley, Sara. 2011. The privileged status of locality in consonant harmony. Journal of Memory and Language 65:74–83.
Finley, Sara. 2012. Testing the limits of long-distance learning: learning beyond a three-segment window. Cognitive Science 36:740–756.
Hansson, Gunnar Ólafur. 2010. Consonant harmony: long-distance interaction in phonology. Berkeley: University of California Press.
Hayward, Richard J. 1982. Notes on the Koyra language. Afrika und Übersee 65:211–268.
Hayward, Richard J. 1990. Notes on the Aari language. In Omotic language studies, ed. R. J. Hayward, 425–493. London: School of Oriental and African Studies.
Heinz, Jeffrey. 2010. Learning long-distance phonotactics. Linguistic Inquiry 41(4): 623–661.
Heinz, Jeffrey, Chetan Rawal and Herbert G. Tanner. 2011. Tier-based strictly local constraints for phonology. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, pp. 58–64. Association for Computational Linguistics.
McMullin, Kevin and Gunnar Ólafur Hansson. In press. Locality in long-distance phonotactics: evidence for modular learning. To appear in Proceedings of NELS 44, ed. Jyoti Iyer and Leland Kusmer. GLSA Publications, University of Massachusetts.
McNaughton, Robert, and Seymour Papert. 1971. Counter-free automata. Cambridge, MA: MIT Press.
Rose, Sharon, and Rachel Walker. 2004. A typology of consonant agreement as correspondence. Language 80:475–531.
Acknowledgements
This research was supported by SSHRC Insight Grant 435–2013–0455 to Gunnar Ólafur Hansson and a UBC Faculty of Arts Graduate Research Award to Kevin McMullin. Special thanks to Carla Hudson Kam and the UBC Language and Learning Lab, as well as to Jeff Heinz, Alexis Black, James Crippen, Ella Fund-Reznicek and Michael McAuliffe.
LabPhon 14, NINJAL, Tokyo, Japan, July 25-27, 2014
Unbounded (attested)
/CVrV-rV/ | CC-SYLLADJ | CC-EDGE | CORR-[Rhotic] | IDENT[lat]-IO
☞ a. CV.l_xV-r_yV | | | | *
b. CV.r_xV-r_xV | | * W | | L
c. CV.r_xV-r_yV | | | * W | L
/rVCV-rV/ | CC-SYLLADJ | CC-EDGE | CORR-[Rhotic] | IDENT[lat]-IO
☞ a. l_xV.CV-r_yV | | | | *
b. r_xV.CV-r_xV | * W | * W | | L
c. r_xV.CV-r_yV | | | * W | L
Transvocalic (attested): RANKING PARADOX
/CVrV-rV/ | CORR-[Rhotic] | CC-EDGE | IDENT[lat]-IO | CC-SYLLADJ
☞ a. CV.l_xV-r_yV | | | * |
b. CV.r_xV-r_xV | | * W | L |
c. CV.r_xV-r_yV | *! W | | L |
/rVCV-rV/ | CORR-[Rhotic] | CC-EDGE | IDENT[lat]-IO | CC-SYLLADJ
a. l_xV.CV-r_yV | | L | * W | L
☹ b. r_xV.CV-r_xV | | *! | | *
c. r_xV.CV-r_yV | *! W | L | | L
Beyond-transvocalic-only (unattested?)
/CVrV-rV/ | CC-SYLLADJ | CORR-[Rhotic] | IDENT[lat]-IO | CC-EDGE
a. CV.l_xV-r_yV | | | *! W | L
☞ b. CV.r_xV-r_xV | | | | *
c. CV.r_xV-r_yV | | *! W | | L
/rVCV-rV/ | CC-SYLLADJ | CORR-[Rhotic] | IDENT[lat]-IO | CC-EDGE
☞ a. l_xV.CV-r_yV | | | * |
b. r_xV.CV-r_xV | *! W | | L | * W
c. r_xV.CV-r_yV | | *! W | | L
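The logic of the three rankings can be checked with a small sketch of OT evaluation, in which strict domination is modeled as lexicographic comparison of violation vectors. The encoding below is a deliberate simplification and an assumption of this example. Each candidate is reduced to four Boolean properties, and the constraint functions paraphrase the definitions of CORR-[Rhotic], CC-EDGE, CC-SYLLADJ and IDENT[lat]-IO given earlier rather than reproducing Bennett's (2013) formal statements.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    label: str
    both_rhotic: bool  # do both liquids surface as [r]?
    correspond: bool   # do the two liquids stand in C<->C correspondence?
    adjacent: bool     # are they in the same or adjacent syllables?
    faithful: bool     # False if an underlying /r/ surfaces as [l]

# Each constraint returns a violation count (0 or 1 in this toy setting).
def corr_rhotic(c):  # co-occurring rhotics must correspond
    return int(c.both_rhotic and not c.correspond)

def cc_edge(c):      # correspondents must be tautomorphemic; stem and suffix never are
    return int(c.correspond)

def cc_sylladj(c):   # correspondents must be in the same or adjacent syllables
    return int(c.correspond and not c.adjacent)

def ident_lat(c):    # input-output faithfulness for [lateral]
    return int(not c.faithful)

def candidates(adjacent):
    return [
        Candidate("dissimilated", False, False, adjacent, False),
        Candidate("faithful, corresponding", True, True, adjacent, True),
        Candidate("faithful, non-corresponding", True, False, adjacent, True),
    ]

def winner(ranking, cands):
    # Strict domination = lexicographic comparison of violation vectors.
    return min(cands, key=lambda c: tuple(con(c) for con in ranking)).label

RANKINGS = {
    "Unbounded": [cc_sylladj, cc_edge, corr_rhotic, ident_lat],
    "Transvocalic": [corr_rhotic, cc_edge, ident_lat, cc_sylladj],
    "Beyond-transvocalic-only": [cc_sylladj, corr_rhotic, ident_lat, cc_edge],
}
INPUTS = {"/CVrV-rV/": True, "/rVCV-rV/": False}  # True = liquids in adjacent syllables

for name, ranking in RANKINGS.items():
    for form, adjacent in INPUTS.items():
        print(f"{name:26} {form:10} -> {winner(ranking, candidates(adjacent))}")
```

Under the second ranking the dissimilated candidate wins for /rVCV-rV/ as well, although a transvocalic pattern requires the faithful form at that distance. This is the ranking paradox. The third ranking yields dissimilation only at the longer distance.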
Type of test item (trigger-target distance)
Comparison | Short-range | Medium-range | Long-range
Nontransvocalic vs. Control | 4.11 (p < 0.001) | 3.19 (p < 0.001) | 1.49 (p ≈ 0.236)
Transvocalic vs. Control | 8.75 (p < 0.001) | 1.39 (p ≈ 0.292) | 0.83 (p ≈ 0.539)
Table of Odds Ratios comparing disharmony choices between experimental and
control groups after releveling the mixed logit model at each testing distance.
Coefficient Estimate SE Pr(>|z|)
Intercept –0.7090 0.2704 0.009
Disharmony second –0.6089 0.1205 <0.001
Disharmony faithful 2.2224 0.3318 <0.001
Medium-range –0.0459 0.1837 0.803
Long-range 0.1887 0.1827 0.302
Nontransvocalic 1.4132 0.3414 <0.001
Nontransvocalic × Medium-range –0.2508 0.2656 0.345
Nontransvocalic × Long-range –1.0195 0.2631 <0.001
Transvocalic 2.1695 0.3309 <0.001
Transvocalic × Medium-range –1.8385 0.2742 <0.001
Transvocalic × Long-range –2.3643 0.2753 <0.001
Summary of the fixed effects portion of the logit mixed model (N = 3404;
log-likelihood = –1666.9; baseline level of unfaithful disharmony being
chosen by the Control group in the first item of a Short-range trial)
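The group-versus-Control odds ratios can be recovered, approximately, from the fixed-effect estimates. Because the Control group at the Short-range distance is the baseline, the log odds ratio for a group at a given distance is the group coefficient plus the matching group × distance interaction. The sketch below is a back-of-the-envelope check, not the authors' analysis script. The dictionary keys and the function name are assumptions, and the results agree with the reported odds ratios only up to rounding of the published coefficients.

```python
import math

# Fixed-effect estimates copied from the summary of the mixed logit model.
COEF = {
    "Nontransvocalic": 1.4132,
    "Nontransvocalic x Medium-range": -0.2508,
    "Nontransvocalic x Long-range": -1.0195,
    "Transvocalic": 2.1695,
    "Transvocalic x Medium-range": -1.8385,
    "Transvocalic x Long-range": -2.3643,
}

def odds_ratio(group, distance):
    """Odds ratio of choosing disharmony for group vs. Control at one distance."""
    log_odds_ratio = COEF[group]
    if distance != "Short-range":  # Short-range is the reference level
        log_odds_ratio += COEF[f"{group} x {distance}"]
    return math.exp(log_odds_ratio)

for group in ("Nontransvocalic", "Transvocalic"):
    for distance in ("Short-range", "Medium-range", "Long-range"):
        print(f"{group} vs. Control, {distance}: {odds_ratio(group, distance):.2f}")
```

The main effects of distance do not enter the calculation because they apply equally to the Control group and cancel in the comparison.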
[Figure: the subregular hierarchy. Classes shown: Regular languages, Star-Free, Locally Threshold Testable, Locally Testable, Piecewise Testable, Strictly Local, Tier-based Strictly Local, Strictly Piecewise.]
Figure illustrating the subregular hierarchy (McNaughton & Papert 1971,
Heinz et al. 2011; see also Heinz 2010, Rogers & Pullum 2011).
vowels: T = {i, e, o, u}; pilemoru → ieou
consonants: T = {p, l, m, r}; pilemoru → plmr
liquids: T = {l, r}; pilemoru → lr
arbitrary: T = {o, l, m, p}; pilemoru → plmo
[Figure: proportion of disharmony responses ([r…l] or [l…r]), on a scale from 0.00 to 1.00, by locality level (test-item type): Short-range (cvcvLv-Lv), Medium-range (cvLvcv-Lv), Long-range (Lvcvcv-Lv). Plotted for the Nontransvocalic, Control, and Transvocalic groups.]
Locality | ABC? | TSL2? | Formal properties
Unbounded | ✔ | ✔ | TSL2 for T = {l, r} (all liquids). Bigram restrictions: {*ll, *rr}
Transvocalic | ✗ | ✔ | TSL2 for T = {C, l, r} (all consonants). Bigram restrictions: {*ll, *rr}
Beyond-transvocalic-only | ✔ | ✗ | Not TSLk for any value of T or k. If T = {C, l, r}, then for any banned k-factor r Cⁿ r (with k = n+2), the longer r Cⁿ⁺¹ r must also be banned. If T = {l, r}, then k relates to the number of liquids in the word, not their distance from each other.
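The two TSL2 grammars can be sketched as below, together with a small illustration of why the beyond-transvocalic-only pattern resists the same treatment. The helper names, the five-vowel inventory, and the test words are assumptions made for this example.

```python
VOWELS = set("aeiou")     # assumed vowel inventory
BANNED = {"ll", "rr"}     # bigram restrictions on the tier

def is_liquid(segment):
    return segment in "lr"

def is_consonant(segment):
    return segment not in VOWELS

def tier_string(word, on_tier):
    """Project word onto the tier picked out by the predicate on_tier."""
    return "".join(segment for segment in word if on_tier(segment))

def tsl2_ok(word, on_tier, banned=BANNED):
    """True iff no banned bigram occurs among tier-adjacent segments."""
    tier = tier_string(word, on_tier)
    return all(tier[i:i + 2] not in banned for i in range(len(tier) - 1))

for word in ("pokuriru", "rokupiru", "lokupiru"):
    print(word,
          "unbounded:", tsl2_ok(word, is_liquid),
          "transvocalic:", tsl2_ok(word, is_consonant))
# pokuriru unbounded: False transvocalic: False
# rokupiru unbounded: False transvocalic: True
# lokupiru unbounded: True transvocalic: True

# On the liquid tier both r...r words project to the same string, so no grammar
# stated over that tier can ban one of them while permitting the other.
print(tier_string("pokuriru", is_liquid) == tier_string("rokupiru", is_liquid))  # True
```

A beyond-transvocalic-only pattern would have to permit pokuriru and ban rokupiru. The liquid tier cannot separate the two words, and on the consonant tier the banned material grows with the number of intervening consonants, which is the point made in the table.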