This document discusses probabilistic context-free grammars (PCFGs) for parsing natural language. It notes some limitations of vanilla PCFGs, including their inability to resolve ambiguities requiring semantic context. It emphasizes the importance of lexicalization, where productions are conditioned on part-of-speech tags and head words. The document also discusses techniques like non-terminal splitting to better model contextual dependencies, and discriminative parse reranking to select the best parse from an N-best list using global features.
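The PP-attachment ambiguity behind this limitation can be made concrete. The sketch below uses a toy vanilla PCFG with made-up, illustrative rule probabilities (none of these numbers come from the document): a parse's score is the product of its rule probabilities, and no rule mentions a word, so "saw the man with the telescope" and "saw the man with the beard" receive identical attachment scores. That word-blindness is precisely what lexicalization on head words is meant to fix.

```python
from math import prod

# Toy vanilla PCFG (hypothetical, illustrative probabilities).
# Probabilities for each left-hand side sum to 1; no rule mentions a word.
rules = {
    ("S",  ("NP", "VP")): 1.0,
    ("VP", ("V", "NP")):  0.6,
    ("VP", ("VP", "PP")): 0.4,
    ("NP", ("Det", "N")): 0.7,
    ("NP", ("NP", "PP")): 0.3,
    ("PP", ("P", "NP")):  1.0,
}

def score(derivation):
    """Probability of a derivation = product of its rule probabilities."""
    return prod(rules[r] for r in derivation)

# "saw [the man] [with the telescope]" -- PP attached to the VP
vp_attach = [("S", ("NP", "VP")), ("VP", ("VP", "PP")), ("VP", ("V", "NP")),
             ("NP", ("Det", "N")), ("PP", ("P", "NP")), ("NP", ("Det", "N"))]
# "saw [[the man] [with the telescope]]" -- PP attached to the NP
np_attach = [("S", ("NP", "VP")), ("VP", ("V", "NP")), ("NP", ("NP", "PP")),
             ("NP", ("Det", "N")), ("PP", ("P", "NP")), ("NP", ("Det", "N"))]

print(score(vp_attach), score(np_attach))
```

Whichever attachment this grammar prefers, it prefers it for every PP, regardless of the nouns involved; conditioning rule probabilities on head words breaks that tie with lexical evidence.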
The document discusses the CLEAR dependency format, which adapts the Stanford dependency format. The CLEAR format remaps dependencies for empty categories, preserves function tags, and preserves secondary dependencies. It then provides details on CLEAR dependency labels, remapping empty categories through constructions like wh-movement, function tags for syntactic and semantic roles, and secondary dependencies involving structures like gapping, referents, right node raising, and open clausal subjects.
Phrase tagset mapping for French and English treebanks and its application... Lifeng (Aaron) Han
This paper describes a universal phrase tagset mapping between the French Treebank and English Penn Treebank using 9 phrase categories. It then applies this mapping to an unsupervised machine translation evaluation method that calculates similarity between the source and target sentences without reference translations. The method extracts phrase tags from the source and target, maps them to universal tags, and measures n-gram precision, recall, and position difference as similarity metrics. Evaluation on French-English data shows promising correlation with human judgments, though there is still room for improvement. The tagset and methods could facilitate future multilingual research.
The document discusses Arabic syntactic parsing. It introduces part-of-speech tagging, which assigns grammatical categories to individual words. Syntactic parsing then determines the overall structure of each sentence by showing how words relate to each other through trees or dependency structures. Parsing uncovers the syntactic structure of language and is important for natural language processing tasks, but Arabic presents challenges like complex morphology, pro-drop verbs, and flexible word order.
Victor von Doom, MD
Procedure: Colonoscopy Findings:
- Cecum, terminal ileum, ascending colon, hepatic flexure, transverse colon, splenic flexure, descending colon, sigmoid colon and rectum were examined to the anus.
- There were 3 polyps found in the sigmoid colon. The polyps were sessile and ranged in size from 5 mm to 8 mm. The polyps were removed using cold biopsy forceps and sent to pathology.
- The mucosa was otherwise normal in appearance with good preparation.
Assessment:
- 3 sigmoid colon polyps removed, ranging from 5 to 8 mm in size.
- Otherwise normal colonoscopy.
Plan:
- Await
The document discusses tree-adjoining grammars (TAG) as a mildly context-sensitive grammar formalism that can capture linguistic phenomena beyond context-free grammars while still allowing for polynomial time parsing. It introduces TAGs as consisting of elementary trees, which can be combined using substitution and adjunction operations. Examples are provided to illustrate how TAGs can be used to derive and parse sentences involving phenomena like wh-questions, relative clauses, light-verb constructions, and verb-particle constructions.
This document presents an approach to syntactic parsing that aims to achieve high accuracy with minimal structural annotation. The approach uses only surface features like the words in a span and the rules used to parse it, without additional annotations like part-of-speech tags or head words. Experimental results on the Penn Treebank show the approach achieves performance comparable to state-of-the-art parsers that use more complex annotations. The approach is also evaluated on multiple languages in the SPMRL 2013 shared task, outperforming baselines on average across nine treebanks.
This document discusses key concepts in natural language processing including:
- N-grams which are sequences of consecutive words that are used to capture word order.
- Parts of speech tagging which assigns a grammatical category (noun, verb, adjective) to each word.
- Parsing which involves analyzing a text and assigning structure according to context-free grammar rules, often represented as parse trees.
- Dependency parsing which identifies grammatical relationships between words directly rather than through constituents in a parse tree.
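The n-gram definition in the first bullet above is easy to make concrete; a minimal sketch (the sentence is a toy example, not from the document):

```python
def ngrams(tokens, n):
    """All consecutive n-word sequences of `tokens`, preserving word order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "the cat sat on the mat".split()
print(ngrams(tokens, 2))
# → [('the', 'cat'), ('cat', 'sat'), ('sat', 'on'), ('on', 'the'), ('the', 'mat')]
```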
Monolingual Phrase Alignment on Parse Forests (EMNLP2017 presentation) Yuki Arase
Slides for my EMNLP2017 presentation (edited for slideshare).
Paper: http://aclweb.org/anthology/D/D17/D17-1001.pdf
Supplementary material: http://aclweb.org/anthology/attachments/D/D17/D17-1001.Attachment.zip
Abstract: We propose an efficient method to conduct phrase alignment on parse forests for paraphrase detection. Unlike previous studies, our method identifies syntactic paraphrases under linguistically motivated grammar. In addition, it allows phrases to non-compositionally align to handle paraphrases with non-homographic phrase correspondences.
A dataset that provides gold parse trees and their phrase alignments is created. The experimental results confirm that the proposed method conducts highly accurate phrase alignment compared to human performance.
Efficient Lattice Rescoring Using Recurrent Neural Network Language Models
X. Liu, Y. Wang, X. Chen, M. J. F. Gales & P. C. Woodland
ICASSP 2014
I introduced this paper at NAIST Machine Translation Study Group.
This document presents a new neural network model called Linguistically-Informed Self-Attention (LISA) for semantic role labeling. LISA uses multi-task learning across several NLP tasks and incorporates syntactic information through multi-head self-attention. One attention head is trained to attend to syntactic parents using a biaffine parsing mechanism. Experiments show LISA achieves state-of-the-art performance on in-domain and out-of-domain semantic role labeling benchmarks, and incorporating a gold syntactic parse at test time provides further gains. Analysis indicates the largest source of errors is incorrectly labeled semantic role spans.
Retrieving Correct Semantic Boundaries in Dependency Structure Jinho Choi
This paper describes the retrieval of correct semantic boundaries for predicate-argument structures annotated by dependency structure. Unlike phrase structure, in which arguments are annotated at the phrase level, dependency structure does not have phrases so the argument labels are associated with head words instead: the subtree of each head word is assumed to include the same set of words as the annotated phrase does in phrase structure. However, at least in English, retrieving such subtrees does not always guarantee retrieval of the correct phrase boundaries. In this paper, we present heuristics that retrieve correct phrase boundaries for semantic arguments, called semantic boundaries, from dependency trees. By applying heuristics, we achieved an F1-score of 99.54% for correct representation of semantic boundaries. Furthermore, error analysis showed that some of the errors could also be considered correct, depending on the interpretation of the annotation.
The document discusses phrase structure grammar, which is a constituency grammar used in natural language processing. It defines the components of a phrase structure grammar as a set of non-terminals, terminals, a start symbol, and phrase structure rules. It also discusses Chomsky normal form, which restricts rules to be binary branching, and Greibach normal form, where rules start with a terminal symbol. The document provides examples of phrase structure trees for sentences and constructions like passive voice, wh-questions, relative clauses, and coordination.
This document discusses statistical parsing using probabilistic context-free grammars (PCFGs). It describes how PCFGs assign probabilities to parse trees, allowing syntactic ambiguity to be resolved by choosing the most probable parse. PCFGs can be learned from treebanks via supervised training or from unannotated text in an unsupervised manner. The document outlines PCFG notation and components, describes algorithms for finding the most likely parse tree for a sentence and computing sentence probabilities, and discusses using PCFGs for tasks like syntactic disambiguation, observation likelihood, and supervised grammar induction from treebanks.
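The "most likely parse" algorithm such summaries refer to is standardly a probabilistic CKY (Viterbi) search over a grammar in Chomsky normal form. A minimal sketch, using a hypothetical toy grammar and made-up probabilities (not from the document):

```python
def viterbi_cky(words, lexical, binary, start="S"):
    """Most probable parse of `words` under a PCFG in Chomsky normal form.

    lexical: {(A, word): P(A -> word)}
    binary:  {(A, B, C): P(A -> B C)}
    Returns (probability, tree) with the tree as nested tuples, or (0.0, None).
    """
    n = len(words)
    best, back = {}, {}
    # Base case: one-word spans from lexical rules.
    for i, w in enumerate(words):
        for (A, word), p in lexical.items():
            if word == w and p > best.get((i, i + 1, A), 0.0):
                best[(i, i + 1, A)] = p
                back[(i, i + 1, A)] = w
    # Longer spans: try every split point and every binary rule.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (A, B, C), p in binary.items():
                    cand = p * best.get((i, k, B), 0.0) * best.get((k, j, C), 0.0)
                    if cand > best.get((i, j, A), 0.0):
                        best[(i, j, A)] = cand
                        back[(i, j, A)] = (k, B, C)

    def tree(i, j, A):
        bp = back[(i, j, A)]
        if isinstance(bp, str):        # lexical backpointer: the word itself
            return (A, bp)
        k, B, C = bp
        return (A, tree(i, k, B), tree(k, j, C))

    key = (0, n, start)
    return (best[key], tree(0, n, start)) if key in back else (0.0, None)

# Toy CNF grammar with hypothetical probabilities.
lexical = {("NP", "she"): 0.4, ("V", "eats"): 1.0, ("NP", "fish"): 0.6}
binary = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 1.0}
prob, parse = viterbi_cky("she eats fish".split(), lexical, binary)
print(prob, parse)
```

Replacing `max` selection with summation over split points turns the same chart into the inside algorithm, which computes the sentence probability the summary also mentions.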
Knowing Your NGS Upstream: Alignment and Variants Golden Helix Inc
Alignment algorithms are not just about placing reads in best-matching locations to a reference genome. They are now being expected to handle small insertions, deletions, gapped alignment of reads across intron boundaries and even span breakpoints of structural variations, fusions and copy number changes. At the same time, variant-calling algorithms can only reach their full potential by being intimately matched to the aligner's output or by doing local assemblies themselves. Knowing when these tools can be expected to perform well and when they will produce technical artifacts or be incapable of detecting features is critical when interpreting any analysis based on their output.
This presentation will compare the performance of the alignment and variant calling tools used by sequencing service providers including Illumina Genome Network, Complete Genomics and The Broad Institute. Using public samples analyzed by each pipeline, we will look at the level of concordance and dive into investigating problematic variants and regions of the genome.
RNA sequencing analysis tutorial with NGS HAMNAHAMNA8
This document provides an overview of RNA-seq data analysis. It discusses quality control of sequencing data using tools like FastQC, mapping reads to a reference genome or transcriptome using aligners like BWA and TopHat, and summarizing reads using counting tools to obtain read counts for each gene. These counts can then be used to estimate gene expression levels and perform differential expression analysis to identify genes with different expression between samples or conditions.
This document provides an overview of RNAseq analysis workflows. It discusses preparing raw sequencing reads, aligning reads to a reference genome or transcriptome, and using tools like Tophat and Cufflinks to assemble transcripts and quantify gene and transcript expression. Key steps include mapping reads, assembling transcripts, quantifying expression at the gene or transcript level, and using the results to identify differentially expressed genes between experimental conditions.
NGS Targeted Enrichment Technology in Cancer Research: NGS Tech Overview Webi... QIAGEN
This slidedeck discusses the most biologically efficient, cost-effective method for successful NGS. The GeneRead DNA QuantiMIZE Kits enable determination of the optimum conditions for targeted enrichment of DNA isolated from biological samples, while the GeneRead DNAseq Panels V2 allow you to quickly and reliably deep sequence your genes of interest. Applications in translational and clinical research are highlighted.
Recursive neural networks (RNNs) were developed to model recursive structures like images, sentences, and phrases. RNNs construct feature representations recursively from components. Later models like recursive autoencoders (RAEs), matrix-vector RNNs (MV-RNNs), and recursive neural tensor networks (RNTNs) improved on RNNs by handling unlabeled data, incorporating different composition rules, and reducing parameters. These recursive models achieved strong performance on tasks like image segmentation, sentiment analysis, and paraphrase detection.
The document describes 4 experiments conducted on an LSTM model to analyze its internal dynamics when performing number agreement tasks. Experiment 1 identifies long-range units in the LSTM that encode singular and plural information over long distances. Experiment 2 visualizes the gate and cell state dynamics when handling easy and hard agreement contexts. Experiment 3 looks for short-range units that encode local number information. The experiments find that the LSTM uses both long-range and short-range units sparsely to perform number agreement in a way that mirrors syntactic processing.
This document discusses dot plots and their use in bioinformatics. It begins by defining dot plots as a graphical representation that uses two sequences on orthogonal axes and plots dots where regions of similarity meet a given threshold within a window. Dot plots allow visualization of all structures in common between sequences or repeated/inverted structures within a sequence. The document provides an example dot plot creation script in Perl and discusses how to reduce noise in dot plots by increasing the window size or stringency. It notes common uses of dot plots like comparing genomic and cDNA sequences to predict exons. Finally, it provides some rules of thumb for effective dot plot analysis and lists available dot plot programs.
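The windowed comparison and stringency threshold described above can be sketched in a few lines. This is illustrative Python, not the Perl script the document mentions; `window` and `stringency` are the two noise controls it describes:

```python
def dotplot(seq1, seq2, window=3, stringency=2):
    """Dot plot matrix: plot[i][j] == 1 iff the window of seq1 starting at i
    and the window of seq2 starting at j agree in at least `stringency`
    positions. Larger windows / higher stringency suppress noise dots."""
    rows = len(seq1) - window + 1
    cols = len(seq2) - window + 1
    plot = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            matches = sum(seq1[i + k] == seq2[j + k] for k in range(window))
            if matches >= stringency:
                plot[i][j] = 1
    return plot

# Comparing a sequence against itself: the main diagonal is all 1s,
# and the internal repeat "ATG" produces symmetric off-diagonal hits.
seq = "ATGCGATGC"
p = dotplot(seq, seq, window=3, stringency=3)
```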
The document discusses dot plots and their use in bioinformatics. It explains that dot plots are a graphical representation that uses two sequences as axes and plots dots where regions of similarity are found based on a given threshold and window size. Dot plots can be used to visualize all similarities and repeats within and between sequences. Reducing window size and increasing stringency can reduce noise in dot plots. Available programs for generating dot plots are also mentioned.
Sequencing run grief counseling: counting kmers at MG-RAST wltrimbl
Talk by Will Trimble of Argonne National Laboratory on April 29, 2014, at UIC's department of Ecology & Evolution on visualizing and interpreting the redundancy spectrum of long kmers in high-throughput sequence data.
dbSNP is a public archive of genetic polymorphisms including SNPs, insertions/deletions, and repeats. It contains contextual sequence information, frequency data, and experimental methods. dbSNP supports research areas like physical mapping, functional analysis, pharmacogenomics, and evolution. Variations are used as positional markers similar to STSs. Submitted SNPs are assigned IDs and aligned to reference genomes, clustering shared positions into reference SNP clusters. BLAST and FASTA tools allow searching. Null results of invariant sequences are also submitted.
Mechanisms and Applications of Antiviral Neutralizing Antibodies - Creative B... Creative-Biolabs
Neutralizing antibodies, pivotal in immune defense, specifically bind and inhibit viral pathogens, thereby playing a crucial role in protecting against and mitigating infectious diseases. In this slide, we will introduce what antibodies and neutralizing antibodies are, the production and regulation of neutralizing antibodies, their mechanisms of action, classification and applications, as well as the challenges they face.
Microbial interaction
Microorganisms interact with each other and can be physically associated with other organisms in a variety of ways.
One organism can live on the surface of another organism as an ectobiont, or within another organism as an endobiont.
Microbial interactions may be positive, such as mutualism, proto-cooperation, and commensalism, or negative, such as parasitism, predation, or competition.
Types of microbial interaction
Positive interaction: mutualism, proto-cooperation, commensalism
Negative interaction: amensalism (antagonism), parasitism, predation, competition
I. Mutualism:
It is defined as a relationship in which each organism in the interaction benefits from the association. It is an obligatory relationship in which the mutualist and the host are metabolically dependent on each other.
A mutualistic relationship is very specific: one member of the association cannot be replaced by another species.
Mutualism requires close physical contact between the interacting organisms.
A mutualistic relationship allows organisms to exist in habitats that could not be occupied by either species alone.
A mutualistic relationship also allows the partner organisms to act as a single organism.
Examples of mutualism:
i. Lichens:
Lichens are excellent example of mutualism.
They are the association of specific fungi and certain genus of algae. In lichen, fungal partner is called mycobiont and algal partner is called
II. Syntrophism:
It is an association in which the growth of one organism either depends on or improved by the substrate provided by another organism.
In syntrophism both organism in association gets benefits.
Compound A
Utilized by population 1
Compound B
Utilized by population 2
Compound C
utilized by both Population 1+2
Products
In this theoretical example of syntrophism, population 1 is able to utilize and metabolize compound A, forming compound B but cannot metabolize beyond compound B without co-operation of population 2. Population 2is unable to utilize compound A but it can metabolize compound B forming compound C. Then both population 1 and 2 are able to carry out metabolic reaction which leads to formation of end product that neither population could produce alone.
Examples of syntrophism:
i. Methanogenic ecosystem in sludge digester
Methane produced by methanogenic bacteria depends upon interspecies hydrogen transfer by other fermentative bacteria.
Anaerobic fermentative bacteria generate CO2 and H2 utilizing carbohydrates which is then utilized by methanogenic bacteria (Methanobacter) to produce methane.
ii. Lactobacillus arobinosus and Enterococcus faecalis:
In the minimal media, Lactobacillus arobinosus and Enterococcus faecalis are able to grow together but not alone.
The synergistic relationship between E. faecalis and L. arobinosus occurs in which E. faecalis require folic acid
Mending Clothing to Support Sustainable Fashion_CIMaR 2024.pdfSelcen Ozturkcan
Ozturkcan, S., Berndt, A., & Angelakis, A. (2024). Mending clothing to support sustainable fashion. Presented at the 31st Annual Conference by the Consortium for International Marketing Research (CIMaR), 10-13 Jun 2024, University of Gävle, Sweden.
Discovery of An Apparent Red, High-Velocity Type Ia Supernova at 𝐳 = 2.9 wi...Sérgio Sacani
We present the JWST discovery of SN 2023adsy, a transient object located in a host galaxy JADES-GS
+
53.13485
−
27.82088
with a host spectroscopic redshift of
2.903
±
0.007
. The transient was identified in deep James Webb Space Telescope (JWST)/NIRCam imaging from the JWST Advanced Deep Extragalactic Survey (JADES) program. Photometric and spectroscopic followup with NIRCam and NIRSpec, respectively, confirm the redshift and yield UV-NIR light-curve, NIR color, and spectroscopic information all consistent with a Type Ia classification. Despite its classification as a likely SN Ia, SN 2023adsy is both fairly red (
�
(
�
−
�
)
∼
0.9
) despite a host galaxy with low-extinction and has a high Ca II velocity (
19
,
000
±
2
,
000
km/s) compared to the general population of SNe Ia. While these characteristics are consistent with some Ca-rich SNe Ia, particularly SN 2016hnk, SN 2023adsy is intrinsically brighter than the low-
�
Ca-rich population. Although such an object is too red for any low-
�
cosmological sample, we apply a fiducial standardization approach to SN 2023adsy and find that the SN 2023adsy luminosity distance measurement is in excellent agreement (
≲
1
�
) with
Λ
CDM. Therefore unlike low-
�
Ca-rich SNe Ia, SN 2023adsy is standardizable and gives no indication that SN Ia standardized luminosities change significantly with redshift. A larger sample of distant SNe Ia is required to determine if SN Ia population characteristics at high-
�
truly diverge from their low-
�
counterparts, and to confirm that standardized luminosities nevertheless remain constant with redshift.
(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu...Scintica Instrumentation
Targeting Hsp90 and its pathogen Orthologs with Tethered Inhibitors as a Diagnostic and Therapeutic Strategy for cancer and infectious diseases with Dr. Timothy Haystead.
The cost of acquiring information by natural selectionCarl Bergstrom
This is a short talk that I gave at the Banff International Research Station workshop on Modeling and Theory in Population Biology. The idea is to try to understand how the burden of natural selection relates to the amount of information that selection puts into the genome.
It's based on the first part of this research paper:
The cost of information acquisition by natural selection
Ryan Seamus McGee, Olivia Kosterlitz, Artem Kaznatcheev, Benjamin Kerr, Carl T. Bergstrom
bioRxiv 2022.07.02.498577; doi: https://doi.org/10.1101/2022.07.02.498577
Embracing Deep Variability For Reproducibility and Replicability
Abstract: Reproducibility (aka determinism in some cases) constitutes a fundamental aspect in various fields of computer science, such as floating-point computations in numerical analysis and simulation, concurrency models in parallelism, reproducible builds for third parties integration and packaging, and containerization for execution environments. These concepts, while pervasive across diverse concerns, often exhibit intricate inter-dependencies, making it challenging to achieve a comprehensive understanding. In this short and vision paper we delve into the application of software engineering techniques, specifically variability management, to systematically identify and explicit points of variability that may give rise to reproducibility issues (eg language, libraries, compiler, virtual machine, OS, environment variables, etc). The primary objectives are: i) gaining insights into the variability layers and their possible interactions, ii) capturing and documenting configurations for the sake of reproducibility, and iii) exploring diverse configurations to replicate, and hence validate and ensure the robustness of results. By adopting these methodologies, we aim to address the complexities associated with reproducibility and replicability in modern software systems and environments, facilitating a more comprehensive and nuanced perspective on these critical aspects.
https://hal.science/hal-04582287
BIRDS DIVERSITY OF SOOTEA BISWANATH ASSAM.ppt.pptxgoluk9330
Ahota Beel, nestled in Sootea Biswanath Assam , is celebrated for its extraordinary diversity of bird species. This wetland sanctuary supports a myriad of avian residents and migrants alike. Visitors can admire the elegant flights of migratory species such as the Northern Pintail and Eurasian Wigeon, alongside resident birds including the Asian Openbill and Pheasant-tailed Jacana. With its tranquil scenery and varied habitats, Ahota Beel offers a perfect haven for birdwatchers to appreciate and study the vibrant birdlife that thrives in this natural refuge.
2. Vanilla PCFG Limitations
• Since probabilities of productions do not depend on specific words or concepts, only general structural disambiguation is possible (e.g., prefer to attach PPs to Nominals).
• Consequently, vanilla PCFGs cannot resolve syntactic ambiguities that require semantics to resolve, e.g., "ate spaghetti with a fork" vs. "ate spaghetti with meatballs".
• In order to work well, PCFGs must be lexicalized, i.e., productions must be conditioned on specific head words and their POS tags.
3. Example of Importance of Lexicalization
• A general preference for attaching PPs to NPs rather than VPs can be learned by a vanilla PCFG.
• But the desired preference can depend on specific words.
S → NP VP       0.9
S → VP          0.1
NP → Det A N    0.5
NP → NP PP      0.3
NP → PropN      0.2
A → ε           0.6
A → Adj A       0.4
PP → Prep NP    1.0
VP → V NP       0.7
VP → VP PP      0.3
[Figure: an English PCFG parser parses "John put the dog in the pen." with the PP "in the pen" attached to the VP — the correct reading.]
4. Example of Importance of Lexicalization
• A general preference for attaching PPs to NPs rather than VPs can be learned by a vanilla PCFG.
• But the desired preference can depend on specific words.
S → NP VP       0.9
S → VP          0.1
NP → Det A N    0.5
NP → NP PP      0.3
NP → PropN      0.2
A → ε           0.6
A → Adj A       0.4
PP → Prep NP    1.0
VP → V NP       0.7
VP → VP PP      0.3
[Figure: the alternative parse of "John put the dog in the pen." with the PP "in the pen" attached to the NP "the dog" — marked X as the incorrect reading.]
6. Head Words
• Syntactic phrases usually have a word in them that is most "central" to the phrase.
• Linguists have defined the concept of a lexical head of a phrase.
• Simple rules can identify the head of any phrase by percolating head words up the parse tree:
  – Head of a VP is the main verb
  – Head of an NP is the main noun
  – Head of a PP is the preposition
  – Head of a sentence is the head of its VP
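The head-percolation rules above can be sketched as a tiny recursive procedure. The tree encoding (nested tuples) and the head-child table below are illustrative assumptions, not the rule set of any particular parser:

```python
# A toy parse tree: (label, children) for non-terminals, (label, word) for leaves.
# HEAD_CHILD maps a phrase label to the child label that supplies its head,
# following the simple rules on this slide (illustrative, not a full head table).
HEAD_CHILD = {"S": "VP", "VP": "V", "NP": "NN", "PP": "IN"}

def head_word(tree):
    """Percolate head words up the tree: the head of a phrase is the
    head of its designated head child; the head of a leaf is its word."""
    label, rest = tree
    if isinstance(rest, str):            # leaf: (POS, word)
        return rest
    for child in rest:                   # find the head child and recurse
        if child[0] == HEAD_CHILD.get(label):
            return head_word(child)
    return head_word(rest[-1])           # fallback: rightmost child

tree = ("S",
        [("NP", [("DT", "the"), ("NN", "man")]),
         ("VP", [("V", "bought"),
                 ("NP", [("DT", "the"), ("NN", "book")])])])

print(head_word(tree))                   # → bought
```

Since the head of S is the head of its VP, and the head of a VP is its main verb, the whole sentence's head is "bought".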
7. Head propagation in parse trees

[Parse tree for "the man bought the book", with each node annotated by its head word:]

(S(bought)
  (NP(man) (DT the) (NN man))
  (VP(bought)
    (V bought)
    (NP(book) (DT the) (NN book))))
8. Lexicalized Productions
• Specialized productions can be generated by including each non-terminal's head word and its POS as part of that non-terminal's symbol.
[Lexicalized parse tree for "John liked the dog in the pen": every non-terminal carries its head word and POS, e.g. S(liked-VBD), NP(John-NNP), VP(liked-VBD), NP(dog-NN), PP(in-IN), NP(pen-NN); here the PP attaches to the Nominal.]

Example lexicalized production:
Nominal(dog-NN) → Nominal(dog-NN) PP(in-IN)
9. Lexicalized Productions
[Lexicalized parse tree for "John put the dog in the pen": here the PP(in-IN) attaches to the VP(put-VBD) rather than to the NP.]

Example lexicalized production:
VP(put-VBD) → VP(put-VBD) PP(in-IN)
10. Parameters

• We used to have:
  – VP → V NP PP    with parameter P(r | VP)
  – That's the count of this rule divided by the number of VPs in a treebank.
• Now we have:
  – VP(dumped-VBD) → V(dumped-VBD) NP(sacks-NNS) PP(into-P)
  – with parameter P(r | VP whose head is dumped-VBD), where dumped is the verb, sacks is the head of the NP, and into is the head of the PP
  – Not likely to have significant counts in any treebank.
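The unlexicalized estimate described above — the count of a rule divided by the count of its left-hand side — can be sketched directly; the treebank rule counts below are invented for illustration:

```python
from collections import Counter

# Hypothetical rule counts extracted from a treebank.
rule_counts = Counter({
    ("VP", ("V", "NP")): 700,
    ("VP", ("VP", "PP")): 200,
    ("VP", ("V", "NP", "PP")): 100,
})

def rule_prob(lhs, rhs):
    """Maximum-likelihood estimate: P(r | LHS) = count(LHS -> RHS) / count(LHS)."""
    lhs_total = sum(c for (l, _), c in rule_counts.items() if l == lhs)
    return rule_counts[(lhs, rhs)] / lhs_total

print(rule_prob("VP", ("V", "NP", "PP")))   # → 0.1  (100 of 1000 VPs)
```

Lexicalizing the rule replaces "VP" with a far finer-grained conditioning event such as (VP, dumped, VBD), so the denominator and numerator counts both shrink drastically — the sparsity problem the slide points out.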
11. Challenges for Probabilistic Lexicalized Grammar
• Huge number of rules.
• Most rules are too specific, so it is hard to make good probability estimates for them.
• Need to make more manageable independence assumptions.
• Many approaches have been developed; we'll look at one (a simplified version).
12. Collins Parser [1997]
• Replaces production rules with a model of bilexical relationships.
• Focus on head and dependency relationships (see notes in lecture).
• Generates parses in smaller steps: generate the head, then generate siblings assuming they are conditionally independent given the head.
13. Dependency Representation
• Represented as an 8-tuple:
  (head word, head tag, modifier word, modifier tag, parent non-terminal, head non-terminal, modifier non-terminal, direction)

Example CFG rule:
  VP(bought, VBD) → V(bought, VBD) NP(book, NN) PP(with, Prep)

Decomposes into two dependencies:
  (bought, VBD, book, NN, VP, V, NP, right)
  (bought, VBD, with, Prep, VP, V, PP, right)
14. Collins: (see lecture)

VP(bought, VBD) → V(bought, VBD) NP(book, NN) PP(with, Prep)

(head word, head tag, modifier word, modifier tag, parent non-terminal, head non-terminal, modifier non-terminal, direction)

General model:
1. Pr_head(headNT | parentNT, headword, headtag) ×
2. Pr_left(modNT, modword, modtag | parentNT, headNT, headword, headtag) × … ×
3. Pr_left(STOP | parentNT, headNT, headword, headtag) ×
4. Pr_right(modNT, modword, modtag | parentNT, headNT, headword, headtag) × … ×
5. Pr_right(STOP | parentNT, headNT, headword, headtag)

For the rule above:
6. Pr_head(V | VP, bought, VBD) ×
7. Pr_left(STOP | VP, V, bought, VBD) ×
8. Pr_right(NP, book, NN | VP, V, bought, VBD) ×
9. Pr_right(PP, with, Prep | VP, V, bought, VBD) ×
10. Pr_right(STOP | VP, V, bought, VBD)
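The decomposition above multiplies one head probability by left and right modifier probabilities, each direction terminated by STOP. A minimal sketch, with made-up probability tables standing in for the estimated distributions:

```python
# Hypothetical conditional probabilities for the worked example
# VP(bought, VBD) -> V(bought, VBD) NP(book, NN) PP(with, Prep).
pr_head = {("V", ("VP", "bought", "VBD")): 0.8}
pr_left = {("STOP", ("VP", "V", "bought", "VBD")): 0.9}
pr_right = {
    (("NP", "book", "NN"), ("VP", "V", "bought", "VBD")): 0.2,
    (("PP", "with", "Prep"), ("VP", "V", "bought", "VBD")): 0.1,
    ("STOP", ("VP", "V", "bought", "VBD")): 0.5,
}

def collins_rule_prob(parent, head_nt, headword, headtag, left_mods, right_mods):
    """Product of head, left-modifier, and right-modifier probabilities,
    each modifier conditionally independent given the head (steps 1-5 above)."""
    ctx = (parent, head_nt, headword, headtag)
    p = pr_head[(head_nt, (parent, headword, headtag))]
    for mod in left_mods:                  # generate left modifiers, then STOP
        p *= pr_left[(mod, ctx)]
    p *= pr_left[("STOP", ctx)]
    for mod in right_mods:                 # generate right modifiers, then STOP
        p *= pr_right[(mod, ctx)]
    p *= pr_right[("STOP", ctx)]
    return p

p = collins_rule_prob("VP", "V", "bought", "VBD",
                      left_mods=[],
                      right_mods=[("NP", "book", "NN"), ("PP", "with", "Prep")])
print(p)   # 0.8 * 0.9 * 0.2 * 0.1 * 0.5
```

The payoff is that each factor conditions on a small event (parent, head non-terminal, head word, head tag), so the counts needed to estimate it are far less sparse than counts of whole lexicalized rules.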
15. Estimating Production Generation Parameters

• Smooth estimates by linearly interpolating with simpler models conditioned on just the POS tag or no lexical info. Using the version in the text:

smPR(PP(in-IN) | VP(put-VBD)) = λ₁ PR(PP(in-IN) | VP(put-VBD))
    + (1 − λ₁) [λ₂ PR(PP(in-IN) | VP(VBD)) + (1 − λ₂) PR(PP(in-IN) | VP)]
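The interpolation above can be written directly as a function; the component probabilities and λ values below are invented for illustration:

```python
def smoothed_prob(p_full, p_tag, p_none, lam1, lam2):
    """Back off from the fully lexicalized estimate to the tag-only
    and unlexicalized estimates, as in the interpolation formula above."""
    return lam1 * p_full + (1 - lam1) * (lam2 * p_tag + (1 - lam2) * p_none)

# e.g. PR(PP(in-IN) | VP(put-VBD)) interpolated with PR(PP(in-IN) | VP(VBD))
# and PR(PP(in-IN) | VP); all numbers hypothetical.
p = smoothed_prob(p_full=0.30, p_tag=0.20, p_none=0.10, lam1=0.6, lam2=0.5)
print(round(p, 2))   # 0.6*0.30 + 0.4*(0.5*0.20 + 0.5*0.10) = 0.24
```

When the fully lexicalized event is unseen (p_full = 0), the estimate falls back gracefully to the tag-only and unlexicalized models instead of going to zero.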
16. Missed Context Dependence
• Another problem with CFGs is that which
production is used to expand a non-terminal
is independent of its context.
• However, this independence assumption is frequently violated in natural-language grammars.
  – NPs that are subjects are more likely to be pronouns than NPs that are objects.
17. Splitting Non-Terminals
• To provide more contextual information,
non-terminals can be split into multiple new
non-terminals based on their parent in the
parse tree using parent annotation.
– A subject NP becomes NP^S since its parent
node is an S.
– An object NP becomes NP^VP since its parent
node is a VP
18. Parent Annotation Example

[Parent-annotated parse tree for "John liked the dog in the pen": each non-terminal is annotated with its parent, e.g. the subject NP becomes NP^S, the object NP becomes NP^VP, and the Nominals become Nominal^NP.]

Example production:
VP^S → VBD^VP NP^VP
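Parent annotation is a simple tree transform; a minimal sketch on a toy nested-tuple tree encoding (assumed for illustration — this sketch annotates phrasal non-terminals only and leaves POS/word leaves unchanged):

```python
def annotate_parents(tree, parent=None):
    """Rename each non-root non-terminal X with parent P to X^P."""
    label, rest = tree
    if isinstance(rest, str):                  # POS/word leaf: leave unchanged
        return (label, rest)
    new_label = f"{label}^{parent}" if parent else label
    return (new_label, [annotate_parents(c, label) for c in rest])

tree = ("S",
        [("NP", [("PropN", "John")]),
         ("VP", [("VBD", "liked"),
                 ("NP", [("DT", "the"), ("NN", "dog")])])])

out = annotate_parents(tree)
# The subject NP becomes NP^S; the object NP becomes NP^VP.
print(out[1][0][0], out[1][1][1][1][0])   # → NP^S NP^VP
```

After the transform, the grammar can learn different expansion probabilities for NP^S and NP^VP, capturing the subject/object asymmetry from the previous slide.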
19. Split and Merge
• Non-terminal splitting greatly increases the size of
the grammar and the number of parameters that need
to be learned from limited training data.
• Best approach is to only split non-terminals when it
improves the accuracy of the grammar.
• May also help to merge some non-terminals to
remove some un-helpful distinctions and learn more
accurate parameters for the merged productions.
• Method: Heuristically search for a combination of
splits and merges that produces a grammar that
maximizes the likelihood of the training treebank.
20. Treebanks
• English Penn Treebank: Standard corpus for
testing syntactic parsing consists of 1.2 M words
of text from the Wall Street Journal (WSJ).
• Typical to train on about 40,000 parsed sentences
and test on an additional standard disjoint test set
of 2,416 sentences.
• Chinese Penn Treebank: 100K words from the
Xinhua news service.
• Other corpora exist in many languages; see the Wikipedia article "Treebank".
23. Parsing Evaluation Metrics
• PARSEVAL metrics measure the fraction of the
constituents that match between the computed and
human parse trees. If P is the system’s parse tree and T
is the human parse tree (the “gold standard”):
– Recall = (# correct constituents in P) / (# constituents in T)
– Precision = (# correct constituents in P) / (# constituents in P)
• Labeled precision and labeled recall additionally require getting the non-terminal label on the constituent node correct for it to count as correct.
• F1 is the harmonic mean of precision and recall.
24. Computing Evaluation Metrics
Correct tree T and computed tree P for "book the flight through Houston":

[Figure: in T, the PP "through Houston" attaches to the Nominal inside the object NP; in P, it attaches to the VP.]

# constituents in T: 12    # constituents in P: 12    # correct constituents: 10
Recall = 10/12 = 83.3%    Precision = 10/12 = 83.3%    F1 = 83.3%
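PARSEVAL scores reduce to set operations once each parse is represented as a collection of labeled constituent spans. A minimal sketch, with hypothetical span sets sized to reproduce the worked example (12 constituents each, 10 shared):

```python
def parseval(gold_spans, pred_spans):
    """Labeled precision, recall, and F1 over (label, start, end) spans."""
    correct = len(gold_spans & pred_spans)
    precision = correct / len(pred_spans)
    recall = correct / len(gold_spans)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical span sets: 10 constituents in common, 2 unique to each tree
# (e.g. the differing PP-attachment constituents).
gold = {("C", i, i + 1) for i in range(10)} | {("T", 0, 5), ("T2", 0, 6)}
pred = {("C", i, i + 1) for i in range(10)} | {("P", 1, 5), ("P2", 1, 6)}
p, r, f = parseval(gold, pred)
print(round(p, 3), round(r, 3), round(f, 3))   # → 0.833 0.833 0.833
```

Since precision and recall happen to be equal here, F1 (their harmonic mean) takes the same value.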
25. Treebank Results
• Results of current state-of-the-art systems on the
English Penn WSJ treebank are slightly greater than
90% labeled precision and recall.
26. Discriminative Parse Reranking
• Motivation: even when the top-ranked parse is not correct, the correct parse is frequently among those ranked highly by a statistical parser.
• Use a classifier that is trained to select the best parse from the N-best parses produced by the original parser.
• A reranker can exploit global features of the entire parse, whereas a PCFG is restricted to making decisions based on local information.
27. 2-Stage Reranking Approach
• Adapt the PCFG parser to produce an N-best list of the most probable parses in addition to the most likely one.
• Extract from each of these parses a set of global features that help determine whether it is a good parse tree.
• Train a classifier (e.g., logistic regression) using the best parse in each N-best list as a positive example and the others as negative examples.
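Once each N-best parse is mapped to a global feature vector, reranking at test time reduces to scoring and taking the argmax. A sketch with a hand-set weight vector standing in for the trained classifier (feature names and all values are invented for illustration):

```python
# Each candidate parse is represented by global features, e.g.
# log PCFG probability, number of parallel conjuncts, right-branching score.
candidates = [
    {"logp": -20.1, "conjuncts": 0, "right_branch": 0.4},
    {"logp": -20.5, "conjuncts": 2, "right_branch": 0.7},
    {"logp": -21.0, "conjuncts": 1, "right_branch": 0.5},
]

# Weights a discriminative reranker would learn from N-best training lists.
weights = {"logp": 1.0, "conjuncts": 0.8, "right_branch": 2.0}

def score(feats):
    """Linear score of one candidate parse under the reranker."""
    return sum(weights[k] * v for k, v in feats.items())

best = max(range(len(candidates)), key=lambda i: score(candidates[i]))
print(best)   # → 1: the second candidate wins despite a lower PCFG probability
```

This is the point of reranking: global evidence (here, parallel conjuncts and right-branching) can overturn the PCFG's locally computed preference.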
29. Sample Parse Tree Features
• Probability of the parse from the PCFG.
• The number of parallel conjuncts.
– “the bird in the tree and the squirrel on the ground”
– “the bird and the squirrel in the tree”
• The degree to which the parse tree is right branching.
– English parses tend to be right branching (cf. parse of
“Book the flight through Houston”)
• Frequency of various tree fragments, i.e. specific
combinations of 2 or 3 rules.
30. Evaluation of Reranking
• Reranking is limited by oracle accuracy,
i.e. the accuracy that results when an
omniscient oracle picks the best parse from
the N-best list.
• Typical current oracle accuracy is around
F1=97%
• Reranking can generally improve test
accuracy of current PCFG models a
percentage point or two.
31. Dependency Grammars
• An alternative to phrase-structure grammar is to define a parse as a directed graph between the words of a sentence, representing dependencies between the words.
[Figure: two dependency parses of "John liked the dog in the pen" — an untyped graph, and a typed dependency parse with labels nsubj(liked, John), dobj(liked, dog), and det arcs for the two determiners.]
32. Dependency Graph from Parse Tree
• Can convert a phrase-structure parse to a dependency tree by making the head of each non-head child of a node depend on the head of the head child.
[Figure: the head-annotated parse tree for "John liked the dog in the pen" (as in the earlier lexicalization example) alongside the dependency graph derived from it.]
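The conversion rule above — each non-head child's head depends on the head child's head — can be sketched on a toy nested-tuple tree encoding (the encoding and head-child table are assumptions for illustration):

```python
# Which child label supplies the head of each phrase (illustrative table).
HEAD_CHILD = {"S": "VP", "VP": "VBD", "NP": "NN", "PP": "IN"}

def head_word(tree):
    """Head of a leaf is its word; head of a phrase is its head child's head."""
    label, rest = tree
    if isinstance(rest, str):
        return rest
    for child in rest:
        if child[0] == HEAD_CHILD.get(label):
            return head_word(child)
    return head_word(rest[-1])           # fallback: rightmost child

def dependencies(tree, deps=None):
    """For each non-head child, add an arc head(node) -> head(child)."""
    if deps is None:
        deps = []
    label, rest = tree
    if isinstance(rest, str):
        return deps
    h = head_word(tree)
    for child in rest:
        ch = head_word(child)
        if ch != h:                       # non-head child: its head depends on h
            deps.append((h, ch))
        dependencies(child, deps)
    return deps

tree = ("S",
        [("NP", [("NNP", "John")]),
         ("VP", [("VBD", "liked"),
                 ("NP", [("DT", "the"), ("NN", "dog")])])])

print(dependencies(tree))   # → [('liked', 'John'), ('liked', 'dog'), ('dog', 'the')]
```

Running this on "John liked the dog" yields the expected untyped arcs: the verb governs its subject and object, and the noun governs its determiner.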
33. Statistical Parsing Conclusions
• Statistical models such as PCFGs allow for probabilistic resolution of ambiguities.
• PCFGs can be easily learned from treebanks.
• Lexicalization and non-terminal splitting are required to effectively resolve many ambiguities.
• Current statistical parsers are quite accurate, but not yet at the level of human-expert agreement.