Video recording of full talk: http://lectures.ms.mff.cuni.cz/view.php?rec=255
There has been increasing interest in using statistical machine translation (SMT) for the task of Grammatical Error Correction (GEC) for English-as-a-Second-Language (ESL) learners. Two of the three highest-scoring systems of the CoNLL-2014 Shared Task were SMT-based. The highest-scoring result published so far for the CoNLL-2014 test set was achieved by a system combination of the five best CoNLL-2014 submissions, built with MEMT (a tool for MT system combination). In this talk, we demonstrate how a single SMT-based system can match and outperform that combined system. Furthermore, this system outperforms all other published single-system results (including our own CoNLL-2014 submission) by several points of F-score when the same training data is used. These results are achieved by adapting current state-of-the-art methods for phrase-based SMT specifically to the problem of GEC.
We report on the effects of:
- Parameter tuning for GEC
- Introducing GEC-specific dense and sparse features
- Using large-scale data
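The F-score margins above refer to the CoNLL-2014 evaluation measure, F0.5, which weights precision twice as heavily as recall. A minimal sketch of the computation (the function name is illustrative, not from the talk):

```python
def f_beta(precision, recall, beta=0.5):
    """Weighted harmonic mean of precision and recall.
    CoNLL-2014 scoring uses beta = 0.5, favouring precision over recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# A precision-heavy system scores comparatively well under F0.5:
print(round(f_beta(0.5, 0.2), 4))  # → 0.3846
```

Because beta < 1, proposing fewer but more reliable corrections pays off more than aggressive correction, which shapes how GEC systems are tuned.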
Automatic Grammatical Error Correction for ESL-Learners by SMT - Getting it right
1. Grammatical Error Correction
for ESL-Learners by SMT
Getting it right
Marcin Junczys-Dowmunt Roman Grundkiewicz
Information System Laboratory
Faculty of Mathematics and Computer Science
Adam Mickiewicz University in Poznań, Poland
January 5th, 2015
2. Part I
What’s been going on with SMT in
Grammatical Error Correction (GEC)?
3. The CoNLL-2014 Shared Task
Focus: English as a second language (ESL);
Preceeded by the HOO 2011 and 2012 (Helping Our Own)
shared tasks: detection and correction of determiner and
preposition errors in scientific papers;
Preceded by the CoNLL-2013 shared task on Grammatical
Error Correction: correct five error types, determiners,
prepositions, noun number, subject-verb-agreement, verb
forms. 16 teams.
This year, 28 error categories.
13 teams participated.
4. NUCLE: NUS Corpus of Learner English
Dahlmeier et al. 2013
National University of Singapore (NUS) Natural Language
Processing Group led by Prof. Hwee Tou Ng and the NUS
Centre for English Language Communication led by Prof.
Siew Mei Wu;
1,400 essays, written by NUS students, ca. 1 million words,
ca. 51,000 sentences;
Topics: environmental pollution, healthcare, etc.;
Annotated with 28 error categories by NUS English
instructors.
5. NUCLE: NUS Corpus of Learner English
Example
<DOC nid="840">
<TEXT>
<P>Engineering design process can be defined ... </P>
<P>Firstly, engineering design ... </P>
...
</TEXT>
<ANNOTATION teacher_id="173">
<MISTAKE start_par="0" start_off="0" end_par="0" end_off="26">
<TYPE>ArtOrDet</TYPE>
<CORRECTION>The engineering design process</CORRECTION>
</MISTAKE>
...
</ANNOTATION>
</DOC>
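The annotation layout above can be read with a few lines of standard-library Python. A minimal sketch; the `<DOC>` fragment here is abbreviated from the example above:

```python
import xml.etree.ElementTree as ET

# Abbreviated NUCLE-style fragment from the slide above.
doc = """
<DOC nid="840">
  <TEXT>
    <P>Engineering design process can be defined ...</P>
  </TEXT>
  <ANNOTATION teacher_id="173">
    <MISTAKE start_par="0" start_off="0" end_par="0" end_off="26">
      <TYPE>ArtOrDet</TYPE>
      <CORRECTION>The engineering design process</CORRECTION>
    </MISTAKE>
  </ANNOTATION>
</DOC>
"""

def extract_mistakes(xml_string):
    """Return (type, correction, (start_off, end_off)) tuples for one <DOC>."""
    root = ET.fromstring(xml_string)
    mistakes = []
    for m in root.iter("MISTAKE"):
        mistakes.append((
            m.findtext("TYPE"),
            m.findtext("CORRECTION"),
            (int(m.get("start_off")), int(m.get("end_off"))),
        ))
    return mistakes

print(extract_mistakes(doc))
```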
6. NUCLE: NUS Corpus of Learner English
NUCLE Error Types I
* 6647 ArtOrDet Article or Determiner
5300 Wci Wrong collocation/idiom
4668 Rloc- Local redundancy
* 3770 Nn Noun number
3200 Vt Verb tense
3054 Mec Punctuation, spelling
* 2412 Prep Preposition
2160 Wform Word form
* 1527 SVA Subject-verb-agreement
1452 Others Other errors
* 1444 Vform Verb form
1349 Trans Link word/phrases
1073 Um Unclear meaning
925 Pref Pronoun reference
7. NUCLE: NUS Corpus of Learner English
NUCLE Error Types II
861 Srun Run-ons, comma splice
674 WOinc Incorrect sentence form
571 Wtone Tone
544 Cit Citation
515 Spar Parallelism
431 Vm Verb modal
411 V0 Missing verb
353 Ssub Subordinate clause
345 WOadv Adverb/adjective position
242 Npos Noun possessive
186 Pform Pronoun form
174 Sfrag Fragment
50 Wa Acronyms
47 Smod Dangling modifier
8. Evaluation Metric: MaxMatch (M²)
Dahlmeier and Ng 2012
An algorithm for efficiently computing the sequence of
phrase-level edits between a source sentence and a hypothesis.
Finds the maximum overlap with the gold-standard.
Based on Levenshtein distance matrix for candidate and
reference sentence.
The optimal edit sequence is scored using F1.0 (CoNLL 2013)
or F0.5 measure (CoNLL 2014).
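The F-measure used here combines precision P and recall R as F_β = (1 + β²)·P·R / (β²·P + R); with β = 0.5, precision is weighted twice as heavily as recall. A small sketch with made-up edit counts:

```python
def f_beta(tp, fp, fn, beta=0.5):
    """F_beta over a set of edits: tp = proposed edits matching the gold
    standard, fp = proposed edits that do not, fn = gold edits missed."""
    p = tp / (tp + fp) if tp + fp else 0.0   # precision
    r = tp / (tp + fn) if tp + fn else 0.0   # recall
    if p + r == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r)

# Toy numbers: 6 proposed edits, of which 4 match the gold standard; 8 gold edits total.
print(f_beta(tp=4, fp=2, fn=4))
```

With β = 1 this reduces to the harmonic mean of precision and recall used in CoNLL 2013.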
9. Evaluation Metric: MaxMatch (M²)
Example calculation
S The cat sat at mat .
A 3 4|||Prep|||on|||REQUIRED|||-NONE-|||0
A 4 4|||ArtOrDet|||the||a|||REQUIRED|||-NONE-|||0
S The dog .
A 1 2|||NN|||dogs|||REQUIRED|||-NONE-|||0
S Giant otters is an apex predator .
A 2 3|||SVA|||are|||REQUIRED|||-NONE-|||0
A 3 4|||ArtOrDet|||-NONE-|||REQUIRED|||-NONE-|||0
A 5 6|||NN|||predators|||REQUIRED|||-NONE-|||0
A 1 2|||NN|||otter|||REQUIRED|||-NONE-|||1
A cat sat on the mat .
The dog .
Giant otters are apex predator .
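A minimal sketch of reading this annotation format; alternative corrections within one edit are separated by `||`, as in the `the||a` line above:

```python
def parse_m2(block):
    """Parse one M2 block: an S (sentence) line followed by A (annotation)
    lines of the form 'A start end|||type|||correction|||REQUIRED|||-NONE-|||annotator'."""
    lines = block.strip().split("\n")
    tokens = lines[0][2:].split()
    edits = []
    for line in lines[1:]:
        span, etype, corr, _, _, annotator = line[2:].split("|||")
        start, end = map(int, span.split())
        edits.append({
            "span": (start, end),
            "type": etype,
            "corrections": corr.split("||"),   # '||' separates alternatives
            "annotator": int(annotator),
        })
    return tokens, edits

block = """S The cat sat at mat .
A 3 4|||Prep|||on|||REQUIRED|||-NONE-|||0
A 4 4|||ArtOrDet|||the||a|||REQUIRED|||-NONE-|||0"""

tokens, edits = parse_m2(block)
print(edits[1]["corrections"])
```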
11. Grammatical Error Correction by Data-Intensive and
Feature-Rich Statistical Machine Translation
Marcin Junczys-Dowmunt, Roman Grundkiewicz. CoNLL-2014 Shared Task
Grammatical error correction seen as translation from
erroneous into corrected sentences.
Baseline is a standard phrase-based Moses set-up.
Explore the interaction of:
Parameter optimization
Web-scale language models
Larger error-annotated resources as parallel data
Task-specific dense and sparse features
Data used:
TM: NUCLE, Lang-8 (scraped)
LM: TM data, Wikipedia, CommonCrawl (Buck et al. 2014)
12. System Combination for Grammatical Error Correction
Raymond Hendy Susanto, Peter Phandi, Hwee Tou Ng. EMNLP 2014
Uses MEMT to combine several GEC systems:
Two classifier-based systems and two SMT-based systems.
Various n-top combinations from the CoNLL-2014 shared task.
Data used:
Classifiers: POS-Taggers, Chunkers, etc.
SMT-TM: NUCLE, Lang-8 (Mizumoto et al. 2012)
SMT-LM: TM data, Wikipedia
System Combination: all CoNLL-2014 submissions
14. Lang-8.com
Examples
I thought/think that they are/there is a good combination between
winter and reading .
Today I have/had a bad began/beginning .
I wanna improve my skill/skills !
Is there somebody/anybody who wants to be friend/friends with me ?
If you need more information , please don ’t hesitate to tell/contact me
please .
15. Language model data
Corpus Sentences Tokens
NUCLE 57.15 K 1.15 M
Lang-8 2.23 M 30.03 M
Wikipedia 213.08 M 3.37 G
CommonCrawl 59.13 G 975.63 G
16. Performance on the CoNLL-2014 Shared Task test set
[Bar chart: M² F₀.₅ scores (axis 30.0–50.0) for the CoNLL-2014 Top 5 (unrestricted), Susanto et al. 2014 (restricted and unrestricted), and this talk (restricted and unrestricted)]
22. Lesson 1: Implement task-specific features
Add features that seem to be relevant for GEC: Levenshtein distance
source phrase (s) target phrase (t) LD D I S
a short time . short term only . 3 1 1 1
a situation into a situation 1 0 1 0
a supermarket . a supermarket . 0 0 0 0
a supermarket . at a supermarket 1 1 0 0
able unable 1 0 0 1
LD(s, t) ≡ Levenshtein distance between source phrase (s)
and target phrase (t) in words;
Feature computes e^LD(s,t); in the log-linear model this sums
to the total number of edits applied to the sentence.
From the Levenshtein distance matrix, compute counts for
deletions (D), insertions (I), and substitutions (S). e^D, e^I, and
e^S are additive in log-linear models.
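The LD and D/I/S columns can be computed from a standard dynamic-programming Levenshtein matrix over words. A sketch; note that for some phrase pairs the minimal D/I/S split is not unique, so only unambiguous rows from the table are demonstrated:

```python
def word_edit_counts(src, trg):
    """Word-level Levenshtein between two phrases; returns
    (LD, deletions, insertions, substitutions)."""
    s, t = src.split(), trg.split()
    n, m = len(s), len(t)
    # dp[i][j] = minimal number of edits turning s[:i] into t[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # delete s[i-1]
                           dp[i][j - 1] + 1,         # insert t[j-1]
                           dp[i - 1][j - 1] + cost)  # match / substitute
    # Backtrace to count one minimal decomposition into D/I/S.
    D = I = S = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (0 if s[i - 1] == t[j - 1] else 1):
            if s[i - 1] != t[j - 1]:
                S += 1
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            D += 1
            i -= 1
        else:
            I += 1
            j -= 1
    return dp[n][m], D, I, S

# Rows from the table above:
print(word_edit_counts("a situation", "into a situation"))
print(word_edit_counts("able", "unable"))
```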
33. Lesson 1: Implement task-specific features
A stateful feature: Operation Sequence Model
Target: Hence , a new problem surfaces .
Source: Then a new problem comes out .
TRANS Hence TO Then DEL , TRANS a TO a
TRANS new TO new TRANS problem TO problem
TRANS surfaces TO comes INS out TRANS . TO .
Create sequences for all sentence pairs.
Compute an n-gram language model over the operation sequences.
The OSM usually contains reordering operations; here it degenerates
to edit sequences.
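A sketch of deriving such a sequence from a sentence pair with a word-level diff, using Python's difflib as a stand-in for the real word alignment. The labeling convention follows the slides: a word present only in the target yields DEL, and a word present only in the source yields INS.

```python
from difflib import SequenceMatcher

def operation_sequence(target, source):
    """Linearise a corrected/erroneous sentence pair into an edit-operation
    sequence. With reordering disabled only TRANS/DEL/INS remain."""
    t, s = target.split(), source.split()
    ops = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(a=t, b=s, autojunk=False).get_opcodes():
        tseg, sseg = t[i1:i2], s[j1:j2]
        if tag == "equal":
            ops += [f"TRANS {w} TO {w}" for w in tseg]
        elif tag == "replace":
            k = min(len(tseg), len(sseg))
            ops += [f"TRANS {tw} TO {sw}" for tw, sw in zip(tseg[:k], sseg[:k])]
            ops += [f"DEL {w}" for w in tseg[k:]]   # leftover target-only words
            ops += [f"INS {w}" for w in sseg[k:]]   # leftover source-only words
        elif tag == "delete":                       # word only in the target
            ops += [f"DEL {w}" for w in tseg]
        elif tag == "insert":                       # word only in the source
            ops += [f"INS {w}" for w in sseg]
    return ops

seq = operation_sequence("Hence , a new problem surfaces .",
                         "Then a new problem comes out .")
print(" ".join(seq))
```

For the sentence pair on the slide this yields exactly the operation sequence shown above.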
37. Lesson 1: Implement task-specific features
Results optimized with BLEU
[Bar chart: M² F₀.₅ scores (axis 36.0–42.0) for systems B, L, E, O, optimized with BLEU]
• (B)aseline: vanilla Moses, disabled reordering.
• (L)evenshtein Distance: word-based, as TM score. Sums to number of edits.
• (E)dits: counts inserts, deletions, replacements.
• (O)peration Sequence Model: without reordering degenerates to a stateful sequence model of edits.
38. Lesson 2: Optimize the right metric
Implement an optimizer for the metric that you have to use in the end
[Bar charts: M² F₀.₅ scores (axis 36.0–42.0) for systems B, L, E, O, optimized with BLEU vs. optimized with M²]
40. Lesson 3: Account for optimizer instability
Or don’t count on good luck (Clark et al. 2011)
[Bar charts: M² F₀.₅ scores for B, L, E, O across tuning runs, optimized with BLEU vs. optimized with M²]
41. Lesson 3: Account for optimizer instability
Averaged weights beat mean scores (Cettolo et al. 2011)
[Bar charts: M² F₀.₅ scores for B, L, E, O with averaged weights, optimized with BLEU vs. optimized with M²]
42. Performance on the CoNLL-2014 Shared Task test set
[Bar chart: M² F₀.₅ scores (axis 30.0–50.0) for the CoNLL-2014 Top 5 (unrestricted), Susanto et al. 2014 (restricted and unrestricted), and this talk (restricted and unrestricted)]
47. Lesson 4: Bigger development sets are better
Maybe a little too big
Until now, the CoNLL-2013 test set was used as the dev set:
Consists of only 1,381 sentences with 3,461 error annotations.
Rate of erroneous tokens: 14.97%
But there is also NUCLE, currently used as training data:
Consists of 57,151 sentences with 44,385 error annotations.
Rate of erroneous tokens: 6.23%
We are not supposed to look at the error rate of the
CoNLL-2014 test set (ca. 10% for Annotator 1, ca. 12% for
Annotator 2).
48. Lesson 4: Bigger development sets are better
Turning NUCLE into a development set via cross validation
1. Greedily remove sentences until error rate 0.15 is reached.
Leaves: 23381 sentences, 39707 annotations.
2. Divide into 4 equal-sized subsets.
3. Tune on 1 subset, add remaining 3 subsets to training data,
repeat for each subset.
4. Average weights for all subsets into single weight vector.
5. Repeat steps 3-4 a couple of times (here: 5) to average
weights across iterations.
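Step 1 of the procedure above can be sketched as follows. `sentences` is a hypothetical list of (token count, erroneous-token count) pairs, and the greedy rule assumed here is to drop the cleanest sentences first:

```python
def thin_to_error_rate(sentences, target=0.15):
    """Greedy sketch of step 1: drop the cleanest sentences first until the
    token-level error rate of the remaining corpus reaches the target.
    `sentences` holds (token_count, erroneous_token_count) pairs."""
    pool = sorted(sentences, key=lambda x: x[1] / x[0])  # cleanest first
    tokens = sum(n for n, _ in pool)
    errors = sum(e for _, e in pool)
    while pool and errors / tokens < target:
        n, e = pool.pop(0)    # remove the cleanest remaining sentence
        tokens -= n
        errors -= e
    return pool

# Hypothetical toy corpus: five 10-token sentences with 0-3 erroneous tokens.
corpus = [(10, 0), (10, 0), (10, 1), (10, 2), (10, 3)]
kept = thin_to_error_rate(corpus)
print(len(kept), sum(e for _, e in kept) / sum(n for n, _ in kept))
```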
49. Lesson 4: Bigger development sets are better
Turning NUCLE into a development set via cross validation
We are not using the previous dev set, though it could just be
added to all subsets.
It is nice to have a second test set (although slightly cheating
due to the matching error rate).
Effect of error rate on tuning:
Low error rate: high precision.
High error rate: high recall.
Future work: Can the error rate be used to adjust training
data, too?
50. Lesson 4: Bigger development sets are better
Performance jump from switching dev sets
[Bar chart: M² F₀.₅ scores (axis 38.0–44.0) for systems B, L, E, O with the new dev set]
52. Performance on the CoNLL-2014 Shared Task test set
[Bar chart: M² F₀.₅ scores (axis 30.0–50.0) for the CoNLL-2014 Top 5 (unrestricted), Susanto et al. 2014 (restricted and unrestricted), and this talk (restricted and unrestricted)]
57. Lesson 7: There’s something about sparse features
But tuning them correctly is hard
In our CoNLL-2014 shared task submission, we reported that
sparse features help.
Correlation without causation: what actually helped was the
tuning scheme.
Still, we strongly believe that for GEC sparse features are
something worth exploring.
Problems:
kbMIRA does not seem to work well with M².
Both PRO and kbMIRA give worse results than MERT for M²,
even without sparse features.
PRO seems to even out with sparse features.
67. Lesson 7: There’s something about sparse features
Sparse feature examples
Source: Then a new problem comes out .
Target: Hence , a new problem surfaces .
subst(Then,Hence)=1 <s> subst(Then,Hence) a=1
insert(,)=1 Hence insert(,) a=1
subst(comes, surfaces)=1 problem subst(comes, surfaces) out=1
del(out)=1 comes del(out) .=1
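A sketch of extracting such sparse features from a sentence pair. The context words here are taken from the target side only, which may differ slightly from the exact scheme on the slides:

```python
from difflib import SequenceMatcher

def sparse_edit_features(source, target):
    """Sketch of sparse edit features: bare features plus variants annotated
    with neighbouring target-side words (<s>/</s> mark sentence boundaries)."""
    s, t = source.split(), target.split()
    feats = {}

    def fire(name, j_left, j_right):
        left = t[j_left] if j_left >= 0 else "<s>"
        right = t[j_right] if j_right < len(t) else "</s>"
        for key in (name, f"{left} {name} {right}"):
            feats[key] = feats.get(key, 0) + 1

    for tag, i1, i2, j1, j2 in SequenceMatcher(a=s, b=t, autojunk=False).get_opcodes():
        if tag == "equal":
            continue
        sseg, tseg = s[i1:i2], t[j1:j2]
        k = min(len(sseg), len(tseg))
        for x in range(k):                      # paired words -> substitution
            fire(f"subst({sseg[x]},{tseg[x]})", j1 + x - 1, j1 + x + 1)
        for x in range(k, len(tseg)):           # extra target words -> insertion
            fire(f"insert({tseg[x]})", j1 + x - 1, j1 + x + 1)
        for x in range(k, len(sseg)):           # extra source words -> deletion
            fire(f"del({sseg[x]})", j2 - 1, j2)
    return feats

feats = sparse_edit_features("Then a new problem comes out .",
                             "Hence , a new problem surfaces .")
print(sorted(k for k in feats if " " not in k))   # the bare (uncontexted) features
```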
72. Lesson 8: Nothing beats the language model
Which is actually quite frustrating
Using Kenneth Heafield’s CommonCrawl LM is near impossible
due to a hardware bottleneck.
Still working on a way to find out the upper bound.
Instead: sampled a subset from the raw text data with
cross-entropy difference filtering.
[Bar chart: M² F₀.₅ scores (axis 42.0–50.0) for Wikipedia vs. CommonCrawl language models]
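Cross-entropy difference filtering (Moore & Lewis 2010) scores each candidate sentence by H_in-domain(s) − H_out-of-domain(s) and keeps the lowest-scoring ones. A toy sketch with add-one-smoothed unigram LMs standing in for the real n-gram models, over invented corpora:

```python
import math
from collections import Counter

def unigram_lm(corpus):
    """Tiny add-one-smoothed unigram LM; stands in for the real n-gram models."""
    counts = Counter(w for sent in corpus for w in sent.split())
    total = sum(counts.values())
    vocab = len(counts) + 1
    return lambda w: (counts[w] + 1) / (total + vocab)

def cross_entropy(lm, sentence):
    words = sentence.split()
    return -sum(math.log2(lm(w)) for w in words) / len(words)

def moore_lewis_sample(pool, in_domain, out_domain, keep=1):
    """Keep sentences that look in-domain and unlike the general crawl."""
    p_in, p_out = unigram_lm(in_domain), unigram_lm(out_domain)
    scored = sorted(pool, key=lambda s: cross_entropy(p_in, s) - cross_entropy(p_out, s))
    return scored[:keep]

in_domain = ["i want to improve my english", "my english is not good"]
out_domain = ["buy cheap flights online", "stock prices fell sharply"]
pool = ["i want to learn english", "cheap flights to london"]
print(moore_lewis_sample(pool, in_domain, out_domain))
```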
73. Performance on the CoNLL-2014 Shared Task test set
[Bar chart: M² F₀.₅ scores (axis 30.0–50.0) for the CoNLL-2014 Top 5 (unrestricted), Susanto et al. 2014 (restricted and unrestricted), and this talk (restricted and unrestricted)]
79. Lesson 9: Collecting parallel monolingual data is hard
For the Shared Task 2014 we scraped more data from Lang-8.
Collected twice as many sentences (ca. 4M) compared to the
free release with 2M.
Legally and morally dubious.
[Bar chart: M² F₀.₅ scores (axis 48.0–52.0) for Lang-8 free + CommonCrawl vs. Lang-8 scraped + CommonCrawl]
80. Performance on the CoNLL-2014 Shared Task test set
[Bar chart: M² F₀.₅ scores (axis 30.0–50.0) for the CoNLL-2014 Top 5 (unrestricted), Susanto et al. 2014 (restricted and unrestricted), and this talk (restricted and unrestricted)]
85. Lesson 9: Collecting parallel monolingual data is hard
Other less obvious but legally safe sources
Wikipedia edit histories (Done that)
Over 14 million sentences with small edits.
Publicly available as the WikEd corpus.
Currently thinking what to do with that.
Other wikis: over 400,000 Wikia wikis (Working on it)
The same format as Wikipedia dumps.
Our scripts can already work with that.
Wikia has a local branch in Poznań!
And one of my students is an intern.
Diff between two CommonCrawl revisions (Dreading it)
Take revisions maybe one year apart.
Group documents by URL.
Sentence align and keep pairs with small differences.
. . .
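The three CommonCrawl-diff steps above might look like this. The URL grouping and the similarity threshold are assumptions, and pairing sentences by index stands in for real sentence alignment:

```python
from difflib import SequenceMatcher

def revision_pairs(old_crawl, new_crawl, min_sim=0.8):
    """Sketch: group documents by URL across two crawl revisions and keep
    sentence pairs that changed only slightly between them."""
    pairs = []
    for url, old_doc in old_crawl.items():
        new_doc = new_crawl.get(url)
        if new_doc is None:
            continue
        for old_sent, new_sent in zip(old_doc, new_doc):
            sim = SequenceMatcher(a=old_sent.split(), b=new_sent.split()).ratio()
            if min_sim <= sim < 1.0:       # changed, but only a little
                pairs.append((old_sent, new_sent))
    return pairs

# Hypothetical two-revision toy crawl keyed by URL.
old = {"example.com/a": ["he go to school every day", "the weather is nice"]}
new = {"example.com/a": ["he goes to school every day", "the weather is nice"]}
print(revision_pairs(old, new))
```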
96. Summing up
Proper parameter tuning is crucial.
The language model is the strongest feature.
With proper tuning, vanilla Moses beats the published top results
on the CoNLL-2014 shared task (39.39% vs. 41.63%) when
using restricted training data.
New top result with restricted data (1-composition): 43.18%
With additional LM and TM data and task-specific features,
Moses beats all unrestricted systems and previously published
system combinations (partially tuned on test data).
New top result with unrestricted data: 50.80% (51.00%
1-composition)
But: the M² metric is highly dubious.
97. Future Work
There is still a lot to learn:
How do I properly tune sparse features with M² and PRO or
kbMIRA?
Is M² a sensible metric at all?
Where and how can I collect more data?
What’s the effect of other LMs, like neural language models?
Can I fully integrate ML methods (Vowpal Wabbit)?
. . .
98. References
Christian Buck, Kenneth Heafield, and Bas van Ooyen. 2014.
N-gram Counts and Language Models from the Common
Crawl. LREC 2014.
Daniel Dahlmeier, Hwee Tou Ng, and Siew Mei Wu. 2013.
Building a Large Annotated Corpus of Learner English: The
NUS Corpus of Learner English. BEA 2013.
Daniel Dahlmeier and Hwee Tou Ng. 2012. Better Evaluation
for Grammatical Error Correction. NAACL 2012.
Hwee Tou Ng, Siew Mei Wu, Ted Briscoe, Christian
Hadiwinoto, Raymond Hendy Susanto, and Christopher
Bryant. 2014. The CoNLL-2014 shared task on grammatical
error correction. CoNLL 2014.
99. References
Marcin Junczys-Dowmunt and Roman Grundkiewicz. 2014.
The AMU system in the CoNLL-2014 shared task:
Grammatical error correction by data-intensive and
feature-rich statistical machine translation. CoNLL 2014.
Tomoya Mizumoto, Yuta Hayashibe, Mamoru Komachi,
Masaaki Nagata, and Yuji Matsumoto. 2012. The effect of
learner corpus size in grammatical error correction of ESL
writings. COLING 2012.
Jonathan H. Clark, Chris Dyer, Alon Lavie, and Noah A.
Smith. 2011. Better hypothesis testing for statistical machine
translation: Controlling for optimizer instability. HLT 2011.
Mauro Cettolo, Nicola Bertoldi, and Marcello Federico. 2011.
Methods for smoothing the optimizer instability in SMT. MT
Summit 2011.