Speaker: Marek Rei, Senior Research Associate, University of Cambridge
Summary: The number of people learning English around the world is currently estimated at 1.5 billion and is predicted to exceed 1.9 billion by 2020. The increasing need to communicate beyond borders has created a large unmet demand for qualified language teachers across the globe. Computational models for error detection and essay scoring can alleviate this issue by giving millions of people access to affordable learning resources. Successful systems for automated language teaching will need to analyse language at various levels of granularity and provide useful feedback to individual students. In this talk, we will explore some of the latest approaches to written language assessment, using neural architectures for composing the meaning of a sentence or text, and also discuss potential future directions in the field.
Lie Groups and Curvature, IEMS (Efraín Vega)
In this talk we will see that curvature can be viewed through the Lie bracket. To this end we will motivate the required concepts: manifold, group, Lie group, Lie algebra, Lie bracket, connections, and curvature, all applied to the particular case of the three-dimensional rotation group.
This presentation has 5 more slides than the one from the Mathematical Orientation Colloquium. In them, 3 distributions of two-dimensional planes can be seen that provide interpretations of the Lie bracket, of the Lie algebra of S3, and of the curvature of the 2-sphere S2.
Derivative of a function at a point. Geometric meaning of the derivative. Equation of the tangent line to the graph at a point. Differentiation rules. Continuity and differentiability. Points of non-differentiability.
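As a concrete illustration of the topics above (an added sketch, not part of the original slides), the tangent line y = f(x0) + f'(x0)(x - x0) can be approximated numerically; the function f below is a made-up example:

```python
# Numerical illustration of the derivative at a point and of the tangent line.

def derivative_at(f, x0, h=1e-6):
    # Central difference quotient approximating f'(x0)
    return (f(x0 + h) - f(x0 - h)) / (2 * h)

def tangent_line(f, x0):
    # Returns the tangent line to the graph of f at x0 as a function of x:
    # y = f(x0) + f'(x0) * (x - x0)
    slope = derivative_at(f, x0)
    return lambda x: f(x0) + slope * (x - x0)

f = lambda x: x ** 2
t = tangent_line(f, 1.0)  # slope of x^2 at x0 = 1 is 2
```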
Propositional functions
Quantifiers
Universal quantifier
Existential quantifier
Unique existential quantifier
Negation rules for quantifiers.
Propositional functions
Very often we encounter certain declarative statements that, without being propositions, are closely related to them.
Example: let (A, p(x)) be the propositional function where
A = {-1, 0, 1, 2, 3, 4} and P(x): x < 3
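The quantifiers listed above can be evaluated mechanically over a finite set such as the A of this example; a small illustrative sketch (not from the original slides):

```python
# Evaluating quantifiers over the finite set A from the example above.
A = [-1, 0, 1, 2, 3, 4]
p = lambda x: x < 3  # the propositional function P(x)

universal = all(p(x) for x in A)         # "for all x in A, P(x)": False (3 and 4 fail)
existential = any(p(x) for x in A)       # "there exists x in A with P(x)": True
unique = sum(1 for x in A if p(x)) == 1  # "there exists exactly one": False (four witnesses)

# Negation rule: not(for all x, P(x)) is equivalent to (there exists x with not P(x))
negation_rule_holds = (not universal) == any(not p(x) for x in A)
```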
Linear Regression Algorithm | Linear Regression in Python | Machine Learning ... (Edureka!)
** Machine Learning Training with Python: https://www.edureka.co/python **
This Linear Regression Algorithm tutorial is designed so that you learn about the algorithm in depth: in the first part you will learn the algorithm from scratch with its mathematical implementation, and then you will drill down to the coding part and implement linear regression in Python. Below are the topics covered in this tutorial:
1. What is Regression?
2. Regression Use-case
3. Types of Regression – Linear vs Logistic Regression
4. What is Linear Regression?
5. Finding best-fit regression line using Least Square Method
6. Checking goodness of fit using R squared Method
7. Implementation of Linear Regression Algorithm using Python (from scratch)
8. Implementation of Linear Regression Algorithm using Python (scikit-learn)
Check out our playlist for more videos: http://bit.ly/2taym8X
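Topics 5 and 6 above can be sketched from scratch in a few lines; the data below is made up for illustration and is not Edureka's actual code:

```python
# Least-squares fit of y = m*x + b and R-squared goodness of fit.

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 4.0, 6.2, 7.9, 10.1]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Slope m = sum((x - mean_x)*(y - mean_y)) / sum((x - mean_x)^2), b = mean_y - m*mean_x
m = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)
b = mean_y - m * mean_x

# R^2 = 1 - SS_res / SS_tot
ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
ss_tot = sum((y - mean_y) ** 2 for y in ys)
r2 = 1 - ss_res / ss_tot
```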
Beyond the Symbols: A 30-minute Overview of NLP (MENGSAYLOEM1)
This presentation delves into the world of Natural Language Processing (NLP), exploring its goal to make human language understandable to machines. The complexities of language, such as ambiguity and complex structures, are highlighted as major challenges. The talk underscores the evolution of NLP through deep learning methodologies, leading to a new era defined by large-scale language models. However, obstacles like low-resource languages and ethical issues including bias and hallucination are acknowledged as enduring challenges in the field. Overall, the presentation provides a condensed, yet comprehensive view of NLP's accomplishments and ongoing hurdles.
LEPOR: an augmented machine translation evaluation metric - Thesis PPT (Lifeng (Aaron) Han)
Machine translation (MT) has developed into one of the hottest research topics in the natural language processing (NLP) literature. One important issue in MT is how to evaluate an MT system reasonably and tell whether the translation system has improved or not. Traditional manual judgment methods are expensive, time-consuming, unrepeatable, and sometimes show low agreement. On the other hand, the popular automatic MT evaluation methods have some weaknesses. Firstly, they tend to perform well on language pairs with English as the target language, but weakly when English is the source. Secondly, some methods rely on many additional linguistic features to achieve good performance, which makes the metric hard to replicate and apply to other language pairs. Thirdly, some popular metrics rely on factors that are not comprehensive, which results in low performance on some practical tasks.
In this thesis, to address these problems, we design novel MT evaluation methods and investigate their performance on different languages. Firstly, we design augmented factors to yield highly accurate evaluation. Secondly, we design a tunable evaluation model in which the weighting of factors can be optimized according to the characteristics of languages. Thirdly, in the enhanced version of our methods, we design concise linguistic features using POS to show that our methods can yield even higher performance when using some external linguistic resources. Finally, we report the practical performance of our metrics in the ACL-WMT workshop shared tasks, which shows that the proposed methods are robust across different languages.
DELAB - sequence generation seminar
Title
Open vocabulary problem
Table of contents
1. Open vocabulary problem
1-1. Open vocabulary problem
1-2. Ignore rare words
1-3. Approximative Softmax
1-4. Back-off Models
1-5. Character-level model
2. Solution 1: Byte Pair Encoding (BPE)
3. Solution 2: WordPiece Model (WPM)
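The BPE solution from the outline above can be sketched in a few lines: repeatedly merge the most frequent adjacent symbol pair. The toy vocabulary below follows the classic low/lower/newest/widest illustration and is an assumption, not the seminar's actual code:

```python
# Minimal Byte Pair Encoding (BPE) sketch over a toy vocabulary.
import re
from collections import Counter

def get_pair_stats(vocab):
    # Count adjacent symbol pairs, weighted by word frequency.
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, vocab):
    # Replace every occurrence of the pair with its concatenation.
    bigram = re.escape(' '.join(pair))
    pattern = re.compile(r'(?<!\S)' + bigram + r'(?!\S)')
    return {pattern.sub(''.join(pair), w): f for w, f in vocab.items()}

# Words are space-separated symbols with an end-of-word marker </w>.
vocab = {'l o w </w>': 5, 'l o w e r </w>': 2,
         'n e w e s t </w>': 6, 'w i d e s t </w>': 3}
for _ in range(3):
    best = max(get_pair_stats(vocab), key=get_pair_stats(vocab).get)
    vocab = merge_pair(best, vocab)
```

After three merges, the frequent suffix "est</w>" has become a single subword unit.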
How can text-mining leverage developments in Deep Learning? Presentation at ... (jcscholtes)
How can text-mining leverage developments in Deep Learning?
Text-mining focuses primarily on extracting complex patterns from unstructured electronic data sets and applying machine learning for document classification. During the last decade, a generation of efficient and successful algorithms has been developed, using bag-of-words models to represent document content and statistical and geometrical machine learning algorithms such as Conditional Random Fields and Support Vector Machines. These algorithms require relatively little training data and are fast on modern hardware. However, performance seems to be stuck around 90% F1 values.
In computer vision, deep learning has shown great success, and the 90% barrier has been broken in many applications. In addition, deep learning shows new successes in transfer learning and self-learning, such as reinforcement learning. Dedicated hardware helped us overcome computational challenges, and methods such as training-data augmentation solved the need for unrealistically large data sets.
So it would make sense to apply deep learning to textual data as well. But how do we represent textual data? There are many different methods for word embeddings and just as many deep learning architectures. Training-data augmentation, transfer learning, and reinforcement learning are not yet fully defined for textual data.
Visual-Semantic Embeddings: some thoughts on Language (Roelof Pieters)
Language technology is rapidly evolving. A resurgence in the use of distributed semantic representations and word embeddings, combined with the rise of deep neural networks has led to new approaches and new state of the art results in many natural language processing tasks. One such exciting - and most recent - trend can be seen in multimodal approaches fusing techniques and models of natural language processing (NLP) with that of computer vision.
The talk is aimed at giving an overview of the NLP part of this trend. It will start with a short overview of the challenges in creating deep networks for language, what makes for a “good” language model, and the specific requirements of semantic word spaces for multi-modal embeddings.
Automatic Grammatical Error Correction for ESL-Learners by SMT - Getting it r... (Marcin Junczys-Dowmunt)
Video recording of full talk: http://lectures.ms.mff.cuni.cz/view.php?rec=255
There has been an increasing interest in using statistical machine translation (SMT) for the task of Grammatical Error Correction (GEC) for English-as-a-Second-Language (ESL) learners. Two of the three highest-scoring systems of the CoNLL-2014 Shared Task were SMT-based. The currently highest-scoring result published for the CoNLL-2014 test set has been achieved by a system combination of the five best CoNLL-2014 submissions built with MEMT (a tool for MT system combination). In this talk, we demonstrate how a single SMT-based system can match and outperform the result of the mentioned combined system. Furthermore, this system outperforms any other published results (including our own CoNLL-2014 submission) for a single system by a margin of several percent F-Score when the same training data is being used. These results are achieved by adapting current state-of-the art methods for phrase-based SMT specifically to the problem of GEC.
We report on the effects of:
- Parameter tuning for GEC
- Introducing GEC-specific dense and sparse features
- Using large-scale data
The NLP muppets revolution! @ Data Science London 2019
video: https://skillsmatter.com/skillscasts/13940-a-deep-dive-into-contextual-word-embeddings-and-understanding-what-nlp-models-learn
event: https://www.meetup.com/Data-Science-London/events/261483332/
Effect of Machine Translation in Interlingual Conversation: Lessons from a Fo... (Kotaro Hara)
Our talk at CHI2015 in Seoul, South Korea. Find more information at www.kotarohara.com .
YouTube: https://youtu.be/isqsYLkX9gA
Makeability Lab: http://www.cs.umd.edu/~jonf/
Microsoft Research: http://research.microsoft.com/
ABSTRACT
The language barrier is the primary challenge for effective cross-lingual conversations. Spoken language translation (SLT) is perceived as a cost-effective alternative to less affordable human interpreters, but little research has been done on how people interact with such technology. Using a prototype translator application, we performed a formative evaluation to elicit how people interact with the technology and adapt their conversation style. We conducted two sets of studies with a total of 23 pairs (46 participants). Participants worked on storytelling tasks to simulate natural conversations with 3 different interface settings. Our findings show that collocutors naturally adapt their style of speech production and comprehension to compensate for inadequacies in SLT. We conclude the paper with the design guidelines that emerged from the analysis.
ADVERSARIAL GRAMMATICAL ERROR GENERATION: APPLICATION TO PERSIAN LANGUAGE (kevig)
Grammatical error correction (GEC) greatly benefits from large quantities of high-quality training data. However, the preparation of a large amount of labelled training data is time-consuming and prone to human errors. These issues have become major obstacles in training GEC systems. Recently, the performance of English GEC systems has drastically been enhanced by the application of deep neural networks that generate a large amount of synthetic data from limited samples. While GEC has extensively been studied in languages such as English and Chinese, no attempts have been made to generate synthetic data for improving Persian GEC systems. Given the substantial grammatical and semantic differences of the Persian language, in this paper, we propose a new deep learning framework to create large enough synthetic sentences that are grammatically incorrect for training Persian GEC systems. A modified version of sequence generative adversarial net with policy gradient is developed, in which the size of the model is scaled down and the hyperparameters are tuned. The generator is trained in an adversarial framework on a limited dataset of 8000 samples. Our proposed adversarial framework achieved bilingual evaluation understudy (BLEU) scores of 64.5% on BLEU-2, 44.2% on BLEU-3, and 21.4% on BLEU-4, and outperformed the conventional supervised-trained long short-term memory using maximum likelihood estimation and a recently proposed sequence labeler using neural machine translation augmentation. This shows promise toward improving the performance of GEC systems by generating a large amount of training data.
These are the presentation slides of the paper "Attentional Parallel RNNs for Generating Punctuation in Transcribed Speech", presented at the 5th International Conference on Statistical Language and Speech Processing (SLSP 2017).
Abstract
Until very recently, the generation of punctuation marks for automatic speech recognition (ASR) output has been mostly done by looking at the syntactic structure of the recognized utterances. Prosodic cues such as breaks, speech rate, and pitch intonation that influence the placing of punctuation marks on speech transcripts have seldom been used. We propose a method that uses recurrent neural networks, taking prosodic and lexical information into account in order to predict punctuation marks for raw ASR output. Our experiments show that an attention mechanism over parallel sequences of prosodic cues aligned with transcribed speech improves the accuracy of punctuation generation.
Speaker: Vitalii Braslavskyi, Software Engineer at Grammarly
Summary:
Today, the dominant approach to software engineering is an imperative one — the best practices have been proven over time. But the world is always evolving, and in order to evolve with it and remain as productive as possible, we need to continue searching for better tools to solve problems of increasing complexity.
In this talk, we'll discuss the tools and techniques of the .Net ecosystem that can help us to concentrate on the problem itself — not just on the intermediate steps (which have likely already been solved). We'll compare imperative and declarative approaches and assess solutions to problems.
We'll also offer examples of how engineers in Grammarly's Office Add-in team use these tools to improve the efficiency of our engineering and strengthen our solutions to the problems at hand.
Grammarly AI-NLP Club #10 - Information-Theoretic Probing with Minimum Descri... (Grammarly)
Speaker: Elena Voita, a Ph.D. student at the University of Edinburgh and the University of Amsterdam
Summary: How can you know whether a model (e.g., ELMo, BERT) has learned to encode a linguistic property? The most popular approach to measure how well pretrained representations encode a linguistic property is to use the accuracy of a probing classifier (probe). However, such probes often fail to adequately reflect differences in representations, and they can show different results depending on probe hyperparameters. As an alternative to standard probing, we propose information-theoretic probing which measures minimum description length (MDL) of labels given representations. In addition to probe quality, the description length evaluates “the amount of effort” needed to achieve this quality. We show that (i) MDL can be easily evaluated on top of standard probe-training pipelines, and (ii) compared to standard probes, the results of MDL probing are more informative, stable, and sensible.
Grammarly AI-NLP Club #9 - Dumpster diving for parallel corpora with efficien... (Grammarly)
Speaker: Kenneth Heafield, Lecturer at the University of Edinburgh
Summary: The ParaCrawl project is mining a petabyte of the web for translations to release freely at https://paracrawl.eu/releases.html. But the web is a messy place, with a lot of data to sift through. To find translations, we translate everything into English or at least use a neural encoder. A related project makes machine translation inference more efficient by using optimizations ranging from assembly instructions to removal of bits of model architecture.
Grammarly AI-NLP Club #8 - Arabic Natural Language Processing: Challenges and... (Grammarly)
Speaker: Nizar Habash is an Associate Professor of Computer Science at New York University Abu Dhabi (NYUAD). Professor Habash’s research includes extensive work on machine translation, morphological analysis, and computational modeling of Arabic and its dialects. Professor Habash has been a principal investigator or co-investigator on over 20 grants. He has over 200 publications including a book titled “Introduction to Arabic Natural Language Processing.” His website is www.nizarhabash.com. He is the director of the NYUAD Computational Approaches to Modeling Language (CAMeL) Lab (www.camel-lab.com).
Summary: The Arabic language presents a number of challenges to researchers and developers of language technologies. Arabic is both morphologically rich and highly ambiguous; and it has a number of dialects that vary widely amongst themselves and with Standard Arabic. The dialects have no official spelling standards, and spelling and grammar errors are common in unedited Standard Arabic. In this talk, we present some of these challenges in detail and cover some of the ongoing efforts to address them with creative language technologies.
Grammarly AI-NLP Club #6 - Sequence Tagging using Neural Networks - Artem Che... (Grammarly)
Speaker: Artem Chernodub, Chief Scientist at Clikque Technology and Associate Professor at Ukrainian Catholic University
Summary: Sequence Tagging is an important NLP problem that has several applications, including Named Entity Recognition, Part-of-Speech Tagging, and Argument Component Detection. In our talk, we will focus on a BiLSTM+CNN+CRF model — one of the most popular and efficient neural network-based models for tagging. We will discuss task decomposition for this model, explore the internal design of its components, and provide the ablation study for them on the well-known NER 2003 shared task dataset.
Grammarly AI-NLP Club #5 - Automatic text simplification in the biomedical do... (Grammarly)
Speaker: Natalia Grabar, NLP scientist
Summary: We propose a set of experiments with the general objective of ensuring a better understanding of technical health documents. Various experiments address different steps of this complex and ambitious process: (1) categorization of documents according to their complexity; (2) detection of complex passages within documents; (3) acquisition of resources for the lexical and semantic simplification of documents; (4) alignment of parallel sentences from comparable corpora for generating rules for syntactic transformation. According to the steps and tasks, various methods are exploited (rule-based, machine learning, with and without linguistic knowledge). In addition to text simplification, the results and resources can be used for other NLP applications and tasks (e.g., information retrieval and extraction, question-answering, textual entailment).
Grammarly AI-NLP Club #3 - Learning to Read for Automated Fact Checking - Isa... (Grammarly)
Speaker: Isabelle Augenstein, Assistant Professor, University of Copenhagen
Summary: The spread of misinformation and disinformation is growing, and it’s having a big impact on interpersonal communications, politics and even science.
Traditional methods, e.g., manual fact-checking by reporters, cannot keep up with the growth of information. On the other hand, there has been much progress in natural language processing recently, partly due to the resurgence of neural methods.
How can natural language processing methods fill this gap and help to automatically check facts?
This talk will explore different ways to frame fact checking and detail our ongoing work on learning to encode documents for automated fact checking, as well as describe future challenges.
Grammarly Meetup: DevOps at Grammarly: Scaling 100x (Grammarly)
Speaker: Dmitry Unkovsky, Software Engineer at Grammarly
Summary: We will tell the story of DevOps at Grammarly since 2013. We’ll talk about how we managed infrastructure growth while keeping up with the rapid pace of product development; what worked for us and what did not, and why; and what it’s like to make technical choices as an engineer at our company. We will share our current vision and future plans.
Grammarly Meetup: Memory Networks for Question Answering on Tabular Data - Sv... (Grammarly)
Tabular data is difficult to analyze and search through. There is a clear need for new tools and interfaces that would allow even non-tech-savvy users to gain insights from open datasets without resorting to specialized data analysis tools or even having to fully understand the dataset structure. We explore the End-To-End Memory Networks architecture (Sukhbaatar et al., 2015) in application to answering natural language questions from tabular data. This architecture was originally designed for the question-answering tasks from short natural language texts (bAbI tasks) (Weston et al., 2015), which include testing elements of inductive and deductive reasoning, co-reference resolution and time manipulation.
Grammarly AI-NLP Club #2 - Recent advances in applied chatbot technology - Jo... (Grammarly)
Speaker: Jordi Carrera Ventura, Artificial Intelligence technologist at Telefónica R&D
Summary: Chatbots (aka conversational agents, spoken dialogue systems) allow users to interface with computers using natural language by simply asking questions or issuing commands.
Given a query, the chatbot builds a semantic representation of the input, transforms it into a logical statement, and performs all the necessary actions to fulfill the user's intent. Sometimes this simply means calculating an exact answer or retrieving a fact from a database, whereas other times it means building a contextual model and running a full-fledged conversation flow while keeping track of anaphoras and cross-references.
Besides the direct applications of chatbots in IoT (Amazon’s Alexa, Apple's Siri) and IT (the historical field of Information Retrieval as a whole can be seen as a sub-problem of spoken dialogue systems), chatbots' main appeal for technologists is their location at the intersection of all major Natural Language Processing technologies and many of the deepest questions in Cognitive Science today: semantic parsing, entity recognition, knowledge representation, and coreference resolution.
In this talk, I will explore those questions in the context of an applied industry setting, and I will introduce a framework suitable for addressing them, together with an overview of the state-of-the-art in chatbot technology and some original techniques.
Grammarly AI-NLP Club #1 - Domain and Social Bias in NLP: Case Study in Langu... (Grammarly)
Speaker: Tim Baldwin, Professor of Computer Science, University of Melbourne
Summary: Two forms of bias that are commonly associated with natural language processing (NLP) tasks are domain bias (implicit bias towards documents from a particular domain, with lower performance over other document types) and social bias (implicit bias towards documents authored by particular types of individuals, with lower performance over documents authored by other types of individuals). In this talk, I will discuss the importance of debiasing NLP models across these dimensions, and strategies that can be employed to achieve this. I will focus the talk on the task of language identification (i.e., identifying the language(s) a written document is authored in).
Speaker: Andriy Gryshchuk, Senior Research Engineer at Grammarly.
Summary: Paraphrase detection is a challenging NLP task since it requires both thorough syntactic and thorough semantic analysis to identify whether two phrases have the same intent. A few months ago, paraphrase identification became an objective of one of the most popular Kaggle competitions, Quora Question Pairs. In this talk, Yuriy Guts and Andriy Gryshchuk, silver medalists of the competition, will share their arsenal of statistical, linguistic, and Deep Learning approaches that helped them succeed in this challenge.
Speaker: Yuriy Guts, Machine Learning Engineer at DataRobot.
Natural Language Processing for biomedical text mining - Thierry Hamon (Grammarly)
Speaker: Thierry Hamon, Associate Professor in Computer Science at Université Paris, Member of the LIMSI-CNRS research lab.
Summary: Among the large amounts of unstructured data generated across the world and available nowadays, textual data represent an important source of information. This is particularly true in the biomedical domain, where a constantly increasing demand to access textual content is observed: the situation is relevant for accessing and processing Electronic Health Records, online discussion forums, and the scientific literature. Indeed, dealing with biomedical texts requires us to take into account a great variety of texts, languages, and users.
For several years now, a lot of NLP research has focused on mining and retrieving information (i.e., medical entities and domain-specific relations), which are relevant for biologists, physicians, terminologists, epidemiologists, and patients. We will propose an overview of the NLP methods used for tackling several such research problems through text mining applications. First, we will present the resources and rule-based approaches we designed for extracting drug-related information from clinical texts, and for acquiring domain-specific semantic relations from digital libraries. Then we will present the cross-lingual approach we are developing for building multilingual terminologies from a patient-centered Ukrainian corpus.
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do... (UiPathCommunity)
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
Let's dive deeper into the world of ODC! Ricardo Alves (OutSystems) will join us to tell all about the new Data Fabric. After that, Sezen de Bruijn (OutSystems) will get into the details on how to best design a sturdy architecture within ODC.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
2. 2
Automated Language Assessment
The number of people learning English around the world is currently
estimated at 1.5 billion and is predicted to exceed 1.9 billion by 2020.
Advantages for students:
• Immediate grades and feedback
• Enables self-assessment and self-tutoring
• Constant availability as an online tool
Advantages for teachers/examiners:
• Reduced teacher/examiner workload
• Can focus on more interesting or difficult content
• Cost-effective approach to assessment
3. 3
Automated Language Assessment
Dear Mrs Brown,
I am writing you because my class want to give a
surprise birthday party for your husband Mr Brown. We
need your help for the details.
First of all could you let us know if the date of June 16th
is all right with his timetable program. We have
organised to do the party between three to six o'clock in
afternoon in College Canteen, about food we organised
a buffet, but could you also help us with the music which
he prefer, if prefer something especialy. We have invite
the student, the teachers and the Principal of school but
we appreciate if you are coming. At last would you tell
us which is the best present for him a compact disk or a
book .
We want say thanks again for your help and you must
be sure that your opinion it would be valuable to us.
I am looking forward to receiving your answer and don't
forget that it is a surprice birthday party.
Yours faithfuly,
Tom
Evaluation:
● Detect any writing errors
● Calculate a holistic writing score
● Predict language proficiency score
(IELTS, FCE)
● Detailed analytic scores (e.g.,
coherence, topic relevance)
Guidance:
● Show detailed progress reports
● Provide corrections for errors
● Suggest areas to focus on
● Generate suitable exercises
4. 4
Talk Overview
Error Detection
Identifying the locations of grammatical errors
01
Error Correction
Providing an edited version of an incorrect sentence
02
Essay Scoring
Estimating a language proficiency score based on the full text
03
Applications and Future Directions
How do we make this useful and where do we go next?
04
9. 9
I want to thak you for preparing such a nice evening .
Error Detection in Learner Writing
Spelling error (8.6%)
I know how to cook some things like potatoes .
Missing punctuation (7.4%)
If you have time , why don’t you meet up .
Incorrect punctuation (7.1%)
I’m looking forward to seeing you and good luck to your project .
Incorrect preposition (6.3%)
My friend eats two ice creams yesterday .
Verb tense error (6.0%)
10. 10
We can invite also people who are not members .
Error Detection in Learner Writing
Word order error (2.8%)
The main material that have been used is dark green glass .
Verb agreement error (1.6%)
I thing you should better save your money .
Spelling error produces a valid word (1.5%)
And at last but not the least , Captain Davidson showed him ...
Incorrectly reproduced idiom (0.5%)
Specially the old castle Wawel's great .
Complex error (0.5%)
11. 11
Automated Error Detection
1. Experts have hand-annotated a
large dataset of learner essays,
marking the location of each error.
2. We create algorithms that can look
at all these examples and discover
regularities through machine
learning.
3. We apply the resulting models on
new data, where they are able to
provide predictions.
12. 12
Deep Learning and Neural Networks
• Highly-connected networks of
parameters
• Randomly initialised, but optimised for a
specific task during training
• Automatically discovering features that
are useful for the task
• Each layer is a function of the previous
layer
• Have achieved state-of-the-art results on
nearly all language processing tasks
13. 13
Neural Error Detection
Marek Rei and Helen Yannakoudakis (2016) Compositional Sequence Labeling Models for Error Detection
in Learner Writing. ACL 2016.
• Composing words into
context-specific
representations.
• Predicting a probability
distribution over all the
possible labels for each
word.
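The per-word prediction step can be sketched in plain Python: each token gets a score for every label, and a softmax turns those scores into a probability distribution. The scores below are made-up stand-ins for what the bidirectional LSTM would actually produce.

```python
import math

def softmax(scores):
    """Turn raw label scores into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical per-token scores over two labels: (correct, incorrect).
# A real system would compute these from a BiLSTM over the whole sentence.
token_scores = {
    "I": [2.0, 0.1],
    "thak": [0.2, 3.0],   # misspelling of "thank"
    "you": [2.5, 0.3],
}

label_probs = {tok: softmax(s) for tok, s in token_scores.items()}
```

Each word now carries a probability of being correct and of being incorrect, which is what the downstream evaluation and correction stages consume.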
14. 14
System FCE CoNLL14-1 CoNLL14-2
BiLSTM 41.10 16.40 23.90
Neural Error Detection
First Certificate in English dataset (FCE, Yannakoudakis et al. (2011))
● 1,141 manually annotated essays, containing 450K words
● Written by learners during language examinations
● In response to prompts eliciting free-text answers
● Publicly available dataset
Evaluating error detection using F0.5
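F0.5 weights precision twice as heavily as recall, reflecting that flagging correct text as wrong is more harmful to a learner than missing an error. A minimal implementation:

```python
def f_beta(precision, recall, beta=0.5):
    """F-beta score; beta < 1 favours precision over recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

With beta=0.5, a precision-heavy system outscores a recall-heavy one at the same harmonic balance, e.g. `f_beta(0.8, 0.2) > f_beta(0.2, 0.8)`.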
15. 15
Additional Training Data
System FCE CoNLL14-1 CoNLL14-2
Public FCE 41.10 16.40 23.90
Private CLC 64.30 34.30 44.00
More data = better performance
We can generate artificial data:
Additional training examples for error detection
Idea 1: Randomly generate errors in correct text
16. 16
Pattern-based Error Generation
Idea 2: Extract known error patterns and insert them into correct text
We went shop on Saturday
We went shopping on Saturday
VVD shop_VV0 II => VVD shopping_VVG II
I was shopping on Monday
I was shop on Monday
Marek Rei, Mariano Felice, Zheng Yuan and Ted Briscoe (2017) Artificial Error Generation with Machine Translation and
Syntactic Patterns. BEA 2017.
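A toy version of this idea, assuming a tiny hand-written pattern store and pre-tagged input (the tag names follow the CLAWS-style tags shown on the slide; a real system would extract thousands of patterns from annotated data):

```python
# Hypothetical pattern store: a POS trigram observed around a learner error,
# mapping the correct middle word to its erroneous form.
patterns = {
    ("VVD", "VVG", "II"): {"shopping": "shop"},
}

def insert_errors(tagged_tokens):
    """Where a known POS trigram matches, corrupt the middle word."""
    words = [w for w, _ in tagged_tokens]
    tags = [t for _, t in tagged_tokens]
    out = list(words)
    for i in range(len(words) - 2):
        subst = patterns.get(tuple(tags[i:i + 3]))
        if subst and out[i + 1] in subst:
            out[i + 1] = subst[out[i + 1]]
    return out

sent = [("We", "PPIS2"), ("went", "VVD"), ("shopping", "VVG"),
        ("on", "II"), ("Saturday", "NPD1")]
```

Applied to the tagged sentence above, this reproduces the slide's example: "We went shopping on Saturday" becomes "We went shop on Saturday".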
17. 17
Translation-based Error Generation
Idea 3: Train a machine translation model to translate from correct to
incorrect text
ORIG: We are a well-mixed class with equal numbers of boys and girls, all about 20 years old.
PAT: We are a well-mixed class with equal numbers of boys an girls, all about 20 year old.
MT: We are a well-mixed class with equals numbers of boys and girls, all about 20 years old.
Normally translate between languages:
E.g. English to French
Now let’s translate for generating errors:
English to faulty English
Can use off-the-shelf machine translation tools
Marek Rei, Mariano Felice, Zheng Yuan and Ted Briscoe (2017) Artificial Error Generation with Machine Translation and
Syntactic Patterns. BEA 2017.
18. 18
System FCE CoNLL14-1 CoNLL14-2
BiLSTM 41.10 16.40 23.90
+PAT 47.81 19.47 28.49
+MT 48.37 19.73 28.39
+PAT+MT 49.11 21.87 30.13
Artificial Error Generation
Training on 450K words of annotated data and 4.5M words of automatically
generated data.
20. 20
Error Correction
Error detection identifies incorrect words
Error correction modifies a sentence to remove errors
We can formulate correction as a machine translation problem:
Let’s translate from incorrect English to correct English
The system returns the highest-scoring translation
Input: We can invite also people who are not members .
Output: We can also invite people who are not members .
21. 21
Statistical Machine Translation
Text is separated into multi-word units (phrases)
Phrase alignments and translation tables are learned from parallel
datasets
Language models are used to ensure reasonable output
22. 22
Neural Machine Translation
The encoder learns to process the source sentence and produce an
informative vector representation
The decoder learns to generate a sentence in a different language based
on that vector
Bahdanau et al. (2014), figure by Stephen Merity.
23. 23
Input: I aren’t seen Albert since last summer .
Output: I haven’t seen OOV since last summer .
Handling Unknown Words
Neural models have a limited fixed vocabulary and represent other words
as OOV tokens.
Solution:
1) Align the words between the input and output text
2) Translate OOV words in a post-processing step
Zheng Yuan and Ted Briscoe (2016) Grammatical error correction using neural machine translation. NAACL 2016.
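A sketch of the post-processing step, assuming the word alignment (output position to source position) is already available from the translation model; the optional correction lexicon here is a made-up stand-in:

```python
def restore_oov(source, output, alignment, lexicon=None):
    """Replace OOV placeholders with the aligned source word,
    optionally mapped through a spelling-correction lexicon."""
    lexicon = lexicon or {}
    restored = []
    for j, tok in enumerate(output):
        if tok == "OOV":
            src = source[alignment[j]]
            restored.append(lexicon.get(src, src))
        else:
            restored.append(tok)
    return restored

src = "I aren't seen Albert since last summer .".split()
out = "I haven't seen OOV since last summer .".split()
# Hypothetical alignment: output position 3 aligns to source position 3
fixed = restore_oov(src, out, {3: 3})
```

This recovers the slide's example output with "Albert" restored in place of the OOV token.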
25. 25
Original sentence:
There are some informations you have asked me about.
SMT output:
1st There are some information you have asked me about.
2nd There is some information you have asked me about.
3rd There are some information you asked me about.
4th There are some information you have asked me.
5th There are some information you have asked me for.
N-best List
26. 26
The correction system may not know how to fix an error and therefore leaves
it uncorrected.
How can we use the detection model to fix this problem and assign a
better score to each “translation”?
+ + + + + + - -
The theatre restaurant was closed for unknown reason
Scoring Candidates
27. 27
How can we use the detection model to fix this problem and assign a
better score to each “translation”?
1.0 1.0 1.0 0.9 1.0 1.0 0.3 0.1
The theatre restaurant was closed for unknown reason
Scoring Candidates
1. Sentence correctness score: calculated based on the probability of
each of its tokens being correct.
2. Correction recall score: prefer the translation that modifies the largest
number of words marked as incorrect by the detection model.
3. Correction agreement score: the ratio of corrections the detection model
agrees with to those it disagrees with.
Helen Yannakoudakis, Marek Rei, Øistein E. Andersen and Zheng Yuan (2017) Neural Sequence-Labelling Models for
Grammatical Error Correction. EMNLP 2017.
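The first two heuristics are straightforward to compute once the detector has assigned each output token a probability of being correct; a minimal sketch (token probabilities taken from the example above, and the recall score assuming the candidate is token-aligned with the source):

```python
import math

def sentence_correctness(token_probs):
    """Product of per-token probabilities of being correct."""
    return math.prod(token_probs)

def correction_recall(source, candidate, flagged):
    """Fraction of detector-flagged positions the candidate changed
    (assumes candidate and source are token-aligned)."""
    if not flagged:
        return 1.0
    changed = sum(1 for i in flagged if candidate[i] != source[i])
    return changed / len(flagged)

# Probabilities from the slide's example sentence:
# "The theatre restaurant was closed for unknown reason"
probs = [1.0, 1.0, 1.0, 0.9, 1.0, 1.0, 0.3, 0.1]
score = sentence_correctness(probs)
```

A candidate that leaves the low-probability tokens ("unknown reason") untouched keeps this low correctness score, so reranking with the detector pushes the system toward actually fixing flagged words.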
29. 29
Original sentence:
I work with children an the Computer help my Jop
bat affeted to
MT output:
I work with children and the Computer help my Jop
bat affeted to
MT+detection output:
I work with children and the computer helps my Jop
bat affeted to
Error Correction Results
30. 30
Original sentence:
It takes 25 minutes that is convenient to us
MT output:
It takes 25 minutes that is convenient for us
MT+detection output:
It takes 25 minutes , which is convenient for us
Error Correction Results
31. 31
Original sentence:
I hope that our friend Richard Brown doesn’t have
any serious willness
MT output:
I hope that our friend Richard Brown doesn’t have
any serious willness
MT+detection output:
I hope that our friend Richard Brown doesn’t have
any serious willingness
Error Correction Results
34. 34
Feature-based Essay Scoring
Extract a number of features:
● Word sequences
○ Unigrams
○ Bigrams
○ Trigrams
● Part-of-speech tags
● Grammatical
constructions
● Complexity measures
● Semantic similarity
between sentences
● Estimated error count
Helen Yannakoudakis, Ted Briscoe and Ben Medlock (2011) A New Dataset and Method for Automatically Grading ESOL
Texts. ACL 2011.
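The word-sequence features are simple n-gram counts; a minimal sketch of extracting them (the downstream ranker that maps features to a score is omitted):

```python
def ngram_features(tokens, n_max=3):
    """Count word unigrams, bigrams and trigrams as sparse features."""
    feats = {}
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            key = " ".join(tokens[i:i + n])
            feats[key] = feats.get(key, 0) + 1
    return feats

feats = ngram_features("the cat sat on the mat".split())
```

The resulting sparse dictionary would be combined with the POS, grammar, complexity and error-count features before training the scorer.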
37. 37
Score-specific Word Embeddings
Optimising word embeddings to:
1) differentiate between correct
and randomly corrupted
sequences
2) predict the score of the
essay where the current
word sequence came from
Then use these embeddings in a
neural network for essay scoring.
Dimitrios Alikaniotis, Helen Yannakoudakis and Marek Rei (2016) Automatic Text Scoring Using Neural Networks.
ACL 2016.
38. 38
Score-specific Word Embeddings
Pre-training Spearman (ρ) % RMSE
None 68 7.31
word2vec 79 3.2
SSWE 91 2.4
Evaluating score-specific word embeddings on the ASAP dataset: 13K marked
essays (150-550 words each).
Using a two-layer bi-directional LSTM for essay scoring.
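Both evaluation metrics can be computed with stdlib Python alone; Spearman's ρ is the Pearson correlation of the rank vectors (this simple version assumes no tied scores):

```python
import math

def rmse(pred, gold):
    """Root mean squared error between predicted and gold scores."""
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gold)) / len(gold))

def spearman(pred, gold):
    """Spearman correlation: Pearson correlation of the ranks (no ties)."""
    def ranks(xs):
        order = sorted(range(len(xs)), key=lambda i: xs[i])
        r = [0.0] * len(xs)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r
    rp, rg = ranks(pred), ranks(gold)
    n = len(rp)
    mp, mg = sum(rp) / n, sum(rg) / n
    cov = sum((a - mp) * (b - mg) for a, b in zip(rp, rg))
    vp = math.sqrt(sum((a - mp) ** 2 for a in rp))
    vg = math.sqrt(sum((b - mg) ** 2 for b in rg))
    return cov / (vp * vg)
```

Spearman rewards getting the essay *ordering* right regardless of scale, while RMSE penalises absolute deviation from the marker's score; reporting both gives complementary views of scorer quality.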
39. 39
Error-specific Word Embeddings
Taking advantage of the available
error annotation in the training
data.
Optimising embeddings to detect
real errors, as opposed to
randomly corrupted sequences.
Network predicts the quality of
each word sequence, based on
the number of errors it contains.
Youmna Farag, Marek Rei and Ted Briscoe (2017) An Error-Oriented Approach to Word Embedding Pre-Training.
BEA 2017.
40. 40
Pre-training Spearman (ρ) % RMSE
word2vec 56.7 4.9
GloVe 51.8 5.2
SSWE 58.3 4.9
ESWE 63.7 4.5
Error-specific Word Embeddings
Evaluating error-specific word embeddings on the FCE dataset.
Using the convolutional network for essay scoring.
43. 43
Future Directions
Specialised systems
Supervised models
targeting specific error
types
Multi-task learning
Taking better advantage
of other tasks and
datasets
Multi-modal topics
Students writing about
images or videos
44. 44
Summary
Error detection
Neural sequence labelling architecture
Artificial data generation
01
Error correction
Neural machine translation
Reranking with detection
02
Essay scoring
Feature-based model
Neural essay scoring
Score-specific word embeddings
03