Offline Character Recognition Using Monte Carlo Method and Neural Network (ijaia)
Human-machine interfaces are constantly improving thanks to the ongoing development of computer tools. Handwritten character recognition has various significant applications, such as form scanning, verification, validation, and check reading. Because of the importance of these applications, off-line handwritten character recognition is an area of active research. The challenge in recognizing handwriting lies in human nature: every writer has a unique style in terms of font, contours, etc. This paper presents a novel approach to identifying offline characters, which we call the character divider approach; it can be applied after the pre-processing stage. We also devise an innovative approach for feature extraction known as vector contour, and we discuss the pros and cons, including the limitations, of our approach.
Explores the type of structure learned by Convolutional Neural Networks, the applications where they are most valuable, and a number of appropriate mental models for understanding deep learning.
PR-169: EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks (Jinwon Lee)
This is a review of the 169th paper in the TensorFlow-KR paper-reading group, PR12.
The paper reviewed this time is EfficientNet, published by Google. Research on efficient neural networks has usually focused on small networks for edge devices with limited computing power, such as mobile phones; in practice, however, networks are generally made larger and larger to improve accuracy. This paper studies how to scale a network up in a more efficient way. Please see the video for details.
Paper link: https://arxiv.org/abs/1905.11946
Video link: https://youtu.be/Vhz0quyvR7I
In machine learning, a convolutional neural network is a class of deep, feed-forward artificial neural networks that has been applied successfully to analyzing visual imagery.
Modern Convolutional Neural Network techniques for image segmentation (Gioele Ciaparrone)
Recently, Convolutional Neural Networks have been successfully applied to image segmentation tasks. Here we present some of the most recent techniques that have increased accuracy on such tasks. First we describe the Inception architecture and its evolution, which allowed the width and depth of the network to be increased without increasing the computational burden. We then show how to adapt classification networks into fully convolutional networks able to perform pixel-wise classification for segmentation tasks. We finally introduce the hypercolumn technique to further improve the state of the art on various fine-grained localization tasks.
AI&BigData Lab 2016. Александр Баев: Transfer learning - why, how, and where (GeeksLab Odessa)
4.6.16 AI&BigData Lab
Upcoming events: goo.gl/I2gJ4H
We will talk about one of the basic practical techniques for training neural networks: pre-training, fine-tuning, and transfer learning. When to apply them, which models to use, where to get them, and how to adapt them.
HardNet: Convolutional Network for Local Image Description (Dmytro Mishkin)
We introduce a novel loss for learning local feature descriptors that is inspired by Lowe's matching criterion for SIFT. We show that the proposed loss, which maximizes the distance between the closest positive and closest negative patch in the batch, is better than complex regularization methods; it works well for both shallow and deep convolutional network architectures. Applying the novel loss to the L2Net CNN architecture results in a compact descriptor: it has the same dimensionality as SIFT (128) and shows state-of-the-art performance in wide-baseline stereo, patch verification, and instance retrieval benchmarks. It is fast; computing a descriptor takes about 1 millisecond on a low-end GPU.
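The hardest-in-batch criterion described above can be made concrete with a short sketch. Below is a minimal PyTorch version of a triplet-style margin loss with in-batch hard negative mining, written from the description here rather than taken from the authors' code; the margin value and the exact symmetric mining details of HardNet are simplified.

```python
import torch
import torch.nn.functional as F

def hardest_in_batch_loss(anchors, positives, margin=1.0):
    """Margin loss with in-batch hard negative mining (a sketch in the
    spirit of the HardNet criterion, not the reference implementation).

    anchors, positives: (N, D) L2-normalized descriptors, where row i
    of `positives` is the matching patch for row i of `anchors`.
    """
    # Pairwise L2 distances between all anchors and all positives.
    dists = torch.cdist(anchors, positives)                  # (N, N)
    pos = dists.diag()                                       # matching pairs
    # Mask out the matching pair, then take the closest non-matching
    # patch in the batch as the hard negative for each anchor.
    eye = torch.eye(len(dists), dtype=torch.bool, device=dists.device)
    neg = dists.masked_fill(eye, float("inf")).min(dim=1).values
    # Push the closest negative at least `margin` beyond the positive.
    return F.relu(margin + pos - neg).mean()

# Toy usage with random unit-norm descriptors.
a = F.normalize(torch.randn(16, 128), dim=1)
p = F.normalize(a + 0.1 * torch.randn(16, 128), dim=1)
print(hardest_in_batch_loss(a, p))
```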
A convolutional neural network (CNN / ConvNet) is a machine learning algorithm used in computer vision for tasks such as image classification, image detection, digit recognition, and many more. https://technoelearn.com
This presentation is Part 2 of my September Lisp NYC presentation on Reinforcement Learning and Artificial Neural Nets. We will continue from where we left off by covering Convolutional Neural Nets (CNN) and Recurrent Neural Nets (RNN) in depth.
Time permitting I also plan on having a few slides on each of the following topics:
1. Generative Adversarial Networks (GANs)
2. Differentiable Neural Computers (DNCs)
3. Deep Reinforcement Learning (DRL)
Some code examples will be provided in Clojure.
After a very brief recap of Part 1 (ANN & RL), we will jump right into CNNs and their appropriateness for image recognition. We will start by covering the convolution operator. We will then explain feature maps and pooling operations, and then the LeNet-5 architecture. The MNIST data will be used to illustrate a fully functioning CNN.
Next we cover Recurrent Neural Nets in depth and describe how they have been used in Natural Language Processing. We will explain why gated networks and LSTM are used in practice.
Please note that some exposure or familiarity with Gradient Descent and Backpropagation will be assumed. These are covered in the first part of the talk for which both video and slides are available online.
A lot of material will be drawn from the new Deep Learning book by Goodfellow & Bengio, as well as Michael Nielsen's online book on Neural Networks and Deep Learning, as well as several other online resources.
Bio
Pierre de Lacaze has over 20 years industry experience with AI and Lisp based technologies. He holds a Bachelor of Science in Applied Mathematics and a Master’s Degree in Computer Science.
https://www.linkedin.com/in/pierre-de-lacaze-b11026b/
Deep learning lecture - part 1 (basics, CNN) (SungminYou)
This presentation is a lecture based on the Deep Learning book (Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. MIT Press, 2016). It covers the basics of deep learning and the theory of convolutional neural networks.
This paper presents machine translation based on machine learning, which learns from a semantically correct corpus. A machine learning process based on a Quantum Neural Network (QNN) is used to recognize corpus patterns in a realistic way. The system translates on the basis of knowledge gained during learning by entering pairs of sentences from the source and target languages; with the help of this training data it translates the given text. The paper presents a study of a machine translation system that converts a source language to a target language using a quantum neural network. Rather than comparing words semantically, the QNN compares numerical tags, which is faster and more accurate. The tagger tags the parts of sentences discretely, which helps to map bilingual sentences.
Unsupervised Neural Machine Translation for Low-Resource Domains (taeseon ryu)
Unsupervised machine translation has shown strong performance even without balanced bilingual data, but it still struggles in data-scarce domains. To address this, the paper proposes a new meta-learning algorithm that trains an unsupervised neural machine translation (UNMT) model to adapt to other domains using only a small amount of training data. Assuming that domain-general knowledge is important for handling low-resource domains, the authors extend a meta-learning algorithm that exploits knowledge from high-resource domains to improve low-resource UNMT. The model outperforms transfer-learning-based approaches by up to 2-4 BLEU points. Extensive experiments show that the proposed algorithm is well suited for fast adaptation and consistently outperforms other baseline models.
Our project is about guessing the correct missing word in a given sentence. To guess the missing word we have two main methods: one is statistical language modeling, and the other is neural language models. Statistical language modeling depends on the frequency of the relations between words, and here we use a Markov chain. Neural language models use artificial neural networks and deep learning; here we use BERT, the state of the art in language modeling, provided by Google.
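As a concrete illustration of the neural route described above, here is a minimal sketch that fills a masked word with a pretrained BERT via the Hugging Face fill-mask pipeline. The model name and example sentence are stand-ins; the project's own code may differ.

```python
# A minimal sketch: guess a missing word with a pretrained BERT.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")
sentence = "The cat sat on the [MASK] and purred."

# Print the top 3 candidate words with their probabilities.
for candidate in fill(sentence, top_k=3):
    print(candidate["token_str"], round(candidate["score"], 3))
```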
Roman Kyslyi: Large Language Models: Overview, Challenges, and Solutions (Lviv Startup Club)
AI & BigData Online Day 2023 Spring
Website - www.aiconf.com.ua
Youtube - https://www.youtube.com/startuplviv
FB - https://www.facebook.com/aiconf
ODSC East: Effective Transfer Learning for NLP (indico data)
Presented by indico co-founder Madison May at ODSC East.
Abstract: Transfer learning, the practice of applying knowledge gained on one machine learning task to aid the solution of a second task, has seen historic success in the field of computer vision. The output representations of generic image classification models trained on ImageNet have been leveraged to build models that detect the presence of custom objects in natural images. Image classification tasks that would typically require hundreds of thousands of images can be tackled with mere dozens of training examples per class thanks to the use of these pretrained representations. The field of natural language processing, however, has seen more limited gains from transfer learning, with most approaches limited to the use of pretrained word representations. In this talk, we explore parameter- and data-efficient mechanisms for transfer learning on text, and show practical improvements on real-world tasks. In addition, we demo the use of Enso, a newly open-sourced library designed to simplify benchmarking of transfer learning methods on a variety of target tasks. Enso provides tools for the fair comparison of varied feature representations and target task models as the amount of training data made available to the target model is incrementally increased.
A short presentation on "Framework for Deep Learning-Based Language Models Using Multi-Task Learning in Natural Language Understanding: A Systematic Literature Review and Future Directions".
Deep Learning Architectures for NLP (Hungarian NLP Meetup 2016-09-07) (Márton Miháltz)
A brief survey of deep learning/neural network methods currently used in NLP: recurrent networks (LSTM, GRU), recursive networks, convolutional networks, hybrid architectures, and attention models. We will look at specific papers in the literature targeting sentiment analysis, text classification, and other tasks.
Challenging Reading Comprehension on Daily Conversation: Passage Completion o... (Jinho Choi)
This paper presents a new corpus and a robust deep learning architecture for a task in reading comprehension, passage completion, on multiparty dialog. Given a dialog in text and a passage containing factual descriptions about the dialog where mentions of the characters are replaced by blanks, the task is to fill the blanks with the most appropriate character names that reflect the contexts in the dialog. Since there is no dataset that challenges the task of passage completion in this genre, we create a corpus by selecting transcripts from a TV show that comprise 1,682 dialogs, generating passages for each dialog through crowdsourcing, and annotating mentions of characters in both the dialog and the passages. Given this dataset, we build a deep neural model that integrates rich feature extraction from convolutional neural networks into sequence modeling in recurrent neural networks, optimized by utterance and dialog level attentions. Our model outperforms the previous state-of-the-art model on this task in a different genre using bidirectional LSTM, showing a 13.0+% improvement for longer dialogs. Our analysis shows the effectiveness of the attention mechanisms and suggests a direction to machine comprehension on multiparty dialog.
This PDF is about schizophrenia.
For more details, visit the YouTube channel @SELF-EXPLANATORY:
https://www.youtube.com/channel/UCAiarMZDNhe1A3Rnpr_WkzA/videos
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ... (Sérgio Sacani)
We characterize the earliest galaxy population in the JADES Origins Field (JOF), the deepest imaging field observed with JWST. We make use of the ancillary Hubble optical images (5 filters spanning 0.4-0.9 µm) and novel JWST images with 14 filters spanning 0.8-5 µm, including 7 medium-band filters, and reaching total exposure times of up to 46 hours per filter. We combine all our data at > 2.3 µm to construct an ultradeep image, reaching as deep as ≈ 31.4 AB mag in the stack and 30.3-31.0 AB mag (5σ, r = 0.1″ circular aperture) in individual filters. We measure photometric redshifts and use robust selection criteria to identify a sample of eight galaxy candidates at redshifts z = 11.5-15. These objects show compact half-light radii of R1/2 ∼ 50-200 pc, stellar masses of M⋆ ∼ 10^7-10^8 M⊙, and star-formation rates of SFR ∼ 0.1-1 M⊙ yr^-1. Our search finds no candidates at 15 < z < 20, placing upper limits at these redshifts. We develop a forward-modeling approach to infer the properties of the evolving luminosity function, without binning in redshift or luminosity, that marginalizes over the photometric redshift uncertainty of our candidate galaxies and incorporates the impact of non-detections. We find a z = 12 luminosity function in good agreement with prior results, and that the luminosity function normalization and UV luminosity density decline by a factor of ∼ 2.5 from z = 12 to z = 14. We discuss the possible implications of our results in the context of theoretical models for the evolution of the dark matter halo mass function.
(May 29th, 2024) Advancements in Intravital Microscopy - Insights for Preclini... (Scintica Instrumentation)
Intravital microscopy (IVM) is a powerful tool used to study cellular behavior over time and space in vivo. Much of our understanding of cell biology has been gained using various in vitro and ex vivo methods; however, these studies do not necessarily reflect the natural dynamics of biological processes. Unlike traditional cell culture or fixed-tissue imaging, IVM allows ultra-fast, high-resolution imaging of cellular processes over time and space in their natural environment. Real-time visualization of biological processes in the context of an intact organism helps maintain physiological relevance and provides insights into the progression of disease, responses to treatment, and developmental processes.
In this webinar we give an overview of advanced applications of the IVM system in preclinical research. IVIM Technology is a provider of all-in-one intravital microscopy systems and solutions optimized for in vivo imaging of live animal models at sub-micron resolution. The system's unique features and user-friendly software enable researchers to probe fast, dynamic biological processes such as immune cell tracking, cell-cell interaction, vascularization, and tumor metastasis in exceptional detail. This webinar also gives an overview of IVM in drug development, offering a view into the intricate interactions between drugs or nanoparticles and tissues in vivo, and allowing the evaluation of therapeutic interventions in a variety of tissues and organs. This interdisciplinary collaboration continues to drive the advancement of novel therapeutic strategies.
Brief information about the SCOP protein database used in bioinformatics.
The Structural Classification of Proteins (SCOP) database is a comprehensive and authoritative resource for the structural and evolutionary relationships of proteins. It provides a detailed and curated classification of protein structures, grouping them into families, superfamilies, and folds based on their structural and sequence similarities.
Richard's adventures in two entangled wonderlands (Richard Gill)
Since the loophole-free Bell experiments of 2020 and the Nobel prizes in physics of 2022, critics of Bell's work have retreated to the fortress of super-determinism. Now, super-determinism is a derogatory word - it just means "determinism". Palmer, Hance and Hossenfelder argue that quantum mechanics and determinism are not incompatible, using a sophisticated mathematical construction based on a subtle thinning of allowed states and measurements in quantum mechanics, such that what is left appears to make Bell's argument fail, without altering the empirical predictions of quantum mechanics. I think however that it is a smoke screen, and the slogan "lost in math" comes to my mind. I will discuss some other recent disproofs of Bell's theorem using the language of causality based on causal graphs. Causal thinking is also central to law and justice. I will mention surprising connections to my work on serial killer nurse cases, in particular the Dutch case of Lucia de Berk and the current UK case of Lucy Letby.
Observation of Io's Resurfacing via Plume Deposition Using Ground-based Adapt... (Sérgio Sacani)
Since volcanic activity was first discovered on Io from Voyager images in 1979, changes on Io's surface have been monitored from both spacecraft and ground-based telescopes. Here, we present the highest spatial resolution images of Io ever obtained from a ground-based telescope. These images, acquired by the SHARK-VIS instrument on the Large Binocular Telescope, show evidence of a major resurfacing event on Io's trailing hemisphere. When compared to the most recent spacecraft images, the SHARK-VIS images show that a plume deposit from a powerful eruption at Pillan Patera has covered part of the long-lived Pele plume deposit. Although this type of resurfacing event may be common on Io, few have been detected due to the rarity of spacecraft visits and the previously low spatial resolution available from Earth-based telescopes. The SHARK-VIS instrument ushers in a new era of high-resolution imaging of Io's surface using adaptive optics at visible wavelengths.
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep... (University of Maribor)
Slides from:
11th International Conference on Electrical, Electronics and Computer Engineering (IcETRAN), Niš, 3-6 June 2024
Track: Artificial Intelligence
https://www.etran.rs/2024/en/home-english/
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a... (Ana Luísa Pinho)
Functional Magnetic Resonance Imaging (fMRI) provides means to characterize brain activations in response to behavior. However, cognitive neuroscience has been limited to group-level effects referring to the performance of specific tasks. To obtain the functional profile of elementary cognitive mechanisms, the combination of brain responses to many tasks is required. Yet, to date, both structural atlases and parcellation-based activations do not fully account for cognitive function and still present several limitations. Further, they do not adapt overall to individual characteristics. In this talk, I will give an account of deep-behavioral phenotyping strategies, namely data-driven methods in large task-fMRI datasets, to optimize functional brain-data collection and improve inference of effects-of-interest related to mental processes. Key to this approach is the employment of fast multi-functional paradigms rich on features that can be well parametrized and, consequently, facilitate the creation of psycho-physiological constructs to be modelled with imaging data. Particular emphasis will be given to music stimuli when studying high-order cognitive mechanisms, due to their ecological nature and quality to enable complex behavior compounded by discrete entities. I will also discuss how deep-behavioral phenotyping and individualized models applied to neuroimaging data can better account for the subject-specific organization of domain-general cognitive systems in the human brain. Finally, the accumulation of functional brain signatures brings the possibility to clarify relationships among tasks and create a univocal link between brain systems and mental functions through: (1) the development of ontologies proposing an organization of cognitive processes; and (2) brain-network taxonomies describing functional specialization. To this end, tools to improve commensurability in cognitive science are necessary, such as public repositories, ontology-based platforms and automated meta-analysis tools. I will thus discuss some brain-atlasing resources currently under development, and their applicability in cognitive as well as clinical neuroscience.
Slide 1: Title Slide
Extrachromosomal Inheritance
Slide 2: Introduction to Extrachromosomal Inheritance
Definition: Extrachromosomal inheritance refers to the transmission of genetic material that is not found within the nucleus.
Key Components: Involves genes located in mitochondria, chloroplasts, and plasmids.
Slide 3: Mitochondrial Inheritance
Mitochondria: Organelles responsible for energy production.
Mitochondrial DNA (mtDNA): Circular DNA molecule found in mitochondria.
Inheritance Pattern: Maternally inherited, meaning it is passed from mothers to all their offspring.
Diseases: Examples include Leber’s hereditary optic neuropathy (LHON) and mitochondrial myopathy.
Slide 4: Chloroplast Inheritance
Chloroplasts: Organelles responsible for photosynthesis in plants.
Chloroplast DNA (cpDNA): Circular DNA molecule found in chloroplasts.
Inheritance Pattern: Often maternally inherited in most plants, but can vary in some species.
Examples: Variegation in plants, where leaf color patterns are determined by chloroplast DNA.
Slide 5: Plasmid Inheritance
Plasmids: Small, circular DNA molecules found in bacteria and some eukaryotes.
Features: Can carry antibiotic resistance genes and can be transferred between cells through processes like conjugation.
Significance: Important in biotechnology for gene cloning and genetic engineering.
Slide 6: Mechanisms of Extrachromosomal Inheritance
Non-Mendelian Patterns: Do not follow Mendel’s laws of inheritance.
Cytoplasmic Segregation: During cell division, organelles like mitochondria and chloroplasts are randomly distributed to daughter cells.
Heteroplasmy: Presence of more than one type of organellar genome within a cell, leading to variation in expression.
Slide 7: Examples of Extrachromosomal Inheritance
Four O’clock Plant (Mirabilis jalapa): Shows variegated leaves due to different cpDNA in leaf cells.
Petite Mutants in Yeast: Result from mutations in mitochondrial DNA affecting respiration.
Slide 8: Importance of Extrachromosomal Inheritance
Evolution: Provides insight into the evolution of eukaryotic cells.
Medicine: Understanding mitochondrial inheritance helps in diagnosing and treating mitochondrial diseases.
Agriculture: Chloroplast inheritance can be used in plant breeding and genetic modification.
Slide 9: Recent Research and Advances
Gene Editing: Techniques like CRISPR-Cas9 are being used to edit mitochondrial and chloroplast DNA.
Therapies: Development of mitochondrial replacement therapy (MRT) for preventing mitochondrial diseases.
Slide 10: Conclusion
Summary: Extrachromosomal inheritance involves the transmission of genetic material outside the nucleus and plays a crucial role in genetics, medicine, and biotechnology.
Future Directions: Continued research and technological advancements hold promise for new treatments and applications.
Slide 11: Questions and Discussion
Invite Audience: Open the floor for any questions or further discussion on the topic.
1. Transfer Learning in NLP: A Survey
Nupur Yadav
San Jose State University
Emerging Technologies in ML
2. Introduction
• The limitations of deep learning models, such as the need for large amounts of training data and huge computing resources, drive research into knowledge-transfer possibilities.
• Many large DL models are emerging that increase the need for transfer learning.
3. Models used for NLP
1. Recurrent-Based Models
2. Attention-Based Models
3. CNN-Based Models
4. Recurrent-Based Models
• In RNNs we pass the previous model state along with each input so the network learns the sequential context. This works well for many tasks, such as speech recognition, translation, text generation, time-series classification, and biological modeling.
• RNNs suffer from the vanishing-gradient problem because of backpropagation through their sequential structure.
• To overcome this issue, many ideas emerged, such as using the Rectified Linear Unit (ReLU) as the activation function, then the Long Short-Term Memory (LSTM) architecture, bidirectional LSTMs, and Gated Recurrent Units (GRUs).
• GRUs are the fastest variants of LSTMs and can beat LSTMs on some tasks, such as automatically capturing the grammatical properties of input sentences (a minimal sketch follows this list).
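A minimal PyTorch sketch of the recurrent pattern just described: the hidden state is carried from step to step, here with a bidirectional GRU feeding a classification head. All sizes and the random input are illustrative.

```python
import torch
import torch.nn as nn

class GRUClassifier(nn.Module):
    def __init__(self, vocab_size=10_000, embed_dim=128,
                 hidden_dim=256, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # The GRU carries its hidden state across time steps.
        self.gru = nn.GRU(embed_dim, hidden_dim,
                          batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, token_ids):                 # (batch, seq_len)
        x = self.embed(token_ids)
        _, h = self.gru(x)                        # h: (2, batch, hidden_dim)
        h = torch.cat([h[0], h[1]], dim=-1)       # final fwd + bwd states
        return self.head(h)

logits = GRUClassifier()(torch.randint(0, 10_000, (4, 20)))
print(logits.shape)  # torch.Size([4, 2])
```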
5. Attention-Based Models
• RNNs aggregate the sequence activations into one vector, which causes the learning process to forget words that were fed in far in the past.
• Attention-based models, on the other hand, attend to each input word differently, based on a similarity score.
• Attention can be applied between different sequences, or within the same sequence, which is called self-attention (a minimal sketch follows).
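A minimal sketch of scaled dot-product self-attention as described above: every token is compared with every other token in the same sequence, and the similarity scores weight a sum of value vectors. The random projection matrices stand in for learned parameters.

```python
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    """x: (seq_len, d_model); w_q/w_k/w_v: (d_model, d_k) projections."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / k.shape[-1] ** 0.5    # pairwise similarity scores
    weights = F.softmax(scores, dim=-1)      # attention distribution per token
    return weights @ v                       # weighted sum of values

d_model, d_k, seq_len = 64, 32, 10
x = torch.randn(seq_len, d_model)
out = self_attention(x, *(torch.randn(d_model, d_k) for _ in range(3)))
print(out.shape)  # torch.Size([10, 32])
```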
6. CNN-Based Models
• CNNs use convolutional and max-pooling layers for sub-sampling. Convolutional layers extract features, and pooling layers reduce the spatial size of the extracted features.
• In NLP, CNNs have been used successfully for sentence classification tasks such as movie reviews and question classification.
• They are also used in language modeling, where gated convolutional layers preserve larger contexts and can be parallelized, in contrast to traditional recurrent neural networks (a sketch of a sentence-classification CNN follows).
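A minimal sketch of the CNN-for-text pattern the slide describes, in the spirit of the classic sentence-classification CNN: convolutions over word embeddings act as n-gram feature extractors, and max-pooling over time collapses each feature map to a fixed-size vector. Sizes and vocabulary are illustrative.

```python
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    def __init__(self, vocab_size=10_000, embed_dim=128,
                 num_filters=100, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # One convolution per n-gram width (3-, 4-, 5-grams).
        self.convs = nn.ModuleList(
            nn.Conv1d(embed_dim, num_filters, kernel_size=k) for k in (3, 4, 5)
        )
        self.head = nn.Linear(3 * num_filters, num_classes)

    def forward(self, token_ids):                      # (batch, seq_len)
        x = self.embed(token_ids).transpose(1, 2)      # (batch, embed, seq)
        # Max-pool over time: each feature map becomes one value.
        feats = [conv(x).relu().max(dim=2).values for conv in self.convs]
        return self.head(torch.cat(feats, dim=1))

print(TextCNN()(torch.randint(0, 10_000, (4, 20))).shape)  # torch.Size([4, 2])
```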
7. Language Models
1. Unidirectional LM: considers only tokens to the left of the current position, or only tokens to the right.
2. Bidirectional LM: each token can attend to any other token in the current context.
3. Masked LM: used in bidirectional LMs; we randomly mask some tokens in the current context and then predict the masked tokens (a sketch of the masking step follows this list).
4. Sequence-to-sequence LM: involves splitting the input into two separate parts. In the first part, every token can see the context of any other token in that part, but in the second part, every token can only attend to tokens to its left.
5. Permutation LM: combines the benefits of both auto-regressive and auto-encoding approaches.
6. Encoder-decoder-based LM: in comparison to approaches that use a single stack of encoder or decoder blocks, this approach uses both.
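The masked-LM objective in item 3 can be sketched in a few lines: randomly mask a fraction of tokens and keep labels only at the masked positions. The mask id and masking rate below follow common BERT conventions and are assumptions, not details from the survey.

```python
import torch

def mask_tokens(token_ids, mask_id=103, rate=0.15):
    """Returns (masked inputs, labels); labels are -100 where no loss
    should be computed (the standard PyTorch ignore index)."""
    mask = torch.rand(token_ids.shape) < rate
    labels = token_ids.masked_fill(~mask, -100)  # loss only at masked slots
    inputs = token_ids.masked_fill(mask, mask_id)
    return inputs, labels

ids = torch.randint(1000, 2000, (2, 12))
inputs, labels = mask_tokens(ids)
print(inputs, labels, sep="\n")
```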
8. Transfer Learning
• Given a source domain-task tuple (Ds, Ts) and a different target domain-task tuple (Dt, Tt), transfer learning can be defined as the process of using the source domain and task in the learning process of the target domain task.
• Mathematically, the objective of transfer learning is to learn the target conditional probability distribution P(Yt | Xt) in Dt with the information gained from Ds, where Ds ≠ Dt or Ts ≠ Tt.
9. Types of Transfer Learning
Transductive Transfer Learning: transfer learning is transductive when, for the same task, the target domain has no labeled data or only very few labeled samples.
It can further be divided into the following sub-categories:
• Domain Adaptation
• Cross-lingual transfer learning
10. Domain Adaptation
• The process of adapting to a new domain. It usually happens when we want to learn a different data distribution in the target domain.
• Useful if the new task to train on has a different distribution or the amount of labeled data is scarce.
• One recent work uses adversarial domain adaptation for the detection of duplicate questions. The approach has three main components: an encoder, a similarity function, and a domain adaptation module.
• The encoder encodes the question and is optimized to fool the domain classifier into believing the question came from the target domain.
• The similarity function calculates the probability that a pair of questions is similar or duplicate.
• The domain adaptation component is used to decrease the difference between the target and source domain distributions.
• This approach proved better, achieving an average improvement of around 5.6% over the best benchmark across different pairs of domains (a gradient-reversal sketch of the adversarial component follows).
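A minimal PyTorch sketch of the adversarial idea above, written in the generic DANN gradient-reversal style (an assumption; the cited paper's exact architecture may differ): the domain classifier is trained normally, while the reversed gradient pushes the encoder to make source and target features indistinguishable.

```python
import torch

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -grad  # flip the gradient sign for everything upstream

encoder = torch.nn.Linear(64, 32)      # stand-in question encoder
domain_clf = torch.nn.Linear(32, 2)    # predicts source vs. target

features = encoder(torch.randn(8, 64))
domain_logits = domain_clf(GradReverse.apply(features))
loss = torch.nn.functional.cross_entropy(
    domain_logits, torch.randint(0, 2, (8,))   # toy domain labels
)
# The classifier learns to separate domains; the encoder, receiving
# reversed gradients, learns to confuse it.
loss.backward()
```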
11. Cross-lingual transfer learning
• This involves adapting to a different language in the target domain.
• Useful when we want to use a high-resource language to learn corresponding tasks in a low-resource language.
• One recent work uses a new dataset to evaluate three different cross-lingual transfer methods on the tasks of user intent classification and time-slot detection.
• The dataset contains 57k annotated utterances in English, Thai, and Spanish, categorized into three domains: reminder, weather, and alarm.
• The three cross-lingual transfer methods are translating the training data, using cross-lingual pre-trained embeddings, and a novel method of using multilingual machine translation encoders as contextual word representations.
• The latter two methods outperformed the translation method on a target language with only several hundred training examples, i.e., a low-resource target language (a sketch of the pretrained-embeddings route follows).
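A minimal sketch of the cross-lingual pretrained-embeddings route described above: fine-tune an intent classifier on English with a multilingual encoder, then apply it directly to a low-resource language. The model name, label count, and Spanish example are illustrative assumptions, not the paper's setup.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "bert-base-multilingual-cased"   # shared embeddings across languages
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=3)

# ... fine-tune on English utterances labeled {reminder, weather, alarm} ...

# Zero-shot application to another language via the shared encoder.
batch = tok(["¿Lloverá mañana en Madrid?"], return_tensors="pt")
with torch.no_grad():
    print(model(**batch).logits.argmax(-1))  # predicted intent id
```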
12. Types of Transfer Learning
Inductive Transfer Learning: transfer learning is inductive when, for different tasks in the source and target domains, we have labeled data in the target domain only.
It can further be divided into the following sub-categories:
• Sequential Transfer Learning
• Multi-task Transfer Learning
13. Sequential Transfer Learning
It involves learning multiple tasks in a sequential fashion. It is further divided into five sub-categories:
1. Sequential Fine-Tuning:
• Fine-tuning involves training the pre-trained model on the target task (a minimal sketch follows this list).
• One recent work is the unified pre-trained language model UNILM.
• It combines three different training objectives to pre-train a model in a unified way: unidirectional, bidirectional, and sequence-to-sequence.
• The UNILM model achieved state-of-the-art results on different tasks, including generative question answering, abstractive summarization, and document-grounded dialog response generation.
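A minimal sketch of sequential fine-tuning as described above: every parameter of a pretrained encoder is updated on target-task data. The model name, toy batch, and hyperparameters are assumptions for illustration.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)
optim = torch.optim.AdamW(model.parameters(), lr=2e-5)

# One toy training step on target-task data.
batch = tok(["great movie", "dull plot"], return_tensors="pt", padding=True)
labels = torch.tensor([1, 0])
loss = model(**batch, labels=labels).loss  # task loss on target data
loss.backward()
optim.step()                               # every parameter is updated
```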
14. Sequential Transfer Learning
2. Adapter Modules:
• Adapters are a compact and extensible transfer learning method for NLP.
• They provide parameter efficiency by adding only a few trainable parameters per task, and as new tasks are added, previous ones don't require revisiting.
• In the latest work, adapter modules were used to share parameters between different tasks by fine-tuning the BERT model.
• The model was evaluated on the GLUE tasks and obtained state-of-the-art results on text entailment while achieving parameter efficiency (a minimal adapter block follows).
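A minimal PyTorch sketch of an adapter block in the bottleneck-plus-residual style described above: only the adapter's few parameters are trained per task while the pretrained model stays frozen. Dimensions are illustrative.

```python
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, d_model=768, bottleneck=64):
        super().__init__()
        # Down-project, nonlinearity, up-project: few parameters per task.
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)

    def forward(self, hidden):
        # Residual connection keeps the block near-identity at init.
        return hidden + self.up(self.down(hidden).relu())

# Training recipe: freeze the pretrained model, train adapters only, e.g.
#   for p in pretrained_model.parameters():
#       p.requires_grad = False
```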
15. Sequential Transfer Learning
3. Feature-Based:
• The representations of a pre-trained model are fed to another model. This gives the benefit of reusing the task-specific model for similar data.
• Also, extracting features once saves a lot of computing resources if the same data is used repeatedly.
• In one recent work, researchers used a semi-supervised approach for the task of sequence labeling.
• A pre-trained neural language model, trained in an unsupervised way, was used: a bidirectional language model whose forward and backward hidden states are concatenated together.
• The output was augmented with token representations and fed to a supervised sequence tagging model (TagLM), which was trained in a supervised way to output the tag of each token. The datasets used were CoNLL 2003 NER and CoNLL 2000 chunking.
• The model achieved state-of-the-art results on both tasks compared to other forms of transfer learning (a sketch of the pattern follows).
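A minimal sketch of the feature-based pattern, in the spirit of TagLM: a frozen "language model" supplies contextual features that are concatenated with task embeddings and fed to a separate supervised tagger. The stand-in LM here is a frozen embedding plus BiLSTM; a real setup would substitute a genuinely pretrained bidirectional LM.

```python
import torch
import torch.nn as nn

vocab, lm_dim, tok_dim, hidden, num_tags = 1000, 64, 32, 128, 9

# Stand-in "pretrained" LM: frozen, used only for feature extraction.
lm_embed = nn.Embedding(vocab, lm_dim)
lm_rnn = nn.LSTM(lm_dim, lm_dim // 2, batch_first=True, bidirectional=True)
for p in list(lm_embed.parameters()) + list(lm_rnn.parameters()):
    p.requires_grad = False

# Trainable task model: its own embeddings plus a BiLSTM tagger.
task_embed = nn.Embedding(vocab, tok_dim)
tagger = nn.LSTM(lm_dim + tok_dim, hidden, batch_first=True, bidirectional=True)
out = nn.Linear(2 * hidden, num_tags)

ids = torch.randint(0, vocab, (4, 15))
with torch.no_grad():                     # features extracted once, reusable
    lm_feats, _ = lm_rnn(lm_embed(ids))   # (4, 15, lm_dim)
x = torch.cat([task_embed(ids), lm_feats], dim=-1)
h, _ = tagger(x)
print(out(h).shape)                        # per-token tag logits: (4, 15, 9)
```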
16. Sequential Transfer Learning
4. Zero-Shot:
• The simplest approach: for a given pre-trained model, we don't apply any training procedure to optimize or learn new parameters.
• In a recent study, researchers used zero-shot transfer for text classification.
• Each classification task was modeled as a text entailment problem, where the positive class means entailment and the negative class means there is none.
• A pre-trained BERT entailment model was then used in a zero-shot scenario to classify texts in different tasks like emotion detection, topic categorization, and situation frame detection.
• This approach achieved better accuracy on two of the three tasks in comparison to unsupervised baselines like Word2Vec (a sketch with an off-the-shelf NLI model follows).
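A minimal sketch of zero-shot classification via entailment, as described above: each candidate label becomes a hypothesis that an NLI model scores against the text, with no task-specific training. The off-the-shelf pipeline and model name below are stand-ins for the study's exact setup.

```python
from transformers import pipeline

# An NLI model repurposed as a zero-shot classifier.
clf = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

result = clf(
    "The river burst its banks and hundreds were evacuated.",
    candidate_labels=["flood", "earthquake", "sports", "politics"],
)
# Labels come back sorted by entailment score; no training was needed.
print(result["labels"][0], result["scores"][0])
```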
17. Multi-task Transfer Learning
It involves learning multiple tasks at the same time: given a pre-trained model, if we want to transfer its learning to multiple tasks, all tasks are learned in parallel.
Multi-task Fine-Tuning:
• In recent work, researchers used this approach to explore the effect of a unified text-to-text transfer transformer (T5).
• The architecture is similar to the Transformer model, with an encoder-decoder network, but it uses fully-visible masking instead of causal masking, especially for inputs that require predictions based on a prefix, as in translation.
• The dataset used for training the models was created from the Common Crawl dataset and was around 750 GB; the largest model had around 11 billion parameters trained on this dataset.
• Multi-task pre-trained models performed well on different tasks, where the models were trained on different tasks by using prefixes like "translate English to German". With fine-tuning, the model achieved state-of-the-art results on different tasks like text classification, summarization, and question answering (a minimal prefix-driven sketch follows).
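A minimal sketch of the prefix-driven, text-to-text use of T5 described above: one model handles different tasks, selected purely by the task prefix in the input string. The small checkpoint and prompts are illustrative.

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tok = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# The prefix selects the task; the model itself is unchanged.
for prompt in [
    "translate English to German: The house is small.",
    "summarize: Transfer learning reuses knowledge from one task "
    "to improve learning on another related task.",
]:
    ids = tok(prompt, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=40)
    print(tok.decode(out[0], skip_special_tokens=True))
```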
18. Conclusion
• We see that attention-based models are more popular than RNN-based and CNN-based language models.
• BERT appears to be the default architecture for language modeling, due to its bidirectional architecture, which makes it successful in many downstream tasks.
• Among the transfer learning techniques used in NLP, sequential fine-tuning seems to be the most popular approach.
• Multi-task fine-tuning has gained popularity in recent years, as many studies found that training on multiple tasks at the same time yields better results.
• Also, text classification datasets seem to be used more widely than those for other NLP tasks, because fine-tuning models on such tasks is easier.
19. Future Work
• For future work, it is recommended to use bidirectional models like BERT for specific tasks like abstractive question answering, sentiment classification, and part-of-speech tagging, and models like GPT-2 and T5 for generative tasks like generative question answering, summarization, and text generation.
• Adapter modules can replace sequential fine-tuning, as they show results comparable to traditional fine-tuning and are faster and more compact due to parameter sharing.
• Also, extensive research should be done on reducing the size of these larger language models so that they can be easily deployed on embedded devices and on the web.