1. Text Classification
using ULMFiT
CS 6105 : COMPILER DESIGN
Submitted To:
Dr. Vishwambhar Pathak
Submitted By:
Yashasvini Mathur (BE/25001/16)
Divik Mittal (BE/25035/16)
2. Introduction
Natural Language Processing (NLP) needs no introduction in
today’s world. It’s one of the most important fields of study and
research, and has seen a phenomenal rise in interest in the last
decade. The basics of NLP are widely known and easy to grasp,
but things get tricky when the text data becomes huge and
unstructured. That is where deep learning becomes pivotal.
We will focus on the concept of transfer learning and how we
can leverage it in NLP to build incredibly accurate models using
the popular fastai library.
3. Overview of ULMFiT
Universal Language Model Fine-tuning (ULMFiT) achieves state-of-the-art results using novel techniques such as:
• Discriminative fine-tuning
• Slanted triangular learning rates, and
• Gradual unfreezing
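The slanted triangular learning rate schedule can be sketched in plain Python. This is a sketch of the schedule described in the ULMFiT paper; the function and variable names are ours, and the default hyperparameters follow the paper's suggested values:

```python
import math

def slanted_triangular_lr(t, T, lr_max=0.01, cut_frac=0.1, ratio=32):
    """Learning rate at iteration t out of T total iterations.

    The rate rises linearly for the first cut_frac fraction of
    iterations (up to lr_max), then decays linearly back down
    toward lr_max / ratio.
    """
    cut = math.floor(T * cut_frac)
    if t < cut:
        p = t / cut
    else:
        p = 1 - (t - cut) / (cut * (1 / cut_frac - 1))
    return lr_max * (1 + p * (ratio - 1)) / ratio
```

The short, steep warm-up lets the model quickly converge toward a suitable region of parameter space, while the long decay phase refines the parameters.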
This method involves fine-tuning a language model (LM) pre-trained on the Wikitext-103 dataset to a
new dataset in such a way that it does not forget what it previously learned.
Language modeling captures general properties of a language that transfer well to downstream NLP
tasks, and the unlabeled text it trains on is available in enormous quantities. That is why language
modeling was chosen as the source task for ULMFiT.
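Discriminative fine-tuning, the first technique listed above, gives each layer its own learning rate; the ULMFiT paper suggests dividing the rate by 2.6 for each layer below the top. A minimal sketch (the function name is ours):

```python
def discriminative_lrs(n_layers, lr_top=0.01, factor=2.6):
    """Per-layer learning rates, lowest layer first.

    The top layer trains at lr_top; each layer below trains at
    the rate of the layer above divided by `factor`, so lower
    layers (which hold the most general features) change least.
    """
    return [lr_top / factor ** (n_layers - 1 - i) for i in range(n_layers)]
```

Lower layers capture general language features that should mostly be preserved, while upper layers are more task-specific and can move faster.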
4. Problem Statement
Our objective here is to fine-tune a pre-trained model and use it for
text classification on a new dataset. We will implement ULMFiT in this
process. The interesting thing here is that this new data is quite small
in size (<1000 labeled instances). A neural network model trained from
scratch would overfit on such a small dataset.
Dataset: We will use the 20 Newsgroup dataset available
in sklearn.datasets. As the name suggests, it includes text documents
from 20 different newsgroups.
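The dataset can be fetched through scikit-learn's `fetch_20newsgroups`; its `remove=("headers", "footers")` option performs the header/footer cleaning up front (the wrapper function name is ours, and the first call downloads the data):

```python
from sklearn.datasets import fetch_20newsgroups

def load_newsgroups(subset="train"):
    """Fetch 20 Newsgroups documents with headers and footers
    stripped, returning (texts, labels, label_names)."""
    data = fetch_20newsgroups(subset=subset,
                              remove=("headers", "footers"))
    return data.data, data.target, data.target_names
```

Passing `subset="test"` returns the held-out split in the same format.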
5. Procedure
1. Cleaning the dataset – Removing header and footer
2. Preprocessing data
2.1 Retaining only alphabets
2.2 Removing stopwords (e.g., I, me, my, is, am, are)
3. Splitting the data into training and validation sets.
4. Data Preparation - Preparing our data for the language model and the classification model
separately
5. Training the Model
6. Getting predictions
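Steps 1-2 of the procedure can be sketched with plain Python. The stopword list here is a tiny illustrative stand-in (in practice one would use a fuller list, e.g. from NLTK), and the helper name is ours:

```python
import re

# Tiny illustrative stopword list; a real pipeline would use a
# fuller list such as NLTK's English stopwords.
STOPWORDS = {"i", "me", "my", "is", "am", "are", "the", "a", "an"}

def preprocess(text):
    """Keep only alphabetic tokens and drop stopwords."""
    tokens = re.findall(r"[a-zA-Z]+", text.lower())
    return " ".join(t for t in tokens if t not in STOPWORDS)
```

For example, `preprocess("I am 100% sure the model works!")` keeps only `"sure model works"`: the number is discarded by the alphabetic filter, and "I", "am", and "the" are removed as stopwords.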