This document outlines an introductory lecture on research methods in natural language processing (NLP). It covers empirical research methods in computer science, how to choose a good research topic, how to read scientific papers, how to work with an advisor, and how to do research in NLP. It gives an overview of the key steps in NLP research, such as identifying problems, developing ideas, conducting experiments, analyzing results, and iterating on the process, and it also discusses common NLP problems and applications.
Research Methods in Natural Language Processing
1. Research Methods in Natural Language Processing
Pham Quang Nhat Minh
FPT Technology Research Institute
FPT University
minhpqn2@fe.edu.vn
April 16, 2017
2. Objectives of the lecture
Introduce some research know-how and practices for doing research
Focus on the NLP/Machine Learning/Data Science fields
Share my research experiences in the field of NLP
Pham Quang Nhat Minh Research Methods in NLP 2/70
3. Table of Contents
1 What are empirical research methods for computer science?
2 How to choose a good research topic?
3 How to read a scientific paper?
4 How to work with your advisor
5 Doing research in the NLP field
What is NLP?
What is it like doing research in NLP?
How to do research in NLP?
How to choose NLP papers to read?
6 Coding practices for NLP/Machine Learning research work
7 My research stories
4. Acknowledgements
Much of the content in this lecture comes from the documents in the references
(Alon, 2009) How To Choose a Good Scientific Problem
(Wilson et al., 2012) Best Practices for Scientific Computing
Paul Cohen: Empirical Methods for AI & CS
Other documents, blogs
6. What does “empirical” mean?
Relying on observations, data, experiments
Empirical work should complement theoretical work
Theories often have holes (e.g., How big is the constant term?)
Theories are suggested by observations
Theories are tested by observations
Conversely, theories direct our empirical attention
In addition, empirical means “wanting to understand behaviour of complex systems”
In NLP, we may want to understand how features are correlated
7. Why we need empirical methods
Theory-based science need not be all theorems
We do not know how a theory works in different conditions
Different data sets, domains
8. Empirical methods in CS/AI
Data observation
Construct hypotheses
Test with empirical experiments
Refine hypotheses and modelling assumptions
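The "test with empirical experiments" step can be sketched in Python as a paired randomization (sign-flip) test, a technique often used to compare two systems evaluated on the same test items. This is only an illustrative sketch and not a method taken from the lecture: the function name `paired_permutation_test`, its parameters, and the score lists are all hypothetical.

```python
import random


def paired_permutation_test(scores_a, scores_b, num_rounds=10000, seed=0):
    """Two-sided paired randomization test on the mean score difference.

    scores_a and scores_b hold per-item scores of two systems on the
    same test items (hypothetical inputs for this sketch).
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("score lists must have the same length")
    if not scores_a:
        raise ValueError("score lists must not be empty")

    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs) / len(diffs))

    rng = random.Random(seed)  # fixed seed so the run is reproducible
    at_least_as_extreme = 0
    for _ in range(num_rounds):
        # Under the null hypothesis the two systems are interchangeable,
        # so the sign of each per-item difference is arbitrary.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(sum(flipped) / len(flipped)) >= observed:
            at_least_as_extreme += 1

    # Add-one smoothing keeps the estimated p-value strictly positive.
    return (at_least_as_extreme + 1) / (num_rounds + 1)


# Made-up per-item scores (1.0 = correct, 0.0 = wrong) for two systems.
system_a = [1.0] * 20
system_b = [0.0] * 20
print(paired_permutation_test(system_a, system_b))
```

A small p-value suggests the observed difference would be unlikely if the two systems were interchangeable; the hypothesis can then be refined and tested again on other data sets or domains.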
9. Kinds of data analysis
Exploratory (EDA) - looking for patterns in data
Statistical inferences from sample data
Testing hypotheses
Estimating parameters
Building mathematical models of datasets
Machine learning, data mining...
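As one concrete instance of "estimating parameters" and "building mathematical models of datasets", the sketch below fits a straight line to paired observations by ordinary least squares using only the Python standard library. The function name `fit_line` and the sample points are assumptions made for illustration; they do not come from the lecture.

```python
def fit_line(xs, ys):
    """Estimate slope and intercept of y = slope * x + intercept
    by ordinary least squares."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least 2 points")

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up points that lie exactly on y = 2x + 1.
slope, intercept = fit_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
print(slope, intercept)  # prints 2.0 1.0
```

On real data the points will not lie exactly on a line, so the estimated parameters should be reported together with some measure of fit or uncertainty.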
10. Tools for data analysis
R programming language
Python:
numpy
scipy
pandas
matplotlib for data visualization
My biased opinions:
statisticians like R, computer scientists often use Python
Python is much easier to learn than R
11. Exercises
Install R: https://www.r-project.org
Download the data file ex1data1.txt from:
http://tinyurl.com/m7bpp8d
The data file has two columns:
First column: the population of a city.
Second column: the profit of a food truck in that city.
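In an R terminal, try the following plot code:
# Read the comma-separated file; it has no header row
df <- read.table("./ex1data1.txt", sep=",", header=FALSE)
# Column 1 (population) goes on the x-axis, column 2 (profit) on the y-axis
plot(df[,1], df[,2], xlab="Population of City in 10,000s", ylab="Profit in $10,000s")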
12. R for data visualization
14. Why do we need to choose a good research topic?
“Garbage in, garbage out” principle
You may work with a research topic for years
1 year for a master's thesis
3 years or more for a Ph.D. dissertation
It is painful to work on things that you find uninteresting
You lack passion, motivation, and ideas
Much frustration and bitterness
15. What is a good research topic?
(Alon, 2009) Two Dimensions of Problem Choice
Feasibility: whether a problem is hard or easy
We can measure the feasibility as the expected time to
complete the project
Feasibility is a function of the skills of students/researchers
and of the technology in the lab.
Interest: the increase in knowledge expected from the project.
16. Two-dimensional space of Problem Choice (1)
Figure: The Feasibility-Interest Diagram for Choosing a Project (Alon,
2009)
17. Two-dimensional space of Problem Choice (2)
Figure: The Feasibility-Interest Diagram for Choosing a Project (Alon,
2009)
18. What is a good research topic?
Do many people care about the topic?
Research community, your supervisors, industry demands
Are you really interested in the topic?
The topic should be interesting to you rather than to others
Good signs: “ideas and questions that come back again and
again to your mind for months or years.”
19. How to choose a good research topic: step by step
Choose the broad (general) topic
E.g., Machine Translation
Draw a hierarchy of research topics, starting from the broad
topic
Review literature to look for gaps in previous work
Choose the focused topic
E.g., Phrase-based Machine Translation
Find gaps in previous work
Form research questions in the focused topic
From research questions, formulate the research problem
20. Finding a research problem
Take your time to choose a good research topic
(Alon, 2009): Rule for new Ph.D. students and postdocs: “Do
not commit to a problem before 3 months have elapsed”
For master's students, take 1-2 months to choose the research
topic before you start the research project.
Join projects in your laboratory
Many thesis research ideas come from projects you are involved in
21. Developing your research ideas
Where do research ideas come from?
Observations
Data observations, data analysis, discover patterns in data
Reading papers, attending conferences, listening to talks
Techniques, methods from other disciplines, fields
Imagination
Suggestions from your advisor
22. Reading papers, attending conferences
Choose good and relevant papers. Consider:
Impact factors of the journal.
In the NLP field, choose papers from top conferences, journals
(ACL/NAACL/EMNLP/COLING)
The Top 10 NLP Conferences:
http://www.junglelightspeed.com/the-top-10-nlp-conferences
Reputations of authors and their organizations
Do not just read: criticize the papers and look for gaps
23. Techniques, methods from other fields
Broaden your view and your problem-solving methods by regularly
reading articles in other fields.
An example is the image captioning task
We need to use techniques from both computer vision and
NLP.
24. What happens after we choose a problem? (Alon, 2009)
26. Two types of readings
Fast readings
Get and understand the basic ideas of the paper
Know the problems the paper attacks and how it solves them
Put the paper in the “big picture” of the field
Know the differences between the paper and previous
work
We do a lot of “fast reading” when we survey the literature and
choose a broad topic
Deep readings
Understand the details of presented methods
Try to understand how the proposed method works
Criticize the paper and find its limitations
If you were the authors, how would you solve the problem?
Propose alternative methods?
We do a lot of “deep reading” when we look for a focused topic
27. How to read a scientific paper (1)
Michael J. Hanson. Efficient Readings of Papers in Science and Technology: http://tinyurl.com/qdebynz
28. How to read a scientific paper (2)
Decide what to read
Read title, abstract
Read it, file it, or skip it
Read for breadth
What did they do?
Skim introduction, headings, graphics, definitions, conclusions
and bibliography.
Consider the credibility.
How useful is it?
Decide whether to go on.
29. How to read a scientific paper (3)
Read in depth
How did they do it?
Challenge their arguments.
Examine assumptions.
Examine methods.
Examine statistics.
Examine reasoning and conclusions.
How can I apply their approach to my work?
Take notes
Make notes as you read.
Highlight major points.
Note new terms and definitions.
Summarize tables and graphs.
Write a summary.
30. Homework
Choose one scientific article that you want to read in depth. Read it,
take notes, and explain the ideas and methods presented in the paper to
other students in a simple way.
Note: You should be able to answer the following 3 questions.
What problem does the paper attack?
What are the differences between the paper and other existing
papers?
What are the interesting points of the presented methods?
32. Some basic rules
Your advisor is likely to be very busy, so you should follow
up with her/him
Ask for meetings and schedule them in advance
Keep regular meetings with your advisor
Usually weekly
Do not just do what your advisor tells you to do
Rule of thumb: You should finish all your assigned tasks
before pursuing your own ideas
33. How to write a progress/status report
Michael Ernst. Writing a progress/status report:
http://tinyurl.com/zp7cdvt
Quote the previous week’s plan.
This helps you determine whether you accomplished your goals.
State this week’s progress.
What you have accomplished,
What you learned, what difficulties you overcame, what
difficulties are still blocking you,
Your new ideas for research directions or projects, etc
Give the next week’s plan.
A good format is a bulleted list
Try to make each goal measurable: there should be no
ambiguity as to whether you were able to finish it.
It’s good to include longer-term goals as well.
34. Communicate with your advisor
Prepare some slides (3-4 slides) to make the discussion
concrete
Send the materials at least 24 hours before the meeting day
Arrange the meeting in advance
Your advisor is not always right
Actually, you know more about your work than she/he does
If you have data, evidence, or proof, do not hesitate to debate
Do not say “I guess” or “I think” when you explain something.
Use data, evidence, and references instead
36. What is Natural Language Processing?
A field of computer science, artificial intelligence, and
computational linguistics
To get computers to perform useful tasks involving human
languages
Human-Machine communication
Improving human-human communication
E.g., Machine Translation
Extracting information from texts
37. Why is NLP interesting?
Language is involved in many human activities
Reading, writing, speaking, listening
Voice can be used as a user interface in many applications
Remote controls, virtual assistants like Siri, ...
NLP is used to extract insights from massive amounts of
textual data
E.g., hypotheses from medical, health reports
NLP has many applications
NLP is hard!
38. NLP problems
Fundamental problems
Word Segmentation
Part-of-speech tagging
Syntactic Analysis
Semantic Analysis
Application problems
Information Retrieval
Information Extraction
Question Answering
Text Summarization
Machine Translation
39. What is it like doing research in NLP?
Empirical methods are widely used in NLP
Relying on observations, data, experiments
Research involves many loops of experiments
Identify the problem → Create ideas → Test the best idea →
Analyse results → Identify the problem → Create ideas → · · ·
40. What is it like doing research in NLP?
Many ideas do not work
Even so, we need to analyse the results to understand
why they do not work and to come up with new ideas.
Try the next idea
Failures occur more often than successes
Try to increase the number of experiments
(No. of successes) = (No. of experiments) × (Success rate)
41. The typical working day of an NLP researcher
Data observation and data/result analysis (a lot)
Discuss ideas with colleagues
Do experiments (run the program) to test ideas
Read papers to keep up to date with mainstream research
Investigate new NLP/Machine Learning tools, libraries (less
regular)
42. How to learn NLP?
Research starts from learning
Learn/review background about:
Probability and Statistics
Basic math (linear algebra, calculus)
Machine Learning
Programming
Read NLP textbooks
Jurafsky, D., & Martin, J.H. Speech and Language Processing:
an Introduction to Natural Language Processing,
Computational Linguistics, and Speech Recognition.
Manning, C.D., & Schütze, H. Foundations of statistical
natural language processing.
43. How to learn NLP: Get your hands dirty
Practice with programming exercises:
100 NLP drill exercises:
https://github.com/minhpqn/nlp_100_drill_exercises
NLP Programming Tutorial, by Graham Neubig:
http://www.phontron.com/teaching.php
Compete in Kaggle data science challenges (kaggle.com)
44. Finding an NLP research problem
All the principles in the section “How to choose a good
research topic” apply.
Looking for ideas from related fields
Linguistics
Machine learning: the mainstream approach in the NLP field is to apply
machine learning methods to NLP problems
Computer vision
Looking at data
It is actually my daily task
45. Basic rules to choose NLP papers
READ:
Papers in top conferences and journals in NLP and other
related fields
(ACL/EMNLP/NAACL/EACL/COLING/CoNLL/...)
Workshops that focus on an NLP sub-field
Short papers at top conferences
PhD dissertations from top institutions/advisors
Papers with many citations
Textbooks from leading researchers
For more information, see The Top 10 NLP Conferences:
http://www.junglelightspeed.com/the-top-10-nlp-conferences/
47. Why is coding important in NLP/ML research?
Most NLP/ML research work consists of empirical studies
We need to do data analysis and run experiments to test our ideas
So, we have to write programs
Even theorists should program
“Implementing your own algorithm is a good way of checking
your work. If you aren’t implementing your algorithm,
arguably you’re skipping a key step in checking your results.”
—Michael Mitzenmacher
http://mybiasedcoin.blogspot.com/2008/11/bugs.html
48. Why do we care about coding practices in NLP research?
Bad coding practices cause problems
You find errors in the experimental results right before the
paper submission deadline
You cannot understand your own code after some months
You deleted intermediate results, so you cannot verify the code
You do not know how to verify experimental results
You did not test the code, and then used untested code for
experiments
You spend a long time refactoring the code
You cannot get back the version that generated the best
results
...
49. Why do we care about coding practices in NLP research?
Good coding practices speed up our research work
Recall that:
(No. of successes) = (No. of experiments) × (Success rate)
50. Best Practices for Scientific Computing
(Wilson et al., 2012)
1- Write programs for people, not computers.
Readers of the code should not have to remember too much
Easy to read: names should be consistent, distinctive, and
meaningful
Break down the coding work into one-hour-long tasks
2- Automate repetitive tasks
Scientists should rely on the computer to repeat tasks.
Use a script to run your programs!
Use a build tool to automate scientific workflows
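As one possible illustration of practice 2, the sketch below runs every combination of settings from a single entry point instead of launching each run by hand. `run_experiment` is a hypothetical stand-in for a real training and evaluation run, and the parameter names and the toy score formula are assumptions made only to keep the example runnable.

```python
import itertools
import json


def run_experiment(learning_rate, n_features):
    """Hypothetical stand-in for one training/evaluation run; returns a fake score."""
    # A real project would train and evaluate a model here.
    return round(1.0 - learning_rate - 1.0 / n_features, 4)


def run_all(grid):
    """Run every combination of settings in `grid`; return one result row per run."""
    names = sorted(grid)
    rows = []
    for values in itertools.product(*(grid[name] for name in names)):
        settings = dict(zip(names, values))
        rows.append({**settings, "score": run_experiment(**settings)})
    return rows


if __name__ == "__main__":
    grid = {"learning_rate": [0.1, 0.01], "n_features": [100, 1000]}
    for row in run_all(grid):
        # One JSON line per run is easy to grep, sort, and plot later.
        print(json.dumps(row, sort_keys=True))
```

Rerunning all experiments after a bug fix then takes one command, which is what keeps the number of experiments high.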
51. Best Practices for Scientific Computing
3- Use the computer to record history
Unique identifiers and version numbers for raw data records
Unique identifiers and version numbers for programs and
libraries
The values of parameters used to generate any given output
The names and version numbers of programs used to generate
those outputs.
4- Make incremental changes
Scientists cannot know what their programs should do next
until the current version has produced some results.
Should work in small steps with frequent feedback and
correction!
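For practice 3 (use the computer to record history), a minimal sketch of a run record is shown below. It is only one way to do it: the field names, the helper names `file_sha1` and `make_run_record`, and the example parameters are assumptions for illustration, and a content hash is used here as the "unique identifier" of an input file.

```python
import hashlib
import json
import os
import platform
import sys
import tempfile
from datetime import datetime, timezone


def file_sha1(path):
    """Content hash that serves as a unique identifier for a data or code file."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_run_record(params, input_paths):
    """Collect what is needed to trace an output back to its inputs and settings."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "python_version": platform.python_version(),
        "command": sys.argv,
        "params": params,
        "inputs": {path: file_sha1(path) for path in input_paths},
    }


if __name__ == "__main__":
    # A throwaway file stands in for a real corpus.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write("toy corpus\n")
    record = make_run_record({"n_clusters": 100, "seed": 13}, [tmp.name])
    print(json.dumps(record, indent=2))
    os.remove(tmp.name)
```

Saving such a record next to every output file makes it possible to find which settings produced the best results later.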
52. Best Practices for Scientific Computing
5- Use a version control system: Git, Mercurial, Subversion. Push
code to GitHub or Bitbucket
Everything that has been created manually should be put in
version control
6- Do not repeat yourself (or others)
At small-scale, code should be modularized rather than copied
and pasted.
At large-scale, scientific programmers should re-use code
instead of re-writing it.
53. Best Practices for Scientific Computing
7- Plan for mistakes
Write and run tests
Unit Test: Check the correctness of each single software unit
Integration Test: Check that pieces of unit code work
correctly when combined.
Regression Test: Re-run existing tests after changes
to the code to make sure that nothing has regressed.
Use an off-the-shelf unit testing library
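A unit test with an off-the-shelf library can be as small as the sketch below, which uses Python's built-in `unittest` module. The `tokenize` function is a hypothetical unit under test, chosen only because tokenization is a typical first step in NLP code.

```python
import unittest


def tokenize(text):
    """Hypothetical unit under test: lowercase a sentence and split it on whitespace."""
    return text.lower().split()


class TokenizeTest(unittest.TestCase):
    def test_splits_on_whitespace(self):
        self.assertEqual(tokenize("NLP is  hard"), ["nlp", "is", "hard"])

    def test_empty_string_gives_no_tokens(self):
        self.assertEqual(tokenize(""), [])


def run_tests():
    """Run the test case programmatically and report whether every test passed."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TokenizeTest)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()


if __name__ == "__main__":
    run_tests()
```

Running the same tests again after every change to `tokenize` turns them into regression tests.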
54. Best Practices for Scientific Computing
8- Optimize software only after it works correctly
Use a profiler to identify bottlenecks
Write code in the highest-level language possible
Python is a recommended language for research
Only use a low-level programming language when you are sure
that a performance boost is needed.
Use the highest-level programming language for rapid
prototyping.
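To find bottlenecks before optimizing, Python ships with the `cProfile` and `pstats` modules. The sketch below profiles a deliberately naive function; `slow_bigram_count` and `profile_report` are hypothetical names used only for this illustration.

```python
import cProfile
import io
import pstats


def slow_bigram_count(tokens):
    """Deliberately naive: rescans the whole token list for every bigram."""
    counts = {}
    for i in range(len(tokens) - 1):
        bigram = (tokens[i], tokens[i + 1])
        counts[bigram] = sum(
            1 for j in range(len(tokens) - 1) if (tokens[j], tokens[j + 1]) == bigram
        )
    return counts


def profile_report(func, *args):
    """Run func under cProfile; return (result, report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()


if __name__ == "__main__":
    tokens = ("the cat sat on the mat " * 50).split()
    counts, report = profile_report(slow_bigram_count, tokens)
    print(report)
```

The report shows where the time goes, so only the part that is actually slow needs to be rewritten.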
55. Best Practices for Scientific Computing
9- Document design and purpose, not mechanics
Document interface and reasons, not implementations
Do not do this:
i = i + 1 # Increment the variable 'i' by one.
Refactor the code instead of explaining how it works
Embed the documentation for a piece of software in that
software
Use software to generate documentation.
10- Collaborate
Use pre-merge code reviews
Use an issue tracking tool.
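As a contrast to the comment in practice 9, the sketch below documents the interface and the reason for a function rather than its mechanics, and keeps that documentation embedded in the code as a docstring. The function `add_one_smoothing` is a hypothetical example chosen for illustration.

```python
def add_one_smoothing(count, total, vocab_size):
    """Return the add-one (Laplace) smoothed probability of an event.

    Purpose: an unseen word would otherwise get probability zero and wipe
    out the whole product of word probabilities for a document.

    Args:
        count: number of times the event was observed.
        total: number of observations overall.
        vocab_size: number of distinct possible events.
    """
    return (count + 1) / (total + vocab_size)


if __name__ == "__main__":
    # The docstring travels with the code, so documentation tools can extract it.
    print(add_one_smoothing.__doc__)
    print(add_one_smoothing(0, 10, 5))
```

Documentation generators such as pydoc or Sphinx can build reference pages from docstrings like this one.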
56. Coding practices for NLP/ML research
All general practices apply to NLP/ML research
Separate a process into small processes
Use pipelines in Unix/Linux
Make use of tools in experiments
Linux commands
NLP/ML Tools
Libraries (json, nltk, matplotlib, scikit-learn,...)
Algorithms
E.g., show word frequencies in a text file (assuming one word
per line in the first tab-separated column)
cat source_file_name.txt | cut -f1 | sort | uniq -c | sort -nr
Visualize experimental results and make demos of your research
results
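The word-count pipeline above can also be written in a few lines of Python when the counts are needed inside a program. This is a rough equivalent under stated assumptions: the input has one item per line with the word in the first tab-separated column, empty lines are skipped, and `first_column_counts` is a hypothetical helper name.

```python
from collections import Counter


def first_column_counts(lines):
    """Count how often each value appears in the first tab-separated column."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if line:
            counts[line.split("\t")[0]] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical word<TAB>tag lines; a real script would iterate over an open file.
    sample = ["the\tDT\n", "cat\tNN\n", "the\tDT\n"]
    for word, freq in first_column_counts(sample).most_common():
        print(freq, word)
```

For a quick look at a file the shell pipeline is usually faster to type; the Python version is easier to extend and test.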
58. Optimize code only after your ideas work
“Make it work. Make it right. Make it fast.” (Kent Beck)
“Premature optimization is the root of all evil (or at least
most of it) in programming.” (Donald Knuth)
In NLP, always start with a simple, quick-and-dirty working version
E.g., bag-of-words features and the Naive Bayes algorithm in text
classification tasks
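A simple first version of the kind mentioned above might look like the sketch below: bag-of-words features and a multinomial Naive Bayes classifier with add-one smoothing, written with the standard library only. The training sentences, labels, and function names are made up for illustration; in practice a library such as scikit-learn would usually be used instead.

```python
import math
from collections import Counter, defaultdict


def bag_of_words(text):
    """Bag-of-words features: token -> count, ignoring word order."""
    return Counter(text.lower().split())


def train_naive_bayes(examples):
    """examples: list of (text, label). Returns the counts the classifier needs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        bag = bag_of_words(text)
        word_counts[label].update(bag)
        vocab.update(bag)
    return label_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log posterior, using add-one smoothing."""
    label_counts, word_counts, vocab = model
    n_examples = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_label in label_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_label / n_examples)
        for word, count in bag_of_words(text).items():
            if word in vocab:  # words never seen in training are skipped
                p = (word_counts[label][word] + 1) / (total_words + len(vocab))
                score += count * math.log(p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    # Hypothetical toy training set, only to show the workflow.
    train = [
        ("great movie loved it", "pos"),
        ("wonderful acting great story", "pos"),
        ("boring movie hated it", "neg"),
        ("terrible acting boring story", "neg"),
    ]
    model = train_naive_bayes(train)
    print(predict(model, "great story"))
    print(predict(model, "boring acting"))
```

A baseline like this gives a working end-to-end pipeline and a reference score before any more complex idea is tried.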
60. My profile
6/2006: B.Sc. in Information Technology from University of
Engineering and Technology, Vietnam National University,
Hanoi
3/2010: M.Sc. in Information Science from Japan Advanced
Institute of Science and Technology
3/2013: Ph.D. in Information Science from Japan Advanced
Institute of Science and Technology
61. Master program at JAIST
JAIST is a public graduate institute in Japan
Homepage: https://www.jaist.ac.jp/english
Three schools
Information Science
Knowledge Science
Materials Science
All courses are offered in English
You can study in English
62. Master program at JAIST
Two-year full-time master program
First year:
Students are temporarily assigned to a laboratory, and select
the official lab after 3 months
In the first year, mainly taking courses and choosing the
master research topic
Write the research proposal for the master's thesis at the end of the
first year
Second year:
Finishing all remaining course work
Working on master research project
Looking for jobs (students who do not pursue Ph.D.)
63. How did I finish my master?
Six months before entering the master's program
Take Japanese course
Review background
Read NLP Textbooks
First year:
Finish all course work
Join a research project in my laboratory
Choose the research topic
Second year:
Do research
Attend one international conference
Thesis defense
64. How I chose my master's thesis topic
I did not even know how to choose a research topic (crying)
You should know how to choose
I was assigned the topic by my co-advisor
The topic is about sentence insertion
I proposed a method to improve the previous results
65. Sentence insertion task
Task: automatically update a Wikipedia article by inserting
new information into it.
I proposed using word clusters to capture the meaning of words
66. Research projects at FPT Technology Research Institute
NLP problems in chatbot development
Intent classification
Named entity recognition
FAQ generation from chat history, manuals
Figure source: stanfy.com (http://tinyurl.com/mdfsa6h)
67. Summary
Empirical research methods rely on observations, data,
experiments
Two dimensions of problem choice: Feasibility and Interest
Research starts from learning
Reading is very important in research
NLP research involves much data analysis
Coding practices for NLP/ML research
68. Check-list for your master thesis
1 Is your work reproducible?
Package your code so that a single script regenerates the
results automatically
Freeze the final version
2 Is your proposed method new?
3 Did you revise your thesis many times?
Ask your advisors and friends to proofread it
4 Did you understand previous work?
5 Do you think you can pass the master thesis defense?
69. Advice for your master's thesis
Take time to choose your master's research topic
Work on a research problem that you are interested in
Start early
Follow up with your advisor
Spend time on regular literature review (reading papers)
Commit at least 2-3 hours per day to your master's research
Look at your data before you start doing anything
Follow “best” coding practices for research
Use version control
Version everything that is manually created
Back up your work in the cloud
70. References
Alon, U. (2009). How to choose a good scientific problem.
Molecular Cell, 35(6), 726-728.
Wilson, G., Aruliah, D.A., Brown, C.T., Chue Hong, N.P.,
Davis, M., Guy, R.T., Haddock, S.H.D., Huff, K.D.,
Mitchell, I.M., Plumbley, M.D., Waugh, B., White, E.P.,
& Wilson, P. (2014). Best Practices for Scientific Computing.
PLoS Biology, 12(1), e1001745.
Ali Eslami. Patterns for Research in Machine Learning
http://arkitus.com/patterns-for-research-in-machine-learning