Research Methods in Natural Language Processing
Pham Quang Nhat Minh
Alt Vietnam Co., Ltd
pham.minh@alt.ai
March 17, 2018
Lecturer
Ph.D. in Natural Language Processing
Now AI Researcher at Alt Vietnam Co., Ltd.
More than nine years of R&D experience in both academia and
industry.
Research topics:
Information Extraction (named-entity recognition, relation
extraction,...)
Dialog systems (chatbots, seq2seq models, etc.)
Pham Quang Nhat Minh Research Methods in NLP 2/69
Objectives of the lecture
Introduce some research know-how and practices in doing
research
Focus on NLP/Machine Learning/Data Science fields
Share my research experience in the NLP field
Table of Contents
1 What are empirical research methods for computer science?
2 How to choose a good research topic?
3 How to read a scientific paper?
4 How to work with your advisor
5 Doing research in NLP field
What is NLP?
What is it like doing research in NLP?
How to do research in NLP?
How to choose NLP papers to read?
6 Coding practices for NLP/Machine Learning research work
7 Summary
Acknowledgements
Much of the content in this lecture comes from the documents
listed in the references
(Alon, 2009) How To Choose a Good Scientific Problem
(Wilson et al., 2012) Best Practices for Scientific Computing
Paul Cohen: Empirical Methods for AI & CS
Other documents, blogs
What does “empirical” mean?
Relying on observations, data, experiments
Empirical work should complement theoretical work
Theories often have holes (e.g., How big is the constant term?)
Theories are suggested by observations
Theories are tested by observations
Conversely, theories direct our empirical attention
In addition, empirical means “wanting to understand
behaviour of complex systems”
In NLP, we may want to understand how features are
correlated
Why we need empirical methods
Theory based science need not be all theorems
We do not know how a theory works in different conditions
Different data sets, domains
Empirical methods in CS/AI
Data observation
Construct hypotheses
Test with empirical experiments
Refine hypotheses and modelling assumptions
Kinds of data analysis
Exploratory (EDA) - looking for patterns in data
Statistical inferences from sample data
Testing hypotheses
Estimating parameters
Building mathematical models of datasets
Machine learning, data mining...
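As one possible illustration of "testing hypotheses", the sketch below runs a paired randomization test on the per-item scores of two systems. The function name, the scores, and the number of trials are all made up for this example; it is a minimal sketch, not a prescribed procedure.

```python
import random

def paired_randomization_test(scores_a, scores_b, trials=10000, seed=0):
    """Two-sided paired randomization test on the mean score difference.

    Under the null hypothesis the two systems are interchangeable, so the
    sign of each per-item difference can be flipped at random.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems must be scored on the same items")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs)) / len(diffs)
    rng = random.Random(seed)
    at_least_as_extreme = 0
    for _ in range(trials):
        total = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(total) / len(diffs) >= observed:
            at_least_as_extreme += 1
    # Add-one smoothing keeps the estimated p-value strictly above zero.
    return (at_least_as_extreme + 1) / (trials + 1)

# Hypothetical per-sentence scores for two systems on the same test items.
system_a = [0.71, 0.64, 0.80, 0.55, 0.69, 0.77, 0.62, 0.74]
system_b = [0.66, 0.60, 0.78, 0.51, 0.70, 0.70, 0.58, 0.69]
p_value = paired_randomization_test(system_a, system_b)
print(f"estimated p-value: {p_value:.4f}")
```

A small p-value suggests the observed difference would be unlikely if the two systems were equally good; the fixed seed only makes the estimate repeatable.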
Tools for data analysis
R programming language
Python:
numpy
scipy
pandas
matplotlib for data visualization
My biased opinions:
statisticians like R, computer scientists often use Python
Python is much easier to learn than R
Exercises
Install R: https://www.r-project.org
Download the data file ex1data1.txt from:
http://tinyurl.com/m7bpp8d
The data file has two columns:
First column: the population of a city.
Second column: the profit of a food truck in that city.
In the R terminal, try the following plotting code
df <- read.table("./ex1data1.txt", sep=",", header=FALSE)
plot(df[,1], df[,2],
     xlab="Population of City in 10,000s",
     ylab="Profit in $10,000s")
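For readers who prefer Python, the same exercise could be approached roughly as follows. This is a sketch under assumptions: the helper names are invented here, the three sample rows are made up and stand in for the real file, and matplotlib is imported only inside the plotting helper so the loader works without it.

```python
import csv
import io

def load_two_columns(handle):
    """Read comma-separated rows of two numbers into two lists."""
    xs, ys = [], []
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        xs.append(float(row[0]))
        ys.append(float(row[1]))
    return xs, ys

def scatter_plot(xs, ys, out_path="ex1data1.png"):
    """Save a scatter plot; matplotlib is imported lazily, only when plotting."""
    import matplotlib.pyplot as plt
    fig, ax = plt.subplots()
    ax.scatter(xs, ys)
    ax.set_xlabel("Population of City in 10,000s")
    ax.set_ylabel("Profit in $10,000s")
    fig.savefig(out_path)

# Made-up rows in the same two-column layout, used here instead of the real file.
sample = io.StringIO("5.0,10.0\n6.5,12.5\n8.0,14.0\n")
population, profit = load_two_columns(sample)
print(len(population), min(population), max(profit))
```

With the downloaded file, one would open it with `open("ex1data1.txt", newline="")`, pass the handle to `load_two_columns`, and then call `scatter_plot(population, profit)`.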
R for data visualization
Why do we need to choose a good research topic?
“Garbage in, garbage out” principle
You may work with a research topic for years
1 year for a master's thesis
3 years or more for a Ph.D. dissertation
It is painful to work on something you find uninteresting
You lack passion, motivation, and ideas
It brings much frustration and bitterness
What is a good research topic?
(Alon, 2009) Two Dimensions of Problem Choice
Feasibility: whether a problem is hard or easy
We can measure the feasibility as the expected time to
complete the project
Feasibility is a function of the skills of students/researchers
and of the technology in the lab.
Interest: the increase in knowledge expected from the project.
Two-dimensional space of Problem Choice (1)
Figure: The Feasibility-Interest Diagram for Choosing a Project (Alon,
2009)
Two-dimensional space of Problem Choice (2)
Figure: The Feasibility-Interest Diagram for Choosing a Project (Alon,
2009)
What is a good research topic?
Do many people care about the topic?
Research community, your supervisors, industry demands
Are you really interested in the topic?
The topic should be interesting to you rather than to others
Good signs: “ideas and questions that come back again and
again to your mind for months or years.”
How to choose a good research topic: step by step
Choose the broad (general) topic
E.g., Machine Translation
Draw a hierarchy of research topics, starting from the broad
topic
Review literature to look for gaps in previous work
Choose the focused topic
E.g., Phrase-based Machine Translation
Find gaps in previous work
Form research questions in the focused topic
From research questions, formulate the research problem
Finding a research problem
Take your time to choose a good research topic
(Alon, 2009): Rule for new Ph.D. students and postdocs: “Do
not commit to a problem before 3 months have elapsed”
Master's students should take 1-2 months to choose a research
topic before starting the research project.
Join projects in your laboratory
Many thesis ideas come from projects you are involved in
Developing your research ideas
Where do research ideas come from?
Observations
Data observations, data analysis, discover patterns in data
Reading papers, attending conferences, listening to talks
Techniques, methods from other disciplines, fields
Imagination
Suggestions from your advisor
Reading papers, attending conferences
Choose good and relevant papers. Consider:
Impact factors of the journal.
In the NLP field, choose papers from top conferences, journals
(ACL/NAACL/EMNLP/COLING)
The Top 10 NLP Conferences:
http://www.junglelightspeed.com/the-top-10-nlp-conferences
Reputations of authors and their organizations
Do not just read papers: critique them and look for the gaps
Techniques, methods from other fields
Broaden your view and your problem-solving methods by
regularly reading articles in other fields.
An example is the image captioning task
It requires techniques from both computer vision and
NLP.
What happens after we choose a problem? (Alon, 2009)
Two types of readings
Fast readings
Get and understand the basic ideas of the paper
Know the problems the paper attacks and how it solves them
Put the paper in the “big picture” of the field
Know the differences between the paper and previous
work
We mostly do “fast reading” when we survey the literature and
choose a broad topic
Deep readings
Understand the details of presented methods
Try to understand how the proposed method works
Criticize the paper and find its limitations
If you were the authors, how would you solve the problem?
Propose alternative methods?
We mostly do “deep reading” when we look for a focused topic
How to read a scientific paper (1)
Michael J. Hanson. Efficient Readings of Papers in Science and Technology: http://tinyurl.com/qdebynz
How to read a scientific paper (2)
Decide what to read
Read title, abstract
Read it, file it, or skip it
Read for breadth
What did they do?
Skim introduction, headings, graphics, definitions, conclusions
and bibliography.
Consider the credibility.
How useful is it?
Decide whether to go on.
How to read a scientific paper (3)
Read in depth
How did they do it?
Challenge their arguments.
Examine assumptions.
Examine methods.
Examine statistics.
Examine reasoning and conclusions.
How can I apply their approach to my work?
Take notes
Make notes as you read.
Highlight major points.
Note new terms and definitions.
Summarize tables and graphs.
Write a summary.
Homework
Choose one scientific article that you want to read in depth. Read it,
take notes, and explain the ideas and methods presented in the paper to
other students in a simple way.
Note: You should be able to answer the following three questions.
What is the problem the paper attacked?
What are the differences between the paper and other existing
papers?
What are interesting points of the presented methods?
Some basic rules
Your advisor is likely to be very busy, so you should follow
up with her/him
Ask for meetings and schedule them in advance
Keep regular meetings with your advisor
Usually a weekly meeting
Do not just do what your advisor tells you to do
Rule of thumb: You should finish all your assigned tasks
before working on your own ideas
How to write a progress/status report
Michael Ernst. Writing a progress/status report:
http://tinyurl.com/zp7cdvt
Quote the previous week’s plan.
This helps you determine whether you accomplished your goals.
State this week’s progress.
What you have accomplished,
What you learned, what difficulties you overcame, what
difficulties are still blocking you,
Your new ideas for research directions or projects, etc
Give the next week’s plan.
A good format is a bulleted list
Try to make each goal measurable: there should be no
ambiguity as to whether you were able to finish it.
It’s good to include longer-term goals as well.
Communicate with your advisor
Prepare some slides (3-4 slides) to make the discussion
concrete
Send the materials at least 24 hours before the meeting day
Arrange the meeting in advance
Your advisor is not always right
In fact, you know more about your own work than she/he does
If you have data, evidence, or proof, do not hesitate to debate
Do not say “I guess” or “I think” when you explain something.
Use data, evidence, and references instead
What is Natural Language Processing?
A field of computer science, artificial intelligence, and
computational linguistics
To get computers to perform useful tasks involving human
languages
Human-Machine communication
Improving human-human communication
E.g., Machine Translation
Extracting information from texts
Why is NLP interesting?
Language is involved in many human activities
Reading, writing, speaking, listening
Voice can be used as a user interface in many applications
Remote controls, virtual assistants like Siri,...
NLP is used to acquire insights from massive amounts of
textual data
E.g., hypotheses from medical, health reports
NLP has many applications
NLP is hard!
NLP problems
Fundamental problems
Word Segmentation
Part-of-speech tagging
Syntactic Analysis
Semantic Analysis
Application problems
Information Retrieval
Information Extraction
Question Answering
Text Summarization
Machine Translation
What is it like doing research in NLP?
Empirical methods are widely applied in NLP
Relying on observations, data, experiments
The work involves many loops of experiments
Identify the problem → Create ideas → Test the best idea →
Analyse results → Identify the problem → Create ideas → · · ·
Many ideas do not work
Even so, we need to analyse the results to understand
why they did not work, so that we can come up with new ideas.
Try the next idea
Failures occur more often than successes
Try to increase the number of experiments
(No. of successes) = (No. of experiments) × (Success rate)
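The formula above can be read as an expected value. The sketch below plugs in made-up numbers (a 10% success rate at two different experiment counts) to show that, at a fixed success rate, running more experiments is what raises the expected number of successes.

```python
def expected_successes(num_experiments, success_rate):
    """Expected number of successful experiments at a fixed success rate."""
    return num_experiments * success_rate

# Hypothetical numbers: the same 10% success rate at two experiment throughputs.
for per_month in (10, 40):
    print(per_month, "experiments ->",
          expected_successes(per_month, 0.1), "expected successes")
```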
The typical working day of an NLP researcher
Data observation and data/result analysis (a lot)
Discuss ideas with colleagues
Do experiments (run the program) to test ideas
Read papers to keep up to date with mainstream research
Investigate new NLP/Machine Learning tools, libraries (less
regularly)
How to learn NLP?
Research starts from learning
Learn/review background about:
Probability and Statistics
Basic math (linear algebra, calculus)
Machine Learning
Programming
Read NLP textbooks
Jurafsky, D., & Martin, J.H. Speech and Language Processing:
an Introduction to Natural Language Processing,
Computational Linguistics, and Speech Recognition.
Manning, C.D., & Schütze, H. Foundations of Statistical
Natural Language Processing.
How to learn NLP: Get your hands dirty
Practice with programming exercises:
100 NLP drill exercises:
https://github.com/minhpqn/nlp_100_drill_exercises
NLP Programming Tutorial, by Graham Neubig:
http://www.phontron.com/teaching.php
Compete in Kaggle data science challenges (kaggle.com)
Finding an NLP research problem
All the principles in the section “How to choose a good
research topic” apply.
Looking for ideas from related fields
Linguistics
Machine learning: the mainstream approach in NLP is to apply
machine learning methods to NLP problems
Computer vision
Looking at data
It is actually my daily task
Basic rules to choose NLP papers
READ:
Papers in top conferences and journals in NLP and other
related fields
(ACL/EMNLP/NAACL/EACL/COLING/CoNLL/...)
Workshops that focus on an NLP sub-field
Short papers at top conferences
PhD dissertations from top institutions/advisors
Papers with many citations
Textbooks from leading researchers
For more information, see The Top 10 NLP Conferences:
http://www.junglelightspeed.com/the-top-10-nlp-conferences/
Why is coding important in NLP/ML research?
Most NLP/ML research is empirical
We need to do data analysis and run experiments to test our ideas
So, we have to write programs
Even theorists should program
“Implementing your own algorithm is a good way of checking
your work. If you aren’t implementing your algorithm,
arguably you’re skipping a key step in checking your results.”
—Michael Mitzenmacher
http://mybiasedcoin.blogspot.com/2008/11/bugs.html
Why do we care about coding practices in NLP research?
Bad coding practices cause problems
You find errors in the experimental results right before the
paper submission deadline
You cannot understand your own code after some months
You deleted intermediate results, so you cannot verify the code
You do not know the technique to verify experimental results
You did not test the code, and then used the untested code for
experiments
You spend a long time refactoring the code
You cannot get back the version that generated the best
results
...
Good coding practices speed up our research work
Recall that:
(No. of successes) = (No. of experiments) × (Success rate)
Best Practices for Scientific Computing
(Wilson et al., 2012)
1- Write programs for people, not computers.
Readers of the code should not have to remember too much
Easy to read: names should be consistent, distinctive, and
meaningful
Break down the coding work into one-hour-long tasks
2- Automate repetitive tasks
Scientists should rely on the computer to repeat tasks.
Use a script to run your programs!
Use a build tool to automate scientific workflows
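One way to act on "use a script to run your programs" is a small driver that runs every parameter combination, instead of launching each run by hand. The sketch below is only illustrative: `run_experiment` is a stand-in that returns a made-up score, and the parameter names and values are hypothetical.

```python
import itertools
import json

def run_experiment(learning_rate, num_features):
    """Stand-in for a real training run; returns a made-up score."""
    return round(1.0 / (1.0 + learning_rate) + num_features / 10000.0, 4)

def run_grid(grid):
    """Run every combination of parameter values and collect one record per run."""
    names = sorted(grid)
    records = []
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        records.append({"params": params, "score": run_experiment(**params)})
    return records

# Hypothetical parameter grid: two values for each of two parameters.
grid = {"learning_rate": [0.1, 0.01], "num_features": [1000, 5000]}
results = run_grid(grid)
for record in results:
    print(json.dumps(record, sort_keys=True))
```

Because each record keeps its parameters next to its score, the printed lines can be redirected to a file and compared later without re-running anything.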
3- Use the computer to record history
Unique identifiers and version numbers for raw data records
Unique identifiers and version numbers for programs and
libraries
The values of parameters used to generate any given output
The names and version numbers of programs used to generate
those outputs.
4- Make incremental changes
Scientists often cannot know what their programs should do next
until the current version has produced some results.
Work in small steps with frequent feedback and
correction!
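The "record history" practice could be sketched as a small run record written next to each output. Everything named here is an assumption made for illustration: the record fields, the example parameters, and the use of a SHA-256 content hash as the data identifier.

```python
import hashlib
import json
import sys
import time

def sha256_of_bytes(data):
    """Content hash that serves as a unique identifier for a data file."""
    return hashlib.sha256(data).hexdigest()

def make_run_record(params, data_bytes, code_version):
    """Bundle what is needed to trace an output back to its inputs."""
    return {
        "params": params,
        "data_sha256": sha256_of_bytes(data_bytes),
        "code_version": code_version,  # e.g. a commit hash from version control
        "python_version": sys.version.split()[0],
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
    }

# Hypothetical inputs; a real run would read the raw data file from disk.
record = make_run_record(
    params={"ngram": 2, "smoothing": 1.0},
    data_bytes=b"the cat sat\nthe dog ran\n",
    code_version="unknown",
)
print(json.dumps(record, indent=2, sort_keys=True))
```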
5- Use a version control system: Git, Mercurial, Subversion. Push
code to GitHub or Bitbucket
Everything that has been created manually should be put in
version control
6- Do not repeat yourself (or others)
At small-scale, code should be modularized rather than copied
and pasted.
At large-scale, scientific programmers should re-use code
instead of re-writing it.
7- Plan for mistakes
Write and run tests
Unit Test: Check the correctness of each single software unit
Integration Test: Check that pieces of unit code work
correctly when combined.
Regression Test: Re-run pre-existing tests after changes
to the code to make sure that it has not regressed.
Use an off-the-shelf unit testing library
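As a minimal sketch of unit testing with an off-the-shelf library, the example below uses Python's standard `unittest` module. The `tokenize` function is a hypothetical unit under test, invented for this example.

```python
import unittest

def tokenize(text):
    """Lowercase a string and split it on whitespace."""
    return text.lower().split()

class TokenizeTest(unittest.TestCase):
    def test_splits_on_whitespace(self):
        self.assertEqual(tokenize("Hello  NLP world"), ["hello", "nlp", "world"])

    def test_empty_string_gives_no_tokens(self):
        self.assertEqual(tokenize(""), [])

# Run the tests programmatically so the outcome can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TokenizeTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Re-running the same suite after every change to the code is what turns these unit tests into regression tests.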
8- Optimize software only after it works correctly
Use a profiler to identify bottlenecks
Write code in the highest-level language possible
Python is a recommended language for research
Use a low-level programming language only when you are sure
that a performance boost is needed.
Use the highest-level programming language for rapid
prototyping.
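To illustrate "use a profiler to identify bottlenecks", the sketch below profiles a deliberately naive function with the standard `cProfile` and `pstats` modules. The function and its input are invented for this example.

```python
import cProfile
import io
import pstats

def count_bigrams_slow(tokens):
    """Deliberately naive bigram counting: list.count rescans the list each time."""
    bigrams = list(zip(tokens, tokens[1:]))
    return {bigram: bigrams.count(bigram) for bigram in set(bigrams)}

tokens = ("the cat sat on the mat " * 200).split()

profiler = cProfile.Profile()
profiler.enable()
counts = count_bigrams_slow(tokens)
profiler.disable()

# Print the five entries with the largest cumulative time.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

The report lists where the time actually goes, which is the information needed before deciding whether any part is worth rewriting.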
9- Document design and purpose, not mechanics
Document interface and reasons, not implementations
Do not write comments like this:
i = i + 1 # Increment the variable 'i' by one.
Refactor the code instead of explaining how it works
Embed the documentation for a piece of software in that
software
Use software to generate documentation.
10- Collaborate
Use pre-merge code reviews
Use an issue tracking tool.
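Returning to practice 9, a docstring is one way to embed documentation in the software itself and to describe interface and reasons rather than mechanics. The function below is hypothetical, and the design reasons in its docstring are assumptions made up for this example.

```python
def normalize_token(token):
    """Return a canonical form of ``token`` for vocabulary lookup.

    Digits are mapped to "0" so that all numbers of the same length share
    one vocabulary entry, which keeps the vocabulary small; case is folded
    because the (assumed) downstream model is case-insensitive.
    """
    return "".join("0" if ch.isdigit() else ch for ch in token.lower())

print(normalize_token.__doc__.splitlines()[0])
print(normalize_token("Route66"))
```

Since the documentation lives in the code, tools such as Python's built-in `help()` and `pydoc` can generate reference pages from it.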
Coding practices for NLP/ML research
All general practices apply to NLP/ML research
Separate a process into small processes
Use pipelines in Unix/Linux
Make use of tools in experiments
Linux commands
NLP/ML Tools
Libraries (json, nltk, matplotlib, scikit-learn,...)
Algorithms
E.g., count how often each word occurs, for a file with one word per line in the first tab-separated column
cat source_file_name.txt | cut -f1 | sort | uniq -c | sort -nr
Visualize experimental results and build demos of your research
results
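For comparison with the shell pipeline on this slide, roughly the same count can be sketched in Python with `collections.Counter`. It assumes one token per line in the first tab-separated column; the function name and the sample input are made up.

```python
from collections import Counter
import io

def first_column_counts(lines):
    """Count values in the first tab-separated field, most frequent first."""
    counts = Counter(
        line.rstrip("\n").split("\t")[0]
        for line in lines
        if line.strip()  # skip blank lines, which the shell pipeline would count
    )
    return counts.most_common()

# Made-up input: one token per line, followed by a tab and a tag.
sample = io.StringIO("the\tDT\ncat\tNN\nthe\tDT\nsat\tVBD\nthe\tDT\ncat\tNN\n")
for word, count in first_column_counts(sample):
    print(count, word)
```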
Tools for visualizing research results
Tables (Microsoft Excel, HTML)
Charts (gnuplot, matplotlib, R)
Graphs (graphviz, Gephi, D3.js)
Texts (Microsoft Excel, HTML, brat: http://brat.nlplab.org/)
Code (google-code-prettify: https://github.com/google/code-prettify, Pygments: http://pygments.org/)
Demo (HTML, JavaScript, CSS,...)
Optimize code only after your ideas work
“Make it work. Make it right. Make it fast.” (Kent Beck)
“Premature optimization is the root of all evil (or at least
most of it) in programming.” (Donald Knuth)
In NLP, always start with a simple, quick-and-dirty working version
E.g., bag-of-words features and the Naive Bayes algorithm in text
classification tasks
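A "simple and dirty" baseline of the kind mentioned above might look like the following sketch: multinomial Naive Bayes over bag-of-words counts with add-one smoothing, written with the standard library only. The class name and the four training sentences are invented for illustration; in practice a library such as scikit-learn would likely be used instead.

```python
import math
from collections import Counter, defaultdict

class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes over bag-of-words counts with add-one smoothing."""

    def fit(self, documents, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(documents, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocabulary = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            counts = self.word_counts[label]
            denominator = sum(counts.values()) + len(self.vocabulary)
            score = math.log(doc_count / total_docs)  # log prior
            for word in text.lower().split():
                if word in self.vocabulary:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Tiny made-up training set, only to show the interface.
train_texts = ["great movie loved it", "wonderful acting great plot",
               "terrible movie hated it", "boring plot awful acting"]
train_labels = ["pos", "pos", "neg", "neg"]
model = NaiveBayesTextClassifier().fit(train_texts, train_labels)
print(model.predict("loved the great acting"))
```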
Reproducible Research
Reproducible research is research whose reported results can be
reproduced with the available code (scripts) and
data.
Same data + Same script = Same results
Criteria for a truly reproducible study
All methods are fully reported.
All data and files used for the analysis are (publicly) available.
The process of analyzing raw data is well reported and
preserved.
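"Same data + same script = same results" only holds if randomness is controlled. The sketch below shows one common way to do that, a dedicated seeded random generator for the train/test split. The function name, seed value, and split fraction are assumptions chosen for this example.

```python
import random

def split_train_test(items, test_fraction=0.2, seed=13):
    """Shuffle with a dedicated, seeded generator so the split is repeatable."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = len(shuffled) - int(len(shuffled) * test_fraction)
    return shuffled[:cut], shuffled[cut:]

sentence_ids = list(range(10))  # stand-in for real example identifiers
first = split_train_test(sentence_ids)
second = split_train_test(sentence_ids)
print(first == second)
```

Using a local `random.Random(seed)` rather than the global generator keeps the split unaffected by other code that also draws random numbers.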
Why Reproducible Research?
In the first place, it helps us reproduce figures,
statistics, etc. when we revise the work.
It helps other people who want to do research in the field
It makes it easier to compare a new method to existing methods.
It helps to verify that the implementation is correct
How to make your research reproducible?
Don’t do things by hand. Think about how to do them automatically
Downloading data from websites
Data cleaning, preprocessing
...
For things that you cannot do automatically, document the
process well!
Use version control (GitHub/Bitbucket)
Keep track of history
Allows going back to old versions
Keep track of the software environment
In Python, use virtualenv or a conda environment
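Besides isolating the environment, it can help to write down what is actually installed when an experiment runs. The sketch below uses the standard `importlib.metadata` module (Python 3.8+). The function name is made up, and the package names are only examples that may not be installed.

```python
import json
import platform
import sys
from importlib import metadata

def describe_environment(packages):
    """Collect interpreter, OS, and installed package versions for a run log."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": versions,
    }

# The package names are only examples; list whatever your experiments import.
environment = describe_environment(["numpy", "pandas", "scikit-learn"])
print(json.dumps(environment, indent=2, sort_keys=True))
```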
How to make your research reproducible: Tools
Jupyter notebook (http://jupyter.org/)
Put documents and code in the same place
Matplotlib or Seaborn for data visualization
Pandas for data processing
There are some equivalent tools for R:
knitr/rmarkdown, and RStudio
Exercises
Install Python and Jupyter Notebook
Install git
Create a GitHub account
Visualize the data from http://bit.ly/2ooMyaO in
Jupyter Notebook
Using pandas and matplotlib
Summary
Empirical research methods rely on observations, data,
experiments
Two dimensions of problem choice: Feasibility and Interest
Research starts from learning
Reading is very important in research
NLP research involves much data analysis
Coding practices for NLP/ML research
Check-list for your master thesis
1 Is your work reproducible?
Package your code so that it can automatically generate the
results with a single script
Freeze the final version
2 Is your proposed method new?
3 Did you revise your thesis many times?
Ask your advisors and friends to proofread it
4 Did you understand previous work?
5 Do you think you can pass the master thesis defense?
Advice for your master's thesis
Take time to choose your master's research topic
Work on a research problem that you are interested in
Start early
Follow up with your advisor
Spend time on regular literature review (reading papers)
Commit at least 2-3 hours per day to your master's research
Look at your data before you start doing anything
Follow “best” coding practices for research
Use version control
Version everything that is created manually
Back up your work in the cloud
References
Alon, U. (2009). How to choose a good scientific problem.
Molecular Cell, 35(6), 726-728.
Wilson, G., Aruliah, D.A., Brown, C.T., Chue Hong, N.P., Davis, M.,
Guy, R.T., Haddock, S.H.D., Huff, K.D., Mitchell, I.M., Plumbley, M.D.,
Waugh, B., White, E.P., & Wilson, P. (2014).
Best Practices for Scientific Computing. PLoS Biology, 12(1), e1001745.
Eslami, A. Patterns for Research in Machine Learning.
http://arkitus.com/patterns-for-research-in-machine-learning
Pham Quang Nhat Minh Research Methods in NLP 69/69

More Related Content

What's hot

Proposal defence format
Proposal defence formatProposal defence format
Proposal defence formatAdil Mehmoood
 
Writing Research Statement
Writing Research StatementWriting Research Statement
Writing Research StatementBrijesh Agrawal
 
How to do qualitative analysis: In theory and practice
How to do qualitative analysis: In theory and practice How to do qualitative analysis: In theory and practice
How to do qualitative analysis: In theory and practice Heather Ford
 
Capturing and Analyzing Qualitative Data in Surveys
Capturing and Analyzing Qualitative Data in SurveysCapturing and Analyzing Qualitative Data in Surveys
Capturing and Analyzing Qualitative Data in SurveysPerformance Solutions Corp.
 
Research Writing Survey
Research Writing SurveyResearch Writing Survey
Research Writing SurveyAiden Yeh
 
Business Research Methods session 6 research design
Business Research Methods session 6 research designBusiness Research Methods session 6 research design
Business Research Methods session 6 research designIan Cammack
 
Guidelines for preparing a research proposal
Guidelines for preparing a research proposalGuidelines for preparing a research proposal
Guidelines for preparing a research proposalKamarudin Jaafar
 
How To Write Your Research Proposal
How To Write Your Research ProposalHow To Write Your Research Proposal
How To Write Your Research Proposalresearchconsultant2
 
Lesson 4 secondary research 2
Lesson 4   secondary research 2Lesson 4   secondary research 2
Lesson 4 secondary research 2Kavita Parwani
 
Braun, Clarke & Hayfield Thematic Analysis Part 3
Braun, Clarke & Hayfield Thematic Analysis Part 3Braun, Clarke & Hayfield Thematic Analysis Part 3
Braun, Clarke & Hayfield Thematic Analysis Part 3Victoria Clarke
 
Qualitative Lab - Analysis And Report
Qualitative Lab - Analysis And ReportQualitative Lab - Analysis And Report
Qualitative Lab - Analysis And Reportnibraspk
 
Questionaire design
Questionaire designQuestionaire design
Questionaire designManoj Subedi
 
Information Highlighting
Information HighlightingInformation Highlighting
Information HighlightingTim Ostler
 
Qualitative data analysis - Student L
Qualitative data analysis - Student LQualitative data analysis - Student L
Qualitative data analysis - Student LLee Cox
 
Fourteen Steps To Writing An Effective Discussion Section
Fourteen Steps To Writing An Effective Discussion SectionFourteen Steps To Writing An Effective Discussion Section
Fourteen Steps To Writing An Effective Discussion Sectionchenv
 
Method of data collection and analysis based in Grounded Theory
Method of data collection and analysis based in Grounded TheoryMethod of data collection and analysis based in Grounded Theory
Method of data collection and analysis based in Grounded Theoryprayslide
 

What's hot (20)

Proposal defence format
Proposal defence formatProposal defence format
Proposal defence format
 
Writing Research Statement
Writing Research StatementWriting Research Statement
Writing Research Statement
 
How to write an effective title and abstract and choose appropriate keywords 
Research Methods in Natural Language Processing (2018 version)

  • 1. Research Methods in Natural Language Processing Pham Quang Nhat Minh Alt Vietnam Co., Ltd pham.minh@alt.ai March 17, 2018
  • 2. Lecturer: Ph.D. in Natural Language Processing; now an AI researcher at Alt Vietnam Co., Ltd.; more than 9 years of R&D experience in both academia and industry. Research topics: information extraction (named-entity recognition, relation extraction, ...); dialog systems (chatbots, seq2seq models, etc.).
  • 3. Objectives of the lecture: introduce research know-how and practices for doing research; focus on the NLP, machine learning, and data science fields; share my research experience in NLP.
  • 4. Table of Contents: (1) What are empirical research methods for computer science? (2) How to choose a good research topic? (3) How to read a scientific paper? (4) How to work with your advisor. (5) Doing research in the NLP field: What is NLP? What is it like doing research in NLP? How to do research in NLP? How to choose NLP papers to read? (6) Coding practices for NLP/machine learning research work. (7) Summary.
  • 5. Acknowledgements: much of the content in this lecture comes from the documents in the references: (Alon, 2009) How To Choose a Good Scientific Problem; (Wilson et al., 2012) Best Practices for Scientific Computing; Paul Cohen: Empirical Methods for AI & CS; other documents and blogs.
  • 6. Section 1: What are empirical research methods for computer science?
  • 7. What does “empirical” mean? Relying on observations, data, and experiments. Empirical work should complement theoretical work: theories often have holes (e.g., how big is the constant term?); theories are suggested by observations; theories are tested by observations; conversely, theories direct our empirical attention. In addition, empirical means “wanting to understand the behaviour of complex systems”. In NLP, we may want to understand how features are correlated.
  • 8. Why we need empirical methods: theory-based science need not be all theorems; we do not know how a theory behaves under different conditions (different data sets, different domains).
  • 9. Empirical methods in CS/AI: observe the data; construct hypotheses; test them with empirical experiments; refine the hypotheses and modelling assumptions.
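The "test with empirical experiments" step can be illustrated with a small significance test. The sketch below is not from the lecture: it assumes two systems scored on the same test items and uses a paired sign-flip randomization test, one common choice among several. The function name and the scores are hypothetical.

```python
import random


def paired_randomization_test(scores_a, scores_b, n_resamples=10000, seed=0):
    """Two-sided paired sign-flip randomization test on per-item scores.

    Returns an estimated p-value for the null hypothesis that systems A and B
    perform equally well on average.
    """
    if not scores_a or len(scores_a) != len(scores_b):
        raise ValueError("both systems must be scored on the same non-empty item set")
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    observed = abs(sum(diffs) / n)

    at_least_as_extreme = 0
    for _ in range(n_resamples):
        # Under the null hypothesis the sign of each difference is arbitrary.
        total = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(total / n) >= observed:
            at_least_as_extreme += 1
    # Add-one smoothing keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (n_resamples + 1)


# Hypothetical per-sentence scores for two systems (made-up numbers).
system_a = [0.71, 0.64, 0.80, 0.58, 0.77, 0.69, 0.73, 0.66]
system_b = [0.65, 0.60, 0.74, 0.59, 0.70, 0.61, 0.70, 0.62]
p_value = paired_randomization_test(system_a, system_b)
print(f"estimated p-value: {p_value:.4f}")
```

A small p-value suggests the observed difference is unlikely under the "no difference" hypothesis; it does not by itself say the difference matters in practice, which is where refining the hypothesis comes in.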
  • 10. Kinds of data analysis: exploratory data analysis (EDA), i.e. looking for patterns in data; statistical inference from sample data (testing hypotheses, estimating parameters); building mathematical models of datasets; machine learning, data mining, ...
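As an illustration of "estimating parameters" from sample data, here is a minimal sketch of a percentile bootstrap confidence interval for a mean. It is an assumption of this note rather than a method named in the lecture, and the function name and sample values are hypothetical.

```python
import random


def bootstrap_mean_ci(sample, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `sample`."""
    if not sample:
        raise ValueError("sample must not be empty")
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    n = len(sample)
    means = []
    for _ in range(n_resamples):
        # Resample n items with replacement and record the resample mean.
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        means.append(sum(resample) / n)
    means.sort()
    lo_idx = max(0, int((alpha / 2) * n_resamples))
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return means[lo_idx], means[hi_idx]


# Hypothetical per-document outcomes: 1 = classified correctly, 0 = wrong.
outcomes = [1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1]
low, high = bootstrap_mean_ci(outcomes)
print(f"accuracy = {sum(outcomes) / len(outcomes):.2f}, 95% CI = [{low:.2f}, {high:.2f}]")
```

Reporting the interval alongside the point estimate makes it easier to judge whether a difference between two experiments is larger than the sampling noise.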
  • 11. Tools for data analysis: the R programming language; Python (numpy, scipy, pandas, and matplotlib for data visualization). My biased opinions: statisticians like R, computer scientists often use Python; Python is much easier to learn than R.
  • 12. Exercises: install R (https://www.r-project.org) and download the data file ex1data1.txt from http://tinyurl.com/m7bpp8d The data file has two columns: the first is the population of a city, the second is the profit of a food truck in that city. In the R terminal, try the plotting code: df <- read.table("./ex1data1.txt", sep=",", header=FALSE); plot(df[,1], df[,2], xlab="Population of City in 10,000s", ylab="Profit in $10,000s")
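For readers who prefer Python, the same exploratory step might look like the sketch below. It assumes the two-column, comma-separated, header-less layout described in the exercise; the helper names are hypothetical, the inline sample rows are made up so the block runs without the download, and the plotting helper needs matplotlib to be installed.

```python
import csv
import io


def load_two_columns(text_stream):
    """Read comma-separated rows with no header into two lists of floats."""
    xs, ys = [], []
    for row in csv.reader(text_stream):
        if not row:  # skip blank lines
            continue
        xs.append(float(row[0]))
        ys.append(float(row[1]))
    return xs, ys


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / (var_x * var_y) ** 0.5


def plot_columns(xs, ys):
    """Scatter plot of the two columns; matplotlib is imported only when called."""
    import matplotlib.pyplot as plt

    plt.scatter(xs, ys)
    plt.xlabel("Population of City in 10,000s")
    plt.ylabel("Profit in $10,000s")
    plt.show()


# Made-up rows in the same layout. With the real file, use instead:
#   with open("ex1data1.txt", newline="") as f:
#       population, profit = load_two_columns(f)
sample = io.StringIO("5.0,2.0\n6.5,4.1\n8.0,6.2\n10.0,9.1\n")
population, profit = load_two_columns(sample)
print(f"rows: {len(population)}, correlation: {pearson_correlation(population, profit):.3f}")
```

Calling `plot_columns(population, profit)` afterwards produces the scatter plot; printing a summary statistic first is a quick check that the columns were read in the intended order.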
  • 13. R for data visualization
  • 14. Section 2: How to choose a good research topic?
  • 15. Why do we need to choose a good research topic? The “garbage in, garbage out” principle applies. You may work on a research topic for years: 1 year for a master's thesis, 3 years or more for a Ph.D. dissertation. It is painful to work on something you find uninteresting: you lack passion, motivation, and ideas, and you face much frustration and bitterness.
  • 16. What is a good research topic? (Alon, 2009) Two dimensions of problem choice. Feasibility: whether a problem is hard or easy; we can measure it as the expected time to complete the project, and it is a function of the skills of the students/researchers and of the technology in the lab. Interest: the increase in knowledge expected from the project.
  • 17. Two-dimensional space of Problem Choice (1). Figure: The Feasibility-Interest Diagram for Choosing a Project (Alon, 2009)
  • 18. Two-dimensional space of Problem Choice (2). Figure: The Feasibility-Interest Diagram for Choosing a Project (Alon, 2009)
  • 19. What is a good research topic? Do many people care about the topic (the research community, your supervisors, industry demand)? Are you really interested in the topic? The topic should be interesting to you rather than to others. Good signs: “ideas and questions that come back again and again to your mind for months or years.”
  • 20. How to choose a good research topic, step by step: choose the broad (general) topic, e.g., Machine Translation; draw a hierarchy of research topics, starting from the broad topic; review the literature to look for gaps in previous work; choose the focused topic, e.g., Phrase-based Machine Translation; find gaps in previous work; form research questions in the focused topic; from the research questions, formulate the research problem.
  • 21. Finding a research problem: take your time to choose a good research topic. (Alon, 2009) Rule for new Ph.D. students and postdocs: “Do not commit to a problem before 3 months have elapsed”. Master's students should take 1-2 months to choose the research topic before starting the research project. Join projects in your laboratory: many thesis ideas come from projects you were involved in.
  • 22. Developing your research ideas. Where do research ideas come from? Observations (observing and analysing data, discovering patterns in data); reading papers, attending conferences, listening to talks; techniques and methods from other disciplines and fields; imagination; suggestions from your advisor.
  • 23. Reading papers, attending conferences: choose good and relevant papers. Consider the impact factor of the journal; in the NLP field, choose papers from top conferences and journals (ACL/NAACL/EMNLP/COLING), see The Top 10 NLP Conferences: http://www.junglelightspeed.com/the-top-10-nlp-conferences Also consider the reputation of the authors and their organizations. Do not only read: criticize the papers and find the gaps.
  • 24. Techniques, methods from other fields: expand your view and your problem-solving methodologies by regularly reading articles in other fields. An example is the image captioning task, which needs techniques from both computer vision and NLP.
  • 25. What happens after we choose a problem? (Alon, 2009)
  • 26. Section 3: How to read a scientific paper?
  • 27. Two types of reading. Fast reading: get and understand the basic ideas of the paper; know which problems the paper attacks and how it solves them; put the paper in the “big picture” of the field; know the differences between the paper and previous work. We mostly do fast reading when we survey the literature and choose a broad topic. Deep reading: understand the details of the presented methods; try to understand how the proposed method works; criticize the paper and find its limitations; if you were the authors, how would you solve the problem? Can you propose alternative methods? We mostly do deep reading when we look for a focused topic.
  • 28. How to read a scientific paper (1). Michael J. Hanson. Efficient Reading of Papers in Science and Technology: http://tinyurl.com/qdebynz
  • 29. How to read a scientific paper (2). Decide what to read: read the title and abstract, then read it, file it, or skip it. Read for breadth: what did they do? Skim the introduction, headings, graphics, definitions, conclusions, and bibliography. Consider the credibility. How useful is it? Decide whether to go on.
  • 30. How to read a scientific paper (3). Read in depth: how did they do it? Challenge their arguments; examine the assumptions, methods, statistics, reasoning, and conclusions. How can I apply their approach to my work? Take notes: make notes as you read; highlight major points; note new terms and definitions; summarize tables and graphs; write a summary.
  • 31. Homework: choose one scientific article that you want to read in depth; read it, take notes, and explain the ideas and methods presented in the paper to other students in a simple way. Note: you should be able to answer the following 3 questions. What problem does the paper attack? What are the differences between the paper and other existing papers? What are the interesting points of the presented methods?
  • 33. Some basic rules. Your advisor is likely to be very busy, so you should follow up with her/him: schedule meetings in advance and ask for them. Keep regular meetings with your advisor, usually weekly. Do not just do what your advisor tells you to do. Rule of thumb: finish all your assigned tasks before working on your own ideas.
  • 34. How to write a progress/status report. Source: Michael Ernst. Writing a progress/status report: http://tinyurl.com/zp7cdvt. Quote the previous week’s plan; this helps you determine whether you accomplished your goals. State this week’s progress: what you have accomplished, what you learned, what difficulties you overcame, what difficulties are still blocking you, your new ideas for research directions or projects, etc. Give the next week’s plan. A good format is a bulleted list. Try to make each goal measurable: there should be no ambiguity as to whether you were able to finish it. It is good to include longer-term goals as well.
  • 35. Communicate with your advisor. Prepare some slides (3-4 slides) to make the discussion concrete. Send the materials at least 24 hours before the meeting. Arrange the meeting in advance. Your advisor is not always right: you actually know more about your work than she/he does. If you have data, evidence, or proofs, do not hesitate to debate. Do not say “I guess” or “I think” when you explain something; use data, evidence, and references instead.
  • 37. What is Natural Language Processing? A field at the intersection of computer science, artificial intelligence, and computational linguistics. Its goal is to get computers to perform useful tasks involving human languages: human-machine communication; improving human-human communication (e.g., machine translation); extracting information from texts.
  • 38. Why is NLP interesting? Language is involved in many human activities: reading, writing, speaking, listening. Voice can be used as a user interface in many applications: remote controls, virtual assistants like Siri, etc. NLP is used to acquire insights from massive amounts of textual data, e.g., hypotheses from medical and health reports. NLP has many applications. NLP is hard!
  • 39. NLP problems. Fundamental problems: word segmentation, part-of-speech tagging, syntactic analysis, semantic analysis. Application problems: information retrieval, information extraction, question answering, text summarization, machine translation.
  • 40. What is it like doing research in NLP? Empirical methods are widely applied in NLP: they rely on observations, data, and experiments. The work consists of many loops of experiments: Identify the problem → Create ideas → Test the best idea → Analyse results → Identify the problem → Create ideas → · · ·
  • 41. What is it like doing research in NLP? Many ideas do not work. Even so, we need to analyse the results to understand why they do not work, so that we can come up with new ideas. Then try the next idea. Failures occur more often than successes, so try to increase the number of experiments: (No. of successes) = (No. of experiments) × (Success rate)
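The relation on this slide can be read as a simple expected-value calculation. The sketch below is only an illustration: the function name and the numbers (20 versus 100 experiments at a 10% success rate) are made up for the example and do not come from the lecture.

```python
def expected_successes(num_experiments: int, success_rate: float) -> float:
    """Expected number of successful experiments at a fixed success rate."""
    if num_experiments < 0:
        raise ValueError("num_experiments must be non-negative")
    if not 0.0 <= success_rate <= 1.0:
        raise ValueError("success_rate must be between 0 and 1")
    return num_experiments * success_rate


# Hypothetical numbers: the success rate stays the same, only the
# number of experiments you manage to run changes.
few_runs = expected_successes(20, 0.1)
many_runs = expected_successes(100, 0.1)
print(f"20 experiments:  about {few_runs:.0f} successes")
print(f"100 experiments: about {many_runs:.0f} successes")
```

With the success rate held fixed, the only lever in this formula is the number of experiments, which is why fast experiment turnaround matters.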
  • 42. The typical working day of an NLP researcher. Data observation and data/result analysis (a lot). Discussing ideas with colleagues. Doing experiments (running the program) to test ideas. Reading papers to keep up to date with mainstream research. Investigating new NLP/Machine Learning tools and libraries (less regularly).
  • 43. How to learn NLP? Research starts from learning. Learn/review background in: probability and statistics; basic math (linear algebra, calculus); machine learning; programming. Read NLP textbooks: Jurafsky, D., & Martin, J.H. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Manning, C.D., & Schütze, H. Foundations of Statistical Natural Language Processing.
  • 44. How to learn NLP: Get your hands dirty. Practice with programming exercises: 100 NLP drill exercises: https://github.com/minhpqn/nlp_100_drill_exercises; NLP Programming Tutorial, by Graham Neubig: http://www.phontron.com/teaching.php. Compete in Kaggle data science challenges (kaggle.com).
  • 45. Finding an NLP research problem. All the principles in the section “How to choose a good research topic” apply. Look for ideas from related fields: linguistics; machine learning (the mainstream in the NLP field is applying machine learning methods to NLP problems); computer vision. Look at data: it is actually my daily task.
  • 46. Basic rules to choose NLP papers. READ: papers in top conferences and journals in NLP and other related fields (ACL/EMNLP/NAACL/EACL/COLING/CoNLL/...); workshops that focus on an NLP sub-field; short papers at top conferences; PhD dissertations from top institutions/advisors; papers with many citations; textbooks from leading researchers. For more information, see: The Top 10 NLP Conferences (http://www.junglelightspeed.com/the-top-10-nlp-conferences/).
  • 48. Why is coding important in NLP/ML research? Many (most) NLP/ML research works are empirical studies. We need to do data analysis and run experiments to test our ideas, so we have to write programs. Even theorists should program, too: “Implementing your own algorithm is a good way of checking your work. If you aren’t implementing your algorithm, arguably you’re skipping a key step in checking your results.” (Michael Mitzenmacher, http://mybiasedcoin.blogspot.com/2008/11/bugs.html)
  • 49. Why do we care about coding practices in NLP research? Bad coding practices cause problems: you find errors in the experimental results right before the paper submission deadline; you cannot understand your own code after some months; you deleted intermediate results, so you cannot verify the code; you do not know how to verify experimental results; you did not test the code, and then used untested code for experiments; you spend a long time refactoring the code; you cannot get back the version that generated the best results; ...
  • 50. Why do we care about coding practices in NLP research? Good coding practices speed up our research work. Recall that: (No. of successes) = (No. of experiments) × (Success rate)
  • 51. Best Practices for Scientific Computing (Wilson et al., 2012). 1- Write programs for people, not computers. Readers of the code should not need to remember too much. Make code easy to read: names should be consistent, distinctive, and meaningful. Break down the coding work into one-hour-long tasks. 2- Automate repetitive tasks. Scientists should rely on the computer to repeat tasks. Use a script to run the program! Use a build tool to automate scientific workflows.
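As one possible reading of “use a script to run the program”, the sketch below loops over a small parameter grid and writes one CSV row per run instead of launching each run by hand. Everything in it is hypothetical: `run_experiment` is a stand-in that returns a fake score, and the parameter names are invented for the illustration.

```python
import csv
import io
import itertools


def run_experiment(learning_rate, num_epochs):
    """Hypothetical stand-in for a real training run; returns a fake score."""
    return round(1.0 - learning_rate / num_epochs, 4)


def run_grid(param_grid, out_file):
    """Run every parameter combination and write one CSV row per run."""
    names = sorted(param_grid)
    writer = csv.writer(out_file)
    writer.writerow(names + ["score"])
    rows = []
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = run_experiment(**params)
        writer.writerow(list(values) + [score])
        rows.append((params, score))
    return rows


# An in-memory buffer keeps the sketch self-contained; a real script
# would more likely open a results file on disk.
grid = {"learning_rate": [0.1, 0.01], "num_epochs": [5, 10]}
buffer = io.StringIO()
results = run_grid(grid, buffer)
print(buffer.getvalue())
```

Because the whole sweep is one function call, repeating it after a code change costs nothing, which is the point of practice 2.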
  • 52. Best Practices for Scientific Computing. 3- Use the computer to record history: unique identifiers and version numbers for raw data records; unique identifiers and version numbers for programs and libraries; the values of parameters used to generate any given output; the names and version numbers of programs used to generate those outputs. 4- Make incremental changes. Scientists cannot know what their programs should do next until the current version has produced some results. Work in small steps with frequent feedback and correction!
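A minimal sketch of practice 3, assuming it is enough to store a small JSON record next to each output. The field names and the `describe_run` helper are invented for this illustration; the SHA-256 hash of the input bytes plays the role of a unique identifier for the raw data.

```python
import hashlib
import json
import platform
import sys
from datetime import datetime, timezone


def describe_run(params, data_bytes):
    """Collect the facts needed to trace an output back to its inputs."""
    return {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "python_version": platform.python_version(),
        "command": sys.argv,
        "params": params,
        # Content hash: changes whenever the raw data changes.
        "data_sha256": hashlib.sha256(data_bytes).hexdigest(),
    }


# Hypothetical parameters and a toy "dataset" given as bytes.
record = describe_run({"learning_rate": 0.1, "seed": 13}, b"toy corpus\n")
print(json.dumps(record, indent=2, sort_keys=True))
```

Saving such a record alongside every result file makes it possible to answer later which parameters and which data produced a given number.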
  • 53. Best Practices for Scientific Computing. 5- Use a version control system: Git, Mercurial, Subversion. Push code to GitHub or Bitbucket. Everything that has been created manually should be put in version control. 6- Do not repeat yourself (or others). At small scale, code should be modularized rather than copied and pasted. At large scale, scientific programmers should re-use code instead of rewriting it.
  • 54. Best Practices for Scientific Computing. 7- Plan for mistakes. Write and run tests. Unit test: check the correctness of each single software unit. Integration test: check that units of code work correctly when combined. Regression test: run pre-existing tests after changes to the code to make sure that it has not regressed. Use an off-the-shelf unit testing library.
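As an illustration of a unit test written with an off-the-shelf library, the sketch below uses Python's built-in `unittest` module. The `tokenize` function is a deliberately trivial, hypothetical unit under test, not code from the lecture.

```python
import unittest


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


class TokenizeTest(unittest.TestCase):
    def test_lowercases_and_splits(self):
        self.assertEqual(tokenize("Hello NLP world"), ["hello", "nlp", "world"])

    def test_empty_string_gives_no_tokens(self):
        self.assertEqual(tokenize(""), [])

    def test_collapses_repeated_whitespace(self):
        self.assertEqual(tokenize("a  b\tc\n"), ["a", "b", "c"])


def run_tests():
    """Run the test case programmatically and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TokenizeTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_tests()
```

Re-running the same tests after every change to `tokenize` is what turns them into a regression test in the sense of the slide.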
  • 55. Best Practices for Scientific Computing. 8- Optimize software only after it works correctly. Use a profiler to identify bottlenecks. Write code in the highest-level language possible: Python is a recommended language for research. Use the highest-level programming language for rapid prototyping, and use a low-level programming language only when you are sure that the performance boost is needed.
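A minimal sketch of “use a profiler to identify bottlenecks”, using the `cProfile` and `pstats` modules from the Python standard library. The `slow_count` function is a hypothetical, deliberately inefficient example, and `profile_call` is a helper invented for this illustration.

```python
import cProfile
import io
import pstats


def slow_count(words):
    """Deliberately quadratic word count: list.count rescans the list per word."""
    return {word: words.count(word) for word in set(words)}


def profile_call(func, *args):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args)
    profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Show the five entries with the largest cumulative time.
    stats.sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()


words = ["the", "cat", "sat", "on", "the", "mat"] * 200
counts, report = profile_call(slow_count, words)
print(report)
```

The report shows where the time actually goes, so optimization effort can be spent on the measured bottleneck rather than on a guess.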
  • 56. 9- Document design and purpose, not mechanics. Document interfaces and reasons, not implementations. Do not write comments like this: i = i + 1 # Increment the variable 'i' by one. Refactor the code instead of explaining how it works. Embed the documentation for a piece of software in that software. Use software to generate documentation. 10- Collaborate. Use pre-merge code reviews. Use an issue tracking tool.
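One way to follow practice 9 in Python is a docstring that states the interface and the reason the function exists, and leaves the mechanics to the code itself. The function below is a hypothetical example written for this illustration.

```python
def normalize_scores(scores):
    """Scale non-negative scores so that they sum to 1.

    Why: downstream code compares models by their share of the total
    score, so raw scores on different scales must first be made
    comparable.

    Args:
        scores: a non-empty sequence of non-negative numbers.

    Returns:
        A list of floats with the same length as `scores`, summing to 1.

    Raises:
        ValueError: if the scores do not have a positive sum.
    """
    total = sum(scores)
    if total <= 0:
        raise ValueError("scores must have a positive sum")
    return [score / total for score in scores]


# The documentation lives inside the software: help() and documentation
# generators read the same docstring.
print(normalize_scores.__doc__)
```

Because the docstring is embedded in the code, tools such as the built-in `help()` or `pydoc` can generate documentation from it without a separate document to keep in sync.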
  • 57. Coding practices for NLP/ML research. All the general practices apply to NLP/ML research. Separate a process into small processes: use pipelines in Unix/Linux. Make use of tools in experiments: Linux commands; NLP/ML tools; libraries (json, nltk, matplotlib, scikit-learn, ...); algorithms. E.g., show word frequency statistics for a text file whose first tab-separated column contains the words: cut -f1 source_file_name.txt | sort | uniq -c | sort -nr. Visualize experimental results and make demos for your research results.
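For comparison with the shell pipeline on this slide, the sketch below does roughly the same counting in Python with `collections.Counter`. It assumes the input has one record per line with the word in the first tab-separated column; the function name is invented for the illustration, and unlike the shell version it skips blank lines.

```python
from collections import Counter


def first_column_counts(lines):
    """Count values in the first tab-separated column, most frequent first."""
    counts = Counter(
        line.rstrip("\n").split("\t")[0]
        for line in lines
        if line.strip()  # skip blank lines
    )
    return counts.most_common()


# Toy input in a word<TAB>tag layout; a real script would pass an open file.
sample = ["the\tDT\n", "cat\tNN\n", "the\tDT\n", "sat\tVBD\n", "the\tDT\n"]
for word, count in first_column_counts(sample):
    print(count, word)
```

The shell one-liner is quicker to type for a one-off look at the data; the Python version is easier to reuse inside a larger experiment script.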
  • 58. Tools for visualizing research results. Tables (Microsoft Excel, HTML). Charts (gnuplot, matplotlib, R). Graphs (graphviz, Gephi, D3.js). Texts (Microsoft Excel, HTML, brat: http://brat.nlplab.org/). Code (google-code-prettify: https://github.com/google/code-prettify; Pygments: http://pygments.org/). Demos (HTML, JavaScript, CSS, ...).
  • 59. Optimize code only after your ideas work. “Make it work. Make it right. Make it fast.” (Kent Beck) “Premature optimization is the root of all evil (or at least most of it) in programming.” (Donald Knuth) In NLP, always start with a simple and dirty working version, e.g., bag-of-words features and the Naive Bayes algorithm in text classification tasks.
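To make the “simple and dirty working version” concrete, here is a minimal sketch of a bag-of-words multinomial Naive Bayes classifier with add-one smoothing, written with only the standard library. The toy reviews and labels are invented for the illustration; in practice a library implementation such as scikit-learn's would likely be the first choice.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(documents):
    """Count labels and per-label word frequencies from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in documents:
        tokens = text.lower().split()  # bag of words: order is ignored
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return label_counts, word_counts, vocabulary


def predict(model, text):
    """Return the label with the highest log-probability for the text."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocabulary:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids log(0) for unseen pairs.
            likelihood = (word_counts[label][token] + 1) / (total_words + len(vocabulary))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data, invented for this sketch.
training_data = [
    ("great fun movie", "pos"),
    ("loved this great film", "pos"),
    ("boring bad movie", "neg"),
    ("terrible bad film", "neg"),
]
model = train_naive_bayes(training_data)
print(predict(model, "great film"))
print(predict(model, "bad boring"))
```

A baseline like this takes minutes to write and gives a reference score against which more elaborate ideas can be judged.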
  • 60. Reproducible Research. Reproducible research is research whose reported results can be reproduced with the available code (scripts) and data: Same data + Same script = Same results. Criteria for a truly reproducible study: all methods are fully reported; all data and files used for the analysis are (publicly) available; the process of analyzing raw data is well reported and preserved.
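“Same data + same script = same results” only holds if every source of randomness in the script is controlled. The sketch below shows one common way to do that, assuming the randomness comes from Python's `random` module: pass an explicit seed and use a private generator. The function name and the 80/20 split are choices made for this illustration.

```python
import random


def split_train_test(data, seed, train_fraction=0.8):
    """Shuffle a copy of the data with a seeded generator, then split it."""
    rng = random.Random(seed)  # private generator: no hidden global state
    shuffled = list(data)
    rng.shuffle(shuffled)
    cut = int(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# Running the same script twice with the same seed gives the same split.
first_run = split_train_test(range(10), seed=13)
second_run = split_train_test(range(10), seed=13)
print(first_run == second_run)
```

Recording the seed together with the results lets anyone regenerate exactly the same train/test split later.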
  • 61. Why Reproducible Research? In the first place, it helps us reproduce figures, statistics, etc. when we revise the work. It helps other people who want to do research in the field. It makes it easier to compare a new method with existing methods. It helps to verify whether the implementation is correct.
  • 62. How to make your research reproducible? Do not do things by hand; think about how to do them automatically: downloading data from websites, data cleaning, preprocessing, ... For things that you cannot do automatically, document the process well! Use version control (GitHub/Bitbucket): it keeps track of history and allows rolling back to old versions. Keep track of the software environment: in Python, use a virtualenv or conda environment.
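In addition to using virtualenv or conda, the software environment can be recorded from inside a script. The sketch below is one way to do that with the standard library's `importlib.metadata` (available from Python 3.8); the `snapshot_environment` helper and its field names are invented for this illustration.

```python
import platform
import sys
from importlib import metadata


def snapshot_environment():
    """Return the interpreter version, platform, and installed packages."""
    packages = sorted(
        f"{dist.metadata.get('Name')}=={dist.version}"
        for dist in metadata.distributions()
        if dist.metadata.get("Name")  # skip entries without a usable name
    )
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "executable": sys.executable,
        "packages": packages,
    }


environment = snapshot_environment()
print(environment["python"], environment["platform"])
print(len(environment["packages"]), "installed packages recorded")
```

Storing this snapshot with the experimental results documents which library versions produced them, which helps when results change after an upgrade.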
  • 63. How to make your research reproducible: Tools. Jupyter Notebook (http://jupyter.org/): put documents and code in the same place. Matplotlib or Seaborn for data visualization. Pandas for data processing. There are equivalent tools for R: knitr/rmarkdown and RStudio.
  • 64. Exercises. Install Python, Jupyter Notebook, and git on your computer. Make a GitHub account. Visualize the data from http://bit.ly/2ooMyaO in a Jupyter Notebook using pandas and matplotlib.
  • 66. Summary. Empirical research methods rely on observations, data, and experiments. Two dimensions of problem choice: feasibility and interest. Research starts from learning. Reading is very important in research. NLP research involves a lot of data analysis. Good coding practices matter for NLP/ML research.
  • 67. Checklist for your master thesis. 1. Is your work reproducible? Package your code so that a single script can automatically generate the results, and freeze the final version. 2. Is your proposed method new? 3. Did you revise your thesis many times? Ask your advisors and friends to proofread it. 4. Did you understand previous work? 5. Do you think you can pass the master thesis defense?
  • 68. Advice for your master thesis. Take time to choose your master research topic. Work on a research problem that you are interested in. Start soon. Follow up with your advisor. Spend time on regular literature review (reading papers). Commit at least 2-3 hours per day to your master research. Look at your data before you start doing anything. Follow “best” coding practices for research. Use version control for everything that is manually created. Back up your work in the cloud.
  • 69. References. Alon, U. (2009). How to choose a good scientific problem. Molecular Cell, 35(6), 726-728. Wilson, G., Aruliah, D.A., Brown, C.T., Davis, M., Guy, R.T., Hong, N.P., Haddock, S.H., Huff, K., Mitchell, I.M., Plumbley, M.D., Waugh, B., White, E.P., & Wilson, P. (2014). Best Practices for Scientific Computing. PLoS Biology, 12(1), e1001745. Eslami, A. Patterns for Research in Machine Learning. http://arkitus.com/patterns-for-research-in-machine-learning