Students who are just starting to do research may face many difficulties: choosing a good research topic, developing new ideas, implementing models to test those ideas, and writing papers. Research is a craft skill: you learn it only by doing. Still, it helps to learn the know-how of doing research. In this lecture, I share how-to-do-research information for engineering students, in the hope that it saves them time at the beginning stage of doing research.
Research methods for engineering students (v.2020)
1. Research methods for engineering
students/researchers
Phạm Quang Nhật Minh
Aimesoft JSC
minhpham0902@gmail.com
January 11, 2020
2. Lecture contents
n What is (scientific) research?
n How to do research
n Working in NLP field
n Some advice for master's students
n Summary
3. What is (scientific) research?
n Are the following activities scientific research?
Problem | Research
1. To settle a bet, I need to know when Michael Jordan was born. | You Google “Michael Jordan birthday.”
2. I’m just curious about a new species of fish. | You search the Internet for articles in newspapers and academic journals.
3. I want to know good Chinese restaurants near my home. | You search Google Maps and check Chinese restaurants near your location.
4. What are the age, sex and subject distributions of doctoral students in British higher education? | You get statistics from all education organizations (sampling from the population) and plot the distributions.
n Such information-gathering activities alone are not scientific research
4. What is (scientific) research?
n Is scientific research finding out something you don’t
know?
n The definition is too wide and too narrow
¨ Too narrow: “a lot of research is concerned not with
finding out something you don’t know but with finding
that you don’t know something.”
n Just answering what questions or gathering information is not research
¨ Although answering what questions is very important in
doing research…
5. Research – the “why” questions
n Research goes beyond description and requires
analysis.
¨ It looks for explanations, relationships, comparisons,
predictions, generalizations and theories.
n They are why questions
¨ Why are there so many fewer women doctoral students in
physics than in biology?
¨ Why are the radiation levels different in different
geographical areas?
n Answering why questions requires information-
gathering (answering what questions)
6. Doing research vs. learning
Doing research:
• Answering questions whose answers are unknown
• We do not even know whether answers to the questions exist
• E.g., is there life outside Earth?
• Sometimes, we try to answer questions that are meaningful to some people but meaningless to others
• Research is a never-ending story
Learning:
• Answering by yourself questions whose answers are already known
7. Why do we do research?
n Research results can lead to products that totally
change our life
¨ E.g., Internet, new drugs, materials
n Answering questions whose answers are unknown is a fun intellectual activity
¨ We can experience “Eureka” moments
n Continuously attacking questions whose answers are unknown, or testing existing answers, is human nature
8. Research career
n Typical academic career path
¨ Acquiring Master/PhD
¨ Doing post-doc for 3-5 years
¨ Assistant Professor → Associate Professor → Full
Professor, or
¨ Settle down with a research position in a research
institute
n The academic job market is very competitive, so many PhD holders go to engineering companies to work as researchers or research engineers
9. Types of research (by Estelle Phillips)
n Exploratory research
¨ “This is the type of research that is involved in tackling a
new problem/issue/topic about which little is known, so
the research idea cannot at the beginning be formulated
very well.”
n Testing-out research
¨ “In this type of research we are trying to find the limits of
a previously proposed generalization.”
n Problem-solving research
¨ “In this type of research, we start from a particular
problem in the real world and bring together all the
intellectual resources that can be brought to bear on its
solution.”
10. Empirical research in computer science
n Relying on observations, data, experiments
n Empirical work should complement theoretical work
¨ Theories often have holes (e.g., How big is the constant
term?)
¨ Theories are suggested by observations
¨ Theories are tested by observations
¨ Conversely, theories direct our empirical attention
n In addition, empirical means “wanting to understand
behavior of complex systems”
11. Why do we need empirical methods?
n Theory-based science need not be all theorems
n We do not know how a theory works in different
conditions
¨ Different data sets, domains
12. Empirical methods in AI field
n Four steps:
¨ We analyze data to find patterns and to construct hypotheses
¨ We run empirical experiments to test the hypotheses
¨ Then, we analyze the results to refine the hypotheses and modeling assumptions
Figure: Data observation → Hypothesis construction → Hypothesis testing → Refine hypothesis & modeling assumptions
13. Lecture contents
n What is (scientific) research?
n How to do research
n Working in NLP field
n Some advice for master's students
n Summary
14. How to do research
n Doing research is a craft skill which we learn by
doing
n We often learn research skills by seeing and
imitating how other accomplished researchers do
research
n But, understanding know-how in doing research is
necessary
15. How to do research
n Research skills covered in the lecture
¨ How to choose a research topic
¨ Developing ideas
¨ How to select your supervisor
¨ Literature review
¨ How to write a scientific paper
¨ Coding practices in research
16. Why do we need to choose a good research topic?
n “Garbage in, garbage out”
n It is painful to do things that you find uninteresting
¨ Lack of passion, motivation, ideas
¨ Much frustration and bitterness
17. What is a good research topic?
n Two Dimensions of Problem Choice [Alon, 2009]
¨ Feasibility: whether a problem is hard or easy
n We can measure the feasibility as the expected time to
complete the project.
n Feasibility is a function of the skills of
students/researchers and of the technology in the lab.
¨ Interest: the increase in knowledge expected from the
project.
18. Two-dimensional space of Problem Choice
Figure: The Feasibility-Interest Diagram for Choosing a Project (Alon, 2009)
20. What is a good research topic?
n Do many people care about the topic?
¨ Research community, your supervisors, industry demands
n Are you really interested in the topic?
¨ The topic should be interesting to you rather than to others
n Good signs: “ideas and questions that come back
again and again to your mind for months or years.”
21. Does your advisor assign a research topic?
n It depends on the supervision style of your advisor
n Supervision styles:
¨ Your advisor lets you choose the topic by yourself
n He/she just gives comments on your choice
¨ He/she gives you a general direction, and you need to narrow it down
n Clarify, investigate details
¨ Your advisor lets you choose among some topics that he/she prepared
¨ Your advisor prepares the topic and assigns it to you
n Which supervision style do you like? Why?
22. Narrow down your research topic (1)
n Choose the broad (general) topic
¨ E.g., Machine Translation
n Draw a hierarchy of research topics, starting from
the broad topic
23. Narrow down your research topic (2)
n Review literature to look for gaps in previous work
n Choose the focused topic
¨ E.g., Phrase-based Machine Translation
n Find gaps in previous work
n Form research questions in the focused topic
n From research questions, formulate the research
problem
24. Finding a research problem
n Take your time to choose a good research topic
¨ (Alon, 2009): Rule for new Ph.D. students and postdocs:
“Do not commit to a problem before 3 months have
elapsed”
¨ For master's students, take 1-2 months to choose the research topic before you start the research project
n Join projects in your laboratory
¨ Many research ideas for a thesis come from projects you are involved in
25. Developing research ideas
Where do research ideas come from?
n Observations
¨ Data observations, data analysis, discover patterns in data
n Reading papers, listening to talks, talking to others
n Techniques, methods from other disciplines, fields
¨ Creativity is a combination of existing things in a novel way
26. How to develop research ideas
n Investigate the main streams in the field
¨ Read papers in top journals/conferences
n Revisit old problems where assumptions may have changed
n Look for pain points; eliminate them
n Specialize the general
¨ E.g., routing problem in wireless networks
n Generalize the specific
¨ E.g., check whether a method in NLP can be applied to a wide class of problems
27. How to develop research ideas
n Consider related problems, make analogies
n Just get started, with anything
¨ Find the implementation of a method presented in the
paper, try to reproduce the results
¨ Or, re-implement the baseline methods
¨ Look at the data
n Relax, and let your subconscious work
¨ New ideas often emerge when researchers are taking a bath or walking
¨ The important point is to get your body and mind relaxed
and escape from any distractions (phone, emails,…)
28. How to develop research ideas
n Prepare for changes in developing research ideas
29. How to select your supervisor
n The most important step at the graduate level, especially in a PhD program
n “The key factor is whether they have an established
research record and are continuing to contribute to
the development of their discipline”
n Questions that you need to ask yourself
¨ Have they published research papers recently?
¨ Do they hold research grants or contracts?
¨ Is the lab efficiently organized?
30. How to select your supervisor
n Check basic information in their laboratory websites
¨ Research directions
¨ Recent publications
¨ Staff and student members
¨ How are the lab's graduates doing?
n Make an appointment to talk with professors
¨ Ask them about supervision styles
¨ Is it easy to communicate with them?
31. How to select your supervisor
n Talk to other members in the laboratory
¨ You may get honest answers from other students
¨ You will work daily with other students and staff rather than with your professor
n Ask yourself whether you are interested in the research directions of a professor
n Consider whether your character matches the supervision style (of the supervisor) and the environment in the candidate laboratory
32. How to work with your advisor
n Your advisor is likely to be very busy, so you should follow up with her/him
n Ask for a meeting and schedule it in advance
n Keep regular meetings with your advisor
¨ Weekly or biweekly meetings are OK
n You should finish all your assigned tasks before pursuing your own ideas
¨ If you think ideas assigned by your professor will not work, you should show data
33. Writing a progress report (Michael Ernst)
n Quote the previous week's plan.
¨ This helps you determine whether you accomplished your goals.
n State this week's progress.
¨ What you have accomplished,
¨ What you learned, what difficulties you overcame, what
difficulties are still blocking you,
¨ Your new ideas for research directions or projects, etc.
n Give the next week's plan.
¨ A good format is a bulleted list
¨ Try to make each goal measurable: there should be no ambiguity
as to whether you were able to finish it.
¨ It's good to include longer-term goals as well
34. How to communicate with your supervisor
n Prepare some slides (3-4 slides) to make the
discussion concrete
n Send the materials at least 24 hours prior to the meeting
n Arrange the meeting in advance
n Your advisor is not always right
¨ You know more about your work than she/he does
¨ Show data, evidence, proofs
n Do not say “I guess” or “I think” for your claims
¨ Use data, evidence, references instead
36. How to do literature review
n Do it early!
n Why?
¨ Make sure that what you’re doing hasn’t already been
done before.
n If it has, and the paper is easy to find, you won’t get
full credit.
¨ Learn about common methods, datasets, and libraries
that will make your life easier.
¨ Buy yourself more time to think about the questions that
haven’t been answered in the literature.
37. Choosing papers to read
1. Do a keyword search on Google Scholar, Semantic
Scholar (or famous journals/conferences in a field)
2. Download the papers that seem most relevant
3. Skim the abstracts, intros, previous work sections
4. Identify papers that look relevant, appear often,
have lots of citations on Google Scholar
5. Download those papers
6. Return to step 3
38. Choosing papers to read
n Which papers will be most useful?
¨ Newer ones, especially if they cite the older papers that
you’re interested in.
n The newer paper might contain a good summary of the
older one!
n Papers published in top conferences and journals,
rather than arXiv papers or papers published
elsewhere.
¨ Reviewers have carefully looked at these papers for
mistakes or inconsistencies.
39. Choosing papers to read
n Published papers with negative results (method X
doesn't work, method X doesn't do what you think it
does, ...), rather than papers with positive results.
¨ Negative results are usually held to a higher standard in
order to be published
40. How to read a paper
n Two types of reading
Fast reading:
• Get and understand the basic ideas of the paper
• Know the problem the paper attacks and how it solves it
• Put the paper in the “big picture” of the field
• Know the differences between the paper and previous work
Deep reading:
• Know the differences between the paper and previous work
• Understand the details of the presented methods
• Try to understand how the proposed method works
• Criticize the paper and find its limitations
• Try to propose alternative methods
• We do much “deep reading” when we look for a focused topic
41. How to read a scientific paper*
* Michael J. Hanson. Efficient Readings of Papers in Science and Technology:
http://tinyurl.com/qdebynz
Decide what to read
Read title, abstract
Read it, file it, or skip it
42. Read in depth
n How did they do it?
n Challenge their arguments.
n Examine assumptions.
n Examine methods.
n Examine statistics.
n Examine reasoning and conclusions.
n How can I apply their approach to my work?
43. Taking notes after reading
n Taking notes helps you understand the paper more deeply and saves you time if you want to refer to the paper in the future
n How to take notes
¨ Make notes as you read.
¨ Highlight major points.
¨ Note new terms and definitions.
¨ Make your own examples
¨ Summarize tables and graphs.
¨ Write a summary.
44. Taking notes after reading
n If you do not have time, take notes to answer three questions
¨ What is the problem the paper attacked?
¨ What are the differences between the paper and other
existing papers?
¨ What are interesting points of the presented methods?
45. Useful tools/links for literature review
n Literature management
¨ Mendeley
¨ Zotero
¨ Wisdom
¨ ReadCube
¨ Endnote
n Sites for search
¨ https://scholar.google.co.uk
¨ https://www.semanticscholar.org
¨ https://arxiv.org/
¨ http://www.arxiv-sanity.com/
46. Useful tools/links for literature review
n Tools for taking notes/writing summary
¨ Google Spreadsheet, Google Keep, etc.
¨ Evernote
¨ ShortScience: https://www.shortscience.org
n In Machine Learning field
¨ https://paperswithcode.com
¨ https://sotabench.com/
¨ https://www.stateoftheart.ai
¨ http://nlpprogress.com
¨ https://www.aclweb.org/anthology
47. How to write a scientific paper
n Many articles about this topic
¨ How to write a technical paper, by Michael Ernst
n Vietnamese translation: http://bit.ly/2seEOi0
¨ How to write a great research paper, by Simon Peyton Jones.
¨ How to Read/Write an International Conference Paper, by
Graham Neubig
n Your paper needs to convince the audience of three
key points:
¨ The problem is interesting
¨ It is hard, and
¨ You solved it.
48. Why is coding important in ML research?
n Much (most) NLP/ML research work consists of empirical studies
¨ Need to do data analysis, run experiments to test our ideas
¨ So, we must write programs
n Even theorists should program
n “Implementing your own algorithm is a good way of
checking your work. If you aren’t implementing your
algorithm, arguably you’re skipping a key step in
checking your results.” (Michael Mitzenmacher)
¨ http://mybiasedcoin.blogspot.com/2008/11/bugs.html
49. Why do we care about coding practices in research?
n Bad coding practices cause problems
¨ You find errors in the experimental results right before the
paper submission deadline
¨ You cannot understand your own code after some months
¨ You deleted intermediate results, so you cannot verify the
code
50. Why do we care about coding practices in research?
n Bad coding practices cause problems
¨ You do not know techniques for verifying experimental results
¨ You did not test the code, and then used untested code for experiments
¨ You spend a long time refactoring the code
¨ You cannot get back the version that generated the best results
51. Why do we care about coding practices in research?
n Good coding practices speed up our research work
Number of successes = Number of experiments × Success rate
n Increasing the success rate is much harder than increasing the number of experiments!
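The relation on this slide (successes = experiments × success rate) can be tried out with a few numbers. The sketch below is only an illustration: the function name and the figures (20 or 40 runs, 10% or 20% success rate) are assumptions, not values from the lecture.

```python
def expected_successes(num_experiments: int, success_rate: float) -> float:
    """Expected number of successful experiments: run count times success rate."""
    return num_experiments * success_rate


# Hypothetical numbers, chosen only to illustrate the trade-off.
baseline = expected_successes(20, 0.10)      # 20 runs at a 10% success rate
more_runs = expected_successes(40, 0.10)     # twice as many runs
better_rate = expected_successes(20, 0.20)   # same runs, doubled success rate
print(baseline, more_runs, better_rate)
```

Doubling either factor doubles the expected count. The slide's argument is that doubling the number of runs is usually the cheaper of the two, and that is where good coding practices help.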
52. Best Practices for Scientific Computing
n Wilson et al., 2012: https://arxiv.org/abs/1210.0530
n The paper presents a set of practices for scientific
computing
53. Best Practices for Scientific Computing
1. Write programs for people, not computers
¨ Readers of the code do not need to remember too much
n Limit the total number of items to be remembered to
accomplish a task
¨ Names should be consistent, distinctive, and meaningful
¨ Make code style and formatting consistent
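# Harder to read: four separate coordinates to keep in mind
def rect_area(x1, y1, x2, y2):
    return abs(x2 - x1) * abs(y2 - y1)

# Easier to read: two points (assumed here to be opposite corners, each an (x, y) pair)
def rect_area(point1, point2):
    return abs(point2[0] - point1[0]) * abs(point2[1] - point1[1])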
54. Best Practices for Scientific Computing
2. Automate repetitive tasks
¨ Make the computer repeat tasks
n Write a script (e.g., shell script) to automate
experiments including multiple steps (preprocessing,
training models, etc.)
¨ Save recent commands in a file for re-use (using
history command)
¨ Use a build tool to automate workflows
¨ Use logging tools to save experimental process
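As a minimal sketch of the advice above, the script below chains the steps of a toy experiment and logs each one with Python's standard `logging` module. The step functions (`preprocess`, `train`, `run_pipeline`) and the two input sentences are hypothetical placeholders; a real project would call its own preprocessing and training code here.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("experiment")


def preprocess(lines):
    """Hypothetical preprocessing step: lowercase and split into tokens."""
    return [line.lower().split() for line in lines]


def train(tokenized):
    """Hypothetical 'training' step: count how often each token occurs."""
    counts = {}
    for tokens in tokenized:
        for tok in tokens:
            counts[tok] = counts.get(tok, 0) + 1
    return counts


def run_pipeline(lines):
    """Run every step in order and log what happened at each one."""
    log.info("preprocessing %d lines", len(lines))
    tokenized = preprocess(lines)
    log.info("training on %d tokenized lines", len(tokenized))
    model = train(tokenized)
    log.info("finished: %d distinct tokens", len(model))
    return model


model = run_pipeline(["Research is a craft", "A craft is learned by doing"])
```

Because the whole experiment is one function call, rerunning it after a change costs nothing, and the log shows what was done and when.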
55. Best Practices for Scientific Computing
3. Make incremental changes
¨ Why?
n Requirements in scientific computing are rarely
frozen
n Scientists often can’t know what their programs
should do next until the current version has produced
some results.
56. Best Practices for Scientific Computing
3. Make incremental changes
¨ Work in small steps with frequent feedback and course
correction
¨ Use a version control system
n Put everything that has been created manually in
version control
57. Best Practices for Scientific Computing
4. Don’t repeat yourself (or others)
¨ For data, every piece of data must have a single
authoritative representation in the system
¨ Modularize code rather than copying and pasting
¨ Re-use code instead of rewriting it
n It is better to find established library or package to
solve your problem rather than rewriting
58. Best Practices for Scientific Computing
5. Plan for mistakes
¨ Why?
n The behavior of research code should match the researchers’ expectations
n Mistakes in research code may lead to wrong conclusions
59. Best Practices for Scientific Computing
5. Plan for mistakes
¨ Add assertions to programs to check their operation
n E.g., assert 0.0 < result <= 1.0
¨ Write automated testing, especially unit tests
¨ Use an off-the-shelf unit testing library
6. Optimize software only after it works correctly.
¨ The correctness of the code has the highest priority in
research work
¨ Write code in the highest-level language possible
n Only use a low-level programming language when you are sure that the performance boost is needed.
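Point 5 (assertions plus automated unit tests) can be sketched with Python's built-in `assert` statement and the off-the-shelf `unittest` library. The `accuracy` function and its test cases are hypothetical examples. The range check uses `0.0 <= result` rather than the slide's `0.0 < result` because an accuracy of exactly zero is possible.

```python
import unittest


def accuracy(predictions, labels):
    """Fraction of predictions that match the gold labels."""
    assert len(predictions) == len(labels), "inputs must have the same length"
    assert len(labels) > 0, "cannot score an empty test set"
    correct = sum(p == g for p, g in zip(predictions, labels))
    result = correct / len(labels)
    assert 0.0 <= result <= 1.0  # sanity check on the output range
    return result


class AccuracyTest(unittest.TestCase):
    def test_all_correct(self):
        self.assertEqual(accuracy(["a", "b"], ["a", "b"]), 1.0)

    def test_half_correct(self):
        self.assertEqual(accuracy(["a", "b"], ["a", "c"]), 0.5)

    def test_length_mismatch_is_caught(self):
        with self.assertRaises(AssertionError):
            accuracy(["a"], ["a", "b"])


# Run the tests programmatically so the script keeps going afterwards.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(AccuracyTest)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```

In a real project the test class would normally live in its own file and be run with `python -m unittest`.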
60. Best Practices for Scientific Computing
7. Document design, and purpose, not mechanics
¨ Document interface and reasons, not implementations
n Do not write comments like this:
¨ i = i + 1 # Increment the variable 'i' by one.
¨ Refactor the code instead of explaining how it works
8. Collaborate
¨ If you work in a large research project with other
members
¨ Some techniques can be applied
n Pre-merge code review
n Issue tracking tool
n Do pair programming
61. Coding practices in research
n As soon as you start:
¨ Create a git repo for your project.
¨ Find or build code to load your data.
¨ Find (try not to build) code to evaluate results.
¨ Find or build a very simple baseline.
n Always start with a simple and dirty working version
¨ E.g., bag-of-words features and the Naive Bayes algorithm in text classification tasks
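To make the "simple and dirty" baseline concrete, here is a small bag-of-words Naive Bayes classifier written with the standard library only. It is a sketch under stated assumptions: whitespace tokenization, add-one smoothing, and a four-sentence toy training set invented for illustration. In practice an established library implementation would be the better choice.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Collect per-class document counts and bag-of-words token counts."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for text, label in zip(docs, labels):
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log-probability for the text."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids log(0) for unseen pairs.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data, invented for illustration only.
train_docs = ["great camera nice and compact", "light and easy to carry",
              "feels flimsy and cheap", "bad battery and flimsy plastic"]
train_labels = ["pos", "pos", "neg", "neg"]
model = train_naive_bayes(train_docs, train_labels)
print(predict(model, "nice compact camera"))
print(predict(model, "flimsy plastic"))
```

A baseline like this runs in seconds, so it gives an early reference score and exercises the data loading and evaluation code before any complex model is built.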
62. Coding practices in research
n Separate an experiment into small processes with
intermediate output
¨ Raw text → Preprocess → Build model → Training → Test
¨ Make sure that the output of each process matches your expectations
n In experiments, keep track of:
¨ Commands and git checkpoint identifiers for each of your
experiments.
¨ Saved model checkpoint files for all reasonably
effective/interesting experiments.
¨ Notes on what each experiment was meant to test.
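One possible way to follow both pieces of advice on this slide is sketched below: each step writes its intermediate output to a file, and a small run record stores the command line, the git commit and a note. The file names, the two toy steps and the record fields are assumptions made for illustration. `git rev-parse HEAD` is the standard command for printing the current commit hash, and the helper falls back to "unknown" when git or a repository is not available.

```python
import json
import subprocess
import sys
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def current_git_commit():
    """Return the current commit hash, or 'unknown' outside a git repo."""
    try:
        out = subprocess.run(["git", "rev-parse", "HEAD"], capture_output=True,
                             text=True, check=True)
        return out.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return "unknown"


def save_step(out_dir, name, data):
    """Write one step's intermediate output to its own JSON file."""
    path = Path(out_dir) / f"{name}.json"
    path.write_text(json.dumps(data, indent=2), encoding="utf-8")
    return path


def run_experiment(out_dir, raw_text, note):
    """Run a two-step toy pipeline, saving each output plus a run record."""
    tokens = raw_text.lower().split()      # stand-in for preprocessing
    save_step(out_dir, "01_tokens", tokens)
    vocab = sorted(set(tokens))            # stand-in for model building
    save_step(out_dir, "02_vocab", vocab)
    record = {
        "note": note,
        "command": " ".join(sys.argv),
        "git_commit": current_git_commit(),
        "time_utc": datetime.now(timezone.utc).isoformat(),
    }
    return save_step(out_dir, "run_record", record)


with tempfile.TemporaryDirectory() as tmp:
    record_path = run_experiment(tmp, "Raw text to preprocess", "toy run")
    saved = json.loads(record_path.read_text(encoding="utf-8"))
    saved_files = sorted(p.name for p in Path(tmp).iterdir())
```

With the intermediate files on disk, a suspicious final result can be traced back step by step instead of rerunning everything.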
63. Reproducible Research
n Reproducible research is research whose reported results can be reproduced with the available code (scripts) and data.
¨ Same data + Same script = Same results
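For scripts that use randomness (shuffling, sampling, weight initialization), "same script" also has to mean "same random seed". The sketch below shows the idea with Python's standard `random` module; the function name and the seed value 42 are arbitrary choices for illustration.

```python
import random


def toy_experiment(seed):
    """Toy 'experiment': shuffle ten items and draw one random score."""
    rng = random.Random(seed)   # private generator, isolated from global state
    data = list(range(10))
    rng.shuffle(data)
    return data[:3], rng.random()


first = toy_experiment(seed=42)
second = toy_experiment(seed=42)
print(first == second)
```

The two runs return identical results because they start from the same seed. Libraries with their own random number generators need to be seeded separately.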
64. Why reproducible research?
n It helps us reproduce figures, statistics, etc. when we revise the work.
n It helps other people who want to do research in the field.
n It makes it easier to compare a new method with existing methods.
n It helps verify that the implementation is correct.
n The Machine Learning research community cares about research reproducibility
¨ Reproducibility Challenge @ NeurIPS 2019 (Top Machine
Learning conference)
65. How to make your research reproducible?
n Don't do things by hand. Think about how to automate tasks, e.g.,
¨ Downloading data from websites
¨ Data cleaning, preprocessing
n For things that you cannot automate, document the process well!
n Use version control
¨ Keep track of history
¨ Allows going back to old versions
n Keep track of the software environment
¨ In Python, use virtualenv or a conda environment
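Alongside an isolated environment, it can help to store a snapshot of the environment next to each set of results. The sketch below uses only the standard library (`sys`, `platform`, `importlib.metadata`). The function name and the package names are examples, not part of the lecture.

```python
import json
import platform
import sys
from importlib import metadata


def environment_snapshot(packages):
    """Collect interpreter, OS, and installed package versions."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": versions,
    }


# The package names are examples; list whatever your experiments import.
snapshot = environment_snapshot(["numpy", "a-package-that-does-not-exist"])
print(json.dumps(snapshot, indent=2))
```

This complements, rather than replaces, the full dependency lists that tools such as `pip freeze` or `conda env export` produce.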
66. How to make your research reproducible?
n The Machine Learning Reproducibility Checklist:
¨ https://www.cs.mcgill.ca/~jpineau/ReproducibilityChecklist.pdf
n Tools:
¨ Jupyter notebooks allow writing documents along with executable code
n Google Colab is a great environment (with free GPU/TPU) for writing Jupyter notebooks
¨ Matplotlib or Seaborn for data visualization
¨ There are some equivalent tools for R
n knitr/R Markdown, and RStudio
67. What is Natural Language Processing
n A field of computer science, artificial intelligence,
and computational linguistics
n To get computers to perform useful tasks involving
human languages
¨ Human-Machine communication
¨ Improving human-human communication
n E.g., Machine Translation
¨ Extracting information from texts
68. Information Extraction
Subject: curriculum meeting
Date: January 15, 2012
To: Dan Jurafsky
Hi Dan, we’ve now scheduled the curriculum meeting.
It will be in Gates 159 tomorrow from 10:00-11:30.
-Chris
Create new Calendar entry
Event: Curriculum mtg
Date: Jan-16-2012
Start: 10:00am
End: 11:30am
Where: Gates 159
69. Information Extraction & Sentiment Analysis
Attributes: zoom, affordability, size and weight, flash, ease of use
Reviews on “size and weight”:
✓ nice and compact to carry!
✓ since the camera is small and light, I won't need to carry around those heavy, bulky professional cameras either!
✗ the camera feels flimsy, is plastic and very light in weight you have to be very delicate in the handling of this camera
70. Doing research in NLP
n NLP is empirical research
¨ Relying on observations, data, experiments
n Contains many loops of experiments
71. Doing research in NLP
n You should review background
¨ Probability and Statistics
¨ Basic math (linear algebra, calculus)
¨ Machine Learning
¨ Programming
n Read NLP books
¨ Jurafsky, D., Martin, J.H. Speech and Language Processing: an
Introduction to Natural Language Processing, Computational
Linguistics, and Speech Recognition.
¨ Manning, C.D., Schütze, H. Foundations of Statistical Natural Language Processing.
¨ Jacob Eisenstein's NLP notes: https://github.com/jacobeisenstein/gt-nlp-class/blob/master/notes/eisenstein-nlp-notes.pdf
72. How to learn NLP: Get your hands dirty
n Practice with programming exercises:
¨ 100 NLP drill exercises:
https://github.com/minhpqn/nlp_100_drill_exercises
¨ NLP Programming Tutorial, by Graham Neubig:
http://www.phontron.com/teaching.php
73. Summary
n What is research?
n Why questions are important in research
n Research skill is a craft skill (learning by doing)
n Know-how in doing research
¨ Choosing a research topic
¨ Developing research ideas
¨ Literature review
¨ Coding practices in research
n Doing research in NLP field
74. Advice for your master's research work
1. Take time to choose your master's research topic
2. Work on a research problem that you are interested in
3. Start soon
4. Follow up with your advisor
5. Spend time on regular literature review
6. Commit at least 2-3 hours per day to your master's research
7. Look at your data before you start doing anything
8. Follow some “best” coding practices for research
9. Use version control
¨ For versioning everything that is manually created
10. Regularly back up your work to the cloud/external disks
75. References
n Alon, U. (2009). How to choose a good scientific problem. Molecular Cell, 35(6), 726-728.
n Wilson, G., Aruliah, D. A., Brown, C. T., Hong, N. P. C., Davis, M., Guy, R. T., ... & Waugh, B. (2014). Best practices for scientific computing. PLoS Biology, 12(1), e1001745.
n Ali Eslami. Patterns for Research in Machine Learning: http://arkitus.com/patterns-for-research-in-machine-learning
n Booth, W. C., Colomb, G. G., & Williams, J. M. (2003). The craft of research. University of Chicago Press.