how to research,

Published in: Technology, Education


  1. 1. How to conduct high-quality research and write good papers. Haixun Wang, Microsoft Research Asia
  2. 2. What is research? 1. Solve a problem using existing methods. Write a README.txt. (low innovation, little impact) 2. Improve existing solutions to an existing problem. Write a tech report. (low innovation, little impact) 3. Create a new solution to an existing problem. Write a paper. (high innovation, low impact) 4. Identify a new problem. Generalize the solution. Write a paper. (high innovation, high impact)
  3. 3. Research and Engineering • New Solutions  Useful Solutions
  4. 4. How innovative are you?
  5. 5. • It is a cruel irony that the children who died during the earthquake in Dujiangyan (都江堰), China, knew all too well that their country once led the world in the knowledge of the planet’s seismicity. • Why, if the Chinese had come to know so much about earthquakes so early on in their immensely long history, were they never able to minimize the effects of the world’s contortions, to at least the degree that America has? • Why did they leave the West to become leaders in the field, and leave themselves to become mired, time and again, in the kind of tragic events that we are witnessing this week?
  6. 6. • In almost every area of technology the Chinese were once supreme, without competition. And yet, in the 16th century China’s innovative energies inexplicably withered away, and modern science became the virtual monopoly of the West. • There had been any number of Chinese Euclids and Archimedes but there was never to be a Chinese Newton or Galileo. • Until this week Dujiangyan was a place of which China could be proud; today its wreckage stands as a tragic monument to a culture that turned its back on its remarkable and glittering history (of innovation).
  7. 7. How to train your innovation?
  8. 8. Read, Read, Read
  9. 9. Malcolm Gladwell, staff writer, The New Yorker
  10. 10.
  11. 11. 10,000 hours of success • Excellence requires a minimum level of practice. • 10,000 hours is the magic number (3 hours per day for 10 years).
  12. 12. By the time Bill Gates dropped out of Harvard, he had been programming nonstop for seven years, which was way past 10,000 hours. In the last 10 years, I spent more than 3 hours watching TV every day; how come I didn’t achieve anything?
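The arithmetic behind the 10,000-hour figure can be checked with a short sketch. The function names and the 365-day year are assumptions made for illustration; they do not come from the slides.

```python
DAYS_PER_YEAR = 365  # assumption: practice every day, leap days ignored


def total_practice_hours(hours_per_day: float, years: float) -> float:
    """Total hours accumulated at a steady daily rate."""
    return hours_per_day * DAYS_PER_YEAR * years


def years_to_reach(target_hours: float, hours_per_day: float) -> float:
    """Years of steady daily practice needed to reach target_hours."""
    return target_hours / (hours_per_day * DAYS_PER_YEAR)


# The slide's rule of thumb: 3 hours per day for 10 years.
ten_year_total = total_practice_hours(3, 10)
years_needed = years_to_reach(10_000, 3)
```

Under these assumptions, 3 hours a day for 10 years gives 10,950 hours, slightly above the threshold, and the threshold itself is reached after a little more than 9 years.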
  13. 13. Nicholas Carr, The Atlantic Monthly, July 2008
  14. 14. Independent thinking • The downfall of deep reading/thinking • The Internet is rewiring our brains, forcibly adapting us to tolerate only bite-sized summations and simplified blips at the expense of deeper thought • We risk turning into ‘pancake people’, spread wide and thin as we connect with that vast network of information accessed by the mere touch of a button.
  15. 15. How to train your creativity? Write, Write, Write!
  16. 16. Research = Writing + Rewriting • Turn your idea into writing before implementing it. • Hard to write it down? That is because you don’t understand the problem (or your idea). – Writing forces us to be clear and focused – Writing crystallises what we don’t understand • Writing opens the way to dialogue with others: reality check, critique, and collaboration.
  17. 17. Research = Writing + Rewriting • The process of writing and rewriting is the process of – developing your idea – generalizing your problem/solution • After many rounds of rewriting, your problem (idea) may be totally different from the problem (idea) you started with – more interesting and challenging • It’s not a waste of time. It’s how you should spend your time when you do research.
  18. 18. How to find a topic? The Theory of Flying Pigs
  19. 19. In Reality – Pigs do not have to fly.
  20. 20. [ABSTRACT] In this paper, we identify the importance for pigs to fly. We show that many challenging tasks can be modeled by flying pigs. Thus, solving the flying pig problem benefits a large variety of applications.
  21. 21. [ABSTRACT] In this paper, we extend the pioneering work of flying pigs [1]. Our improvement enables pigs to fly higher.
  22. 22. [ABSTRACT] Recently, the flying pig problem has attracted significant attention [1, 2]. However, pigs in previous works all fly very slowly. In this paper, we introduce a technique so that pigs can fly an order of magnitude faster.
  23. 23. and soon we have many papers …
  24. 24. What topic to work on? • The choices you make will define your career • No real problems at hand – Get a proceedings volume. Read from the first page. – Ask senior people what they are working on. – Make it go faster/higher • Find real problems, use real data
  25. 25. Is this topic meaningful? • Convince yourself – an issue of research ethics • Talk to your colleagues – Hey! I have a crazy idea – Convince them • Talk to, and read work by, people not in your field – mathematicians, physicists, biologists, …
  26. 26. Database research as an example • Databases have been one of the most successful fields in CS in terms of applications and industrial value! • However, is there anything left for substantial database research? – Relational database theory, a closing world? – Too many index structures already?
  27. 27. Example: Data Model • From: RDBMS – Normalization is one of the cornerstones of RDBMS – Theoretical results and practical applications • To: XML – Storage model: still an open problem – Hybrid database, native XML support
  28. 28. Example: Logic Databases • Logic databases were a hot topic in the ’80s and early ’90s – models, semantics, magic sets, … – many results have since been incorporated into RDBMS – is the logic database dead? • Rejuvenated by semantic query processing – ontology, description logics
  29. 29. Broadening the Scope • Concern (VLDB Endowment meeting, ’98): – The area of database research may lose the pivotal role it now plays among information system technologies • Keep DB research current and relevant – We should maintain a watch on trends and future directions in the general area of information management • Can a traditionally non-DB/KDD research problem be treated using DB/KDD methods?
  30. 30. Writing techniques • Overcome language barrier • Paper structure and content
  31. 31. The Language Barrier • One must first know the rules to break them
  32. 32. Some General Tips • Choose the right word/phrase • Use the active voice • A picture is worth 10,000 words • Use a fair amount of formalization • The divide-and-conquer approach • Keep it simple and stupid
  33. 33. Choose the right word/phrase • Chicken without sexual life • Husband and wife’s lung slice • Bean curd made by a pockmarked woman
  34. 34. Use the active voice • Ten Yuan will be paid for every one-time towel you use.
  35. 35. Use the active voice • The passive voice is “respectable” but it DEADENS your paper. Avoid it at all costs. (Slide borrowed from Simon Peyton Jones)
          NO: It can be seen that...   YES: We can see that...   (“We” = you and the reader)
          NO: 34 tests were run   YES: We ran 34 tests
          NO: These properties were thought desirable   YES: We wanted to retain these properties   (“We” = the authors)
          NO: It might be thought that this would be a type error   YES: You might think this would be a type error   (“You” = the reader)
  36. 36. Some General Tips • Choose the right word/phrase • Use the active voice • A picture is worth 10,000 words • Use a fair amount of formalization • The divide-and-conquer approach • Keep it simple and stupid
  37. 37. Be Specific (From Simon Peyton Jones)
          NO: We describe the WizWoz system. It is really cool.   YES: We give the syntax and semantics of a language that supports concurrent processes (Section 3). Its innovative features are...
          NO: We study its properties.   YES: We prove that the type system is sound, and that type checking is decidable (Section 4).
          NO: We have used WizWoz in practice.   YES: We have built a GUI toolkit in WizWoz, and used it to implement a text editor (Section 5). The result is half the length of the Java version.
  38. 38. Structure (conference paper) • Title (1000 readers) • Abstract (4 sentences, 100 readers) • Introduction (1 page, 100 readers) • The problem (1 page, 10 readers) • My idea (2 pages, 10 readers) • The details (5 pages, 3 readers) • Related work (1-2 pages, 10 readers) • Conclusions and further work (0.5 pages) (Slide borrowed from Simon Peyton Jones)
  39. 39. An Attractive Abstract Counts • Abstract is for people to skim through in one minute – No technical details – Plain English, easy to understand – No assumption of DB/KDD background – As short as possible • What to write – The problem, and why it is important and challenging – Your technical thrust, progress and contributions – Broader impact • Write it last!
  40. 40. What Is a Good Introduction • Starting from good stories – Motivation: what is the problem and why is the problem important? – 1-2 typical real-life applications • Intuition and general ideas – Intuition is most important! – No technical details – Understandable for a CS undergraduate – Use clear, small examples
  41. 41. What Is a Good Introduction (2) • Highlight major contributions – Typical examples: identifying a new problem, novel solutions, a systematic performance study, … – Only list the major ones, don’t overclaim – Again, no technical details – A road map of the rest of the paper
  42. 42. What’s the difference?
          Book 1: Hardcover: 1312 pages. Publisher: Wiley; 7th edition (June 20, 2001). Language: English. ISBN-10: 0471381578. ISBN-13: 978-0471381570. Product Dimensions: 10.1 x 9.1 x 1.9 inches. Shipping Weight: 6.1 pounds.
          Book 2: Pages: 378. Publication date: January 2004. ISBN: 7040137860. Barcode: 9787040137866.
  43. 43. Writing a paper is like telling a story • The goal of the title is to get the reader to read the abstract … • The goal of the abstract is to get the reader to read the introduction … • … • You need a good setup … some suspense … then you unfold your story slowly …
  44. 44. Goal: create suspense • Reader thinks “gosh, if they can really deliver this, that’d be exciting. I’d better read on”
  45. 45. Create Suspense • “Many years later, as he faced the firing squad, Colonel Aureliano Buendía was to remember that distant afternoon when his father took him to discover ice.” One Hundred Years of Solitude by Gabriel García Márquez
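  46. 46. Keep it Simple and Stupid • “The north wind blew hard all night” (一夜北风紧), from Dream of the Red Chamber (红楼梦) by Cao Xueqin (曹雪芹) • “This line is plain, and it does not reveal what follows; that is exactly how someone who knows how to write poetry begins. It is not only good, it also leaves endless room for those who come after.”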
  47. 47. An Example (SIGMOD’02)
  48. 48. Motivation Found! [Figure: two panels, “Shifting Pattern” over {b,c,h,j,e} and “Scaling Pattern” over {f,d,a,g,i}]
  49. 49. Is It Meaningful?
                   CH1I  CH1B  CH1D  CH2I  CH2B  …
          VPS8      401   281   120   275   298
          SSA1      401   292   109   580   238
          SP07      228   290    48   285   224
          EFB1      318   280    37   277   215
          …
          MDM10     538   272   266   277   236
          CYS3      322   288    41   278   219
          DEP1      317   272    40   273   232
          …
          NTG1      329   296    33   274   228
          …           …     …
  50. 50. Intuition Is the Most Important • Example – ensemble classifier for streams • Why ensemble? – Rigorous mathematical proof which shows ensemble reduces classification variance • Many benefits – High accuracy, ease of use, best approach in many aspects • Result: – paper rejected
  51. 51. Optimal decision boundary [Figure: decision boundaries for data from time periods t0, t1, and t2, with the labels “errors!” and “no errors”]
  52. 52. How to Present Technical Details? • The top-down approach – First give an overview of the algorithm – Present details of the major steps • The bottom-up approach – Start from the critical details – Summarize the discussion and present the algorithm • The hybrid approach – Top-down to partition the global problem – Bottom-up to present solutions to sub-problems
  53. 53. How to Present Examples? • Occam’s razor (the principle of parsimony) – “One should not increase, beyond what is necessary, the number of entities required to explain anything” • Find the simplest example that can show all the points you want to show – Some data in running examples can be highly skewed – Only select data that can show critical ideas
  54. 54. Worksheet of Running Example • Work out the complete running example • Select the interesting and critical segments • Present multiple small examples in the paper – Only one running example if possible – Preferably several paragraphs in one example – Don’t give a long, exhaustive example – Each example should focus on one point
  55. 55. How to Present Algorithms? • Choose the appropriate level of abstraction – Obvious operations: omit them (readers have a general CS background) – Complicated operations: describe them as functions • The WWH sequence – Why do we need such an operation? – What is the operation? – How can the operation be done efficiently?
  56. 56. Keep Your Algorithm Short • Long algorithms are hard to understand • Multi-level expansion of algorithms – Use functions or procedures • Ideally, each algorithm is less than 20 lines • Control the complexity – Don’t use too many variables – Use meaningful variable names – Use plain text to explain
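One way to picture the multi-level expansion suggested above: the top-level routine stays well under 20 lines because each non-obvious step is pushed into a small named function. The sketch below uses k-means clustering purely as a stand-in algorithm; the choice of algorithm, all function names, and the fixed seeding from the first k points are assumptions for illustration and do not come from the slides.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_center(point: Point, centers: Sequence[Point]) -> int:
    """Index of the center closest to point (lowest index wins ties)."""
    return min(range(len(centers)),
               key=lambda i: squared_distance(point, centers[i]))


def mean_point(points: Sequence[Point]) -> Point:
    """Coordinate-wise mean of a non-empty collection of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def assign_labels(points: Sequence[Point], centers: Sequence[Point]) -> List[int]:
    """Label every point with the index of its nearest center."""
    return [nearest_center(p, centers) for p in points]


def update_centers(points: Sequence[Point], labels: Sequence[int],
                   centers: Sequence[Point]) -> List[Point]:
    """Move each center to the mean of its members; empty clusters stay put."""
    updated = []
    for i, old_center in enumerate(centers):
        members = [p for p, label in zip(points, labels) if label == i]
        updated.append(mean_point(members) if members else old_center)
    return updated


def k_means(points: Sequence[Point], k: int, max_rounds: int = 100):
    """Top-level algorithm: short, with details delegated to helpers."""
    centers = list(points[:k])  # assumption: deterministic seeding for the demo
    for _ in range(max_rounds):
        labels = assign_labels(points, centers)
        new_centers = update_centers(points, labels, centers)
        if new_centers == centers:  # converged: no center moved
            break
        centers = new_centers
    return centers, assign_labels(points, centers)
```

The top-level `k_means` reads almost like the plain-text explanation a paper would give, while the helper names carry the meaning that extra variables would otherwise have to.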
  57. 57. Performance Study Goals • “Wisconsin wallpaper” • Clearly say why you design and conduct the experiments – Effectiveness measures – Efficiency measures – Other considerations
  58. 58. How to Present Experimental Results? • Experiment settings • Performance study goals • Selected experimental results – Explanation • Summary of performance study
  59. 59. How to Handle Related Work? • If possible, talk about related work at the end of the paper. – Do not interrupt the flow of your story • Extensive collection of related work – Don’t forget to look at the latest results – Go beyond your field, if possible • Give sufficient credit to others – We are standing on the shoulders of giants – Avoid emotional words – Be precise in comparison • Point out critical points – Use examples if necessary
  60. 60. What Should Be in Discussion? • Related issues – Constraints in your method – Drawbacks • Possible extensions – Point out the other problems that can be solved straightforwardly using the proposed method – Broader impact • Future work if you have a detailed plan
  61. 61. Writing Strong Conclusions • Summarize the paper briefly. – What is the problem solved – Major technical contributions – Major findings and results • Future work if possible
  62. 62. Aiming high! Major DB/KDD Conferences • DB (in my opinion) – 1st tier: SIGMOD, VLDB, ICDE – 2nd tier: EDBT, ICDT, CIKM, ER, SSDBM – Regional: DASFAA, WAIM, British DB Conf, Australian DB Conf, Brazilian DB Conf, DEXA, … • KDD (in my opinion) – Top: KDD – 2nd tier: SIAM DM (SDM), ICDM – Regional: PAKDD, PKDD, … – KDD papers can be sent to DB & ML conferences
  63. 63. Reviewers’ Comments
  64. 64. Reviewers’ Comments • The conference review process is necessarily imperfect • The reviewers operate under strict time constraints, and the committee must make quick decisions. • Some good papers will be rejected and some embarrassing papers will be accepted.
  65. 65. Thank you!
  66. 66. My Paper Got Accepted! • Congratulations! • Address reviewers’ comments in the final version – Adopt good points – Clarify and remove confusion • Prepare a nice talk and/or poster – Convey the general idea – Use examples wherever possible – Use as little symbolic text as possible
  67. 67. Recycle a Paper • Before publication, a paper is likely to go through several rejections – SIGMOD, VLDB, ICDE acceptance is around 10%-15% – A conference with 25+% acceptance ratio may not be good • Aim at the next chance
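A rough back-of-the-envelope sketch of why several rejections are likely at these acceptance rates. It rests on a deliberately crude assumption that is not in the slides: every submission is treated as an independent draw with the same acceptance probability, which ignores paper quality and the improvements made between rounds. The function names are hypothetical.

```python
def expected_submissions(acceptance_rate: float) -> float:
    """Mean number of submissions until the first acceptance.

    Assumes independent attempts with a constant acceptance probability
    (a geometric model), which is only a crude approximation.
    """
    if not 0 < acceptance_rate <= 1:
        raise ValueError("acceptance_rate must be in (0, 1]")
    return 1.0 / acceptance_rate


def prob_accepted_within(attempts: int, acceptance_rate: float) -> float:
    """Probability of at least one acceptance in the given number of attempts."""
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    return 1.0 - (1.0 - acceptance_rate) ** attempts


if __name__ == "__main__":
    for rate in (0.10, 0.15, 0.25):
        print(f"rate={rate:.2f}: "
              f"expected submissions={expected_submissions(rate):.1f}, "
              f"P(accepted within 3)={prob_accepted_within(3, rate):.3f}")
```

Under this model a paper facing a 15% acceptance rate has well under an even chance of being accepted within three attempts, which is why revising between rounds matters more than simply resubmitting.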
  68. 68. Learn from the Reviews • Do we aim at the right target? – If 2/3 of the reviewers are laymen in your subject, seriously reconsider the forum • Address technical issues – Respond to reviewers’ comments by revising/enhancing the technical description and experiments • Improve writing – Confused reviewers? Clarify the issues – Correct any linguistic problems pointed out
  69. 69. Why Journal Papers? • Records archived • Important for degree, promotion, election, …
  70. 70. Conference vs. Journal Papers • Length – Journal papers are often longer • Objectives – Conference papers mainly convey the ideas and results – Journal papers systematically report and justify the research, and are more formal
  71. 71. From Conference Papers to Journal Papers • A critical requirement: “major value added” – 30% in some journals, e.g., TODS, TKDE – But, how to count? • Some “major values” – More detailed/complete examples – Complete formal results and proofs – Further variations and extensions of the method – Triviality should be avoided
  72. 72. Steps Towards Good Research • Motivations and problems – More important than the solutions • Re-search – Systematic development of solutions • Writing a good paper – Careful design • Submissions – Good luck!