
Research Methods for Computational Statistics


Lecture notes for STIS students

  1. 1. Research Methodology for Computational Statistics, Setia Pramana
  2. 2. Educational Background • Hasselt Universiteit, Belgium, MSc in Applied Statistics, 2005-2006. • Hasselt Universiteit, Belgium, MSc in Biostatistics, 2006-2007. • Hasselt Universiteit, Belgium, PhD in Statistical Bioinformatics, 2007-2011.
  3. 3. Educational Background Medical Epidemiology And Biostatistics Dept. Karolinska Institutet, Sweden, Postdoctoral, 2011-2014
  4. 4. Biostatistics • The study of statistics as applied to biological areas such as biological laboratory experiments, medical research (including clinical research), and public health services research. • ‘Biostatistics, far from being an unrelated mathematical science, is a discipline essential to modern medicine – a pillar in its edifice’ (Journal of the American Medical Association, 1966).
  5. 5. Bioinformatics • Bioinformatics is a science straddling the domains of biomedicine, informatics, mathematics and statistics. • Applying computational techniques to biological data • Functional genomics • Proteomics • Sequence analysis • Phylogenetics • etc.
  6. 6. “Informatics” in Bioinformatics • Databases • Building, querying • Object DB • Text string comparison • Text search • Finding patterns • AI / machine learning • Clustering • Data mining • etc.
  7. 7. Current Research  Statistical methods for high-throughput data analyses particularly in Next generation sequencing (NGS) data (Whole genome-seq, Exome-seq and RNA-seq).  RNA microarray expression studies and GWAS in cancer and cardiovascular diseases.  Classification in NGS data.  R-Graphical User Interface (R-GUI) for high-throughput data analyses.
  8. 8. Course Outline • Basic concepts of research • Problem identification and hypothesis • Literature review • Research design • Quantitative research • Writing a scientific report/paper
  9. 9. Course Workload • 40% theory, 60% practice • Group project (5 students) • Presentation every week • Slides can be seen at: Setia Pramana Survival Data Analysis
  10. 10. Research  An organized, systematic, data-based, critical, objective, scientific inquiry or investigation into a specific problem, undertaken with the purpose of finding answers or solutions to it.  It provides the needed information that guides managers to make informed decisions to successfully deal with problems.  The information provided could be the result of a careful analysis of data gathered firsthand or of data that are already available (in the company, industry, archives, etc.).
  11. 11. Purpose of Research • Review or synthesize existing knowledge • Investigate existing situations or problems • Provide solutions to problems • Explore and analyze more general issues • Construct or create new procedures or systems • Explain new phenomena • Generate new knowledge • or a combination of any of the above!
  12. 12. Research Outcome 1. Product or Innovation directly used by Industry 2. Patent 3. International Publication
  13. 13. Types of Research, by Purpose  Basic Research  Applied Research  Evaluation Research  Research and Development
  14. 14. Types of Research, by Time  Cross-Sectional Research  Longitudinal Research
  15. 15. Types of Research, by Method  Quantitative research:  Descriptive  Correlational research  Causal-comparative  Experimental  Single-subject research  Qualitative Research:  Narrative research
  16. 16. Types of Research, by Method
  17. 17. Types of Research
  18. 18. Deductive Reasoning • Starts out with a general statement, or hypothesis, and examines the possibilities to reach a specific, logical conclusion. • The scientific method uses deduction to test hypotheses and theories. • Ex: "All men are mortal. Harold is a man. Therefore, Harold is mortal." • Diagram: Theory → Hypothesis → Observation → Confirmation
  19. 19. Inductive Reasoning  The opposite of deductive reasoning.  Makes broad generalizations from specific observations.  Ex: "Harold is a grandfather. Harold is bald. Therefore, all grandfathers are bald." Theory Tentative Hypothesis Pattern Confirmation
  20. 20. Deductive/Inductive Research
  21. 21. Basic Steps 1. Develop a research question 2. Conduct thorough literature review 3. Re-define research question/ hypothesis 4. Design research methodology/study 5. Create research proposal
  22. 22. Basic Steps 6. Apply for funding 7. Apply for ethics approval 8. Collect and analyze data; develop and test software 9. Draw conclusions and relate findings
  23. 23. Basic Steps
  24. 24. Research Question Development
  25. 25. Research Question Development Problem Identification Limit the research scope Research Question Identification Goals Identification Hypothesis Statistical Hypothesis Hypothetical Statement
  26. 26. Building block of Science
  27. 27. Possible Sources of RQs • Observational research • Discussions, brainstorming • Experts, academics and industry • Bibliographies, journals, research reports, popular science magazines, etc.
  28. 28. A Research Question Should • Have research value: original, can be tested/evaluated. • Be feasible: can be answered, data are available, and it can be solved within the available cost and time. • Match the researcher's qualifications
  29. 29. FINER criteria for RQ • F (Feasible): adequate number of subjects; adequate technical expertise; affordable in time and money; manageable in scope • I (Interesting): getting the answer intrigues investigator, peers and community • N (Novel): confirms, refutes or extends previous findings. Hulley S, Cummings S, Browner W, et al. Designing clinical research. 3rd ed. Philadelphia (PA): Lippincott Williams and Wilkins; 2007.
  30. 30. FINER criteria for RQ • E (Ethical): amenable to a study that an institutional review board will approve • R (Relevant): to scientific knowledge; to clinical and health policy; to future research. Hulley S, Cummings S, Browner W, et al. Designing clinical research. 3rd ed. Philadelphia (PA): Lippincott Williams and Wilkins; 2007.
  31. 31. Research Hypothesis
  32. 32. Hypothesis Definition
  33. 33. Research Hypothesis  The primary research question should be driven by the hypothesis rather than the data.  The research question and hypothesis should be developed before the start of the study.  A good hypothesis must be based on a good research question at the start of a study and drive data collection for the study.
  34. 34. Hypothesis • Is a clear statement of what is intended to be investigated. • It should be specified before research is conducted and openly stated in reporting the results. • It allows the researcher to identify: • the research objectives • the key abstract concepts involved in the research • its relationship to both the problem statement and the literature review
  35. 35. Source of Hypothesis  Environment  Literature  Other Empirical Data  Personal Experience
  36. 36. Type of Hypothesis  Null Hypothesis  Alternative Hypothesis
  37. 37. Type of Hypothesis
  38. 38. Example
  39. 39. Example • There is no significant gain between the pre-test and post-test scores of students exposed to Computer-Aided Instruction in Analytic Geometry.
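One way to test a null hypothesis of "no gain" like the one above is a sign-flip permutation test on the paired differences. The sketch below is only illustrative: the function name `paired_sign_flip_test` and the scores are hypothetical, and the slides do not say which test the author had in mind.

```python
import random

def paired_sign_flip_test(pre, post, n_perm=10_000, seed=0):
    """Two-sided sign-flip permutation test for the mean paired gain.

    Under H0 (no gain) each difference is equally likely to be
    positive or negative, so we randomly flip signs and count how
    often the permuted mean is at least as extreme as the observed one.
    """
    rng = random.Random(seed)
    diffs = [b - a for a, b in zip(pre, post)]
    observed = sum(diffs) / len(diffs)
    extreme = 0
    for _ in range(n_perm):
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(sum(flipped) / len(flipped)) >= abs(observed):
            extreme += 1
    # The add-one correction keeps the estimated p-value above zero.
    return observed, (extreme + 1) / (n_perm + 1)

# Hypothetical pre-test and post-test scores for eight students.
pre_scores = [52, 60, 45, 70, 58, 63, 49, 55]
post_scores = [58, 66, 50, 74, 65, 70, 52, 61]
gain, p_value = paired_sign_flip_test(pre_scores, post_scores)
print(f"mean gain = {gain}, p-value = {p_value:.4f}")
```

A small p-value would count as evidence against the null hypothesis of no gain; with real data the choice of test should follow the study design.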
  40. 40. Special Consideration for Null Hypothesis
  41. 41. Hypothesis Testing:
  42. 42. • 1-sided or 2-sided hypotheses? • A 2-sided hypothesis states that there is a difference between the experimental group and the control group, but it does not specify in advance the expected direction of the difference. • A 1-sided hypothesis states a specific direction (e.g., there is an improvement in outcomes with computer-assisted surgery). • A 2-sided hypothesis should be used unless there is a good justification for using a 1-sided hypothesis.
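The practical difference between 1-sided and 2-sided hypotheses shows up in the p-value. The following is a minimal sketch, assuming the test statistic is a z-score that is standard normal under the null hypothesis; the helper name `z_test_p_values` and the value 1.8 are illustrative only.

```python
from statistics import NormalDist

def z_test_p_values(z):
    """Return p-values for one z statistic under three alternatives."""
    std_normal = NormalDist()            # mean 0, standard deviation 1
    p_greater = 1 - std_normal.cdf(z)    # 1-sided: effect > 0
    p_less = std_normal.cdf(z)           # 1-sided: effect < 0
    p_two = 2 * min(p_greater, p_less)   # 2-sided: effect != 0
    return {"two_sided": p_two, "greater": p_greater, "less": p_less}

p = z_test_p_values(1.8)  # hypothetical observed z statistic
print(p)
```

For the same data, the 1-sided p-value in the observed direction is half the 2-sided one, which is why a 1-sided test needs to be justified in advance.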
  43. 43. Error Type
  44. 44. Research objective  The primary objective should be coupled with the hypothesis of the study.  Study objectives define the specific aims of the study and should be clearly stated in the introduction of the research protocol.  Example:  Hypothesis : there is no difference in functional outcomes between computer-assisted acetabular component placement and free-hand placement,  The primary objective can be stated as follows: this study will compare the functional outcomes of computer-assisted acetabular component insertion versus free-hand placement in patients undergoing total hip arthroplasty.
  45. 45. Research objective • The study objective is an active statement about how the study is going to answer the specific research question. • Objectives state exactly which outcome measures are going to be used. • They not only guide the development of the protocol and the design of the study but also play a role in sample size calculations and in determining the power of the study.
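To make the link between objectives, sample size and power concrete, here is a rough sketch of the standard normal-approximation formula for comparing two group means. It assumes two independent groups of equal size, a common known standard deviation, and a 2-sided test; the function name and the numbers are hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate n per group to detect a mean difference `delta`.

    Uses n = 2 * ((z_{1-alpha/2} + z_{power}) * sigma / delta)^2,
    rounded up to the next whole subject.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) * sigma / delta) ** 2)

# Hypothetical: detect a 5-point difference when the SD is 10 points.
n = sample_size_per_group(delta=5, sigma=10)
print(n)
```

Halving the difference to be detected roughly quadruples the required sample size, which is why the primary outcome measure has to be fixed before the study starts.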
  46. 46. Literature Review
  47. 47. Literature Review • Is an evaluative report of studies found in the literature related to your selected area. • Should describe, summarize, evaluate and clarify this literature. • Gives a theoretical basis for the research and helps you determine the nature of your own research.
  48. 48.  Select a limited number of works that are central to your area rather than trying to collect a large number of works that are not as closely connected to your topic area. Boote, D.N. & Beile, P. (2005). Scholars before researchers: On the centrality of the dissertation literature review in research preparation. Educational Researcher 34/6, 3-15.
  49. 49. Literature Review Purpose  Provide a context for the research  Justify the research  Ensure the research hasn't been done before (or that it is not just a "replication study")  Show where the research fits into the existing body of knowledge  Enable the researcher to learn from previous theory on the subject
  50. 50. Literature Review Purpose  Illustrate how the subject has been studied previously  Highlight flaws in previous research  Outline gaps in previous research  Show that the work is adding to the understanding and knowledge of the field  Help refine, refocus or even change the topic
  51. 51. Strategies
  52. 52. Strategies
  53. 53.  Kirby, S., Greaves, L. & Reid, C. (2006). Searching the Literature. In Experience research social change: Methods beyond the mainstream
  54. 54. Literature Review in a thesis
  55. 55. The cycle Hasibuan, 2007, Metode Penelitian Komputasi
  56. 56. What you should do  Compare  Contrast  Criticize  Synthesize  Summarize Hasibuan, 2007, Metode Penelitian Komputasi
  57. 57. Sources • Articles in international journals • Theses • Dissertations • Proceedings • Magazines • Abstract books • Websites
  58. 58. Literature Citation • Whenever you quote, summarize, paraphrase or refer to the work of another person you need to cite it. • Citing gives credit to the original author for any information that you learn through your research process and share with the readers. • Citing is the way to give credit to others' work when you use it in your papers, speeches and projects. • Citing others' work is a very important step in the academic writing process and the best way to avoid plagiarism.
  59. 59. Literature Citation • Two ways: • Use a sentence that introduces the author • Add the author’s name at the end of the sentence • We must provide the last name and year of publication • Paraphrase with a signal phrase: “According to Smith (2004), the cost of treating alcoholism is increasing dramatically.” • Direct quote: “the cost of treating alcoholism is exceeded only by the cost of treating illness from tobacco use, and is increasing exponentially” (Smith, 2004).
  60. 60. Research Design
  61. 61. Research Design  A plan or strategy for conducting the research  Spells out the basic strategies that researchers adopt to develop evidence that is accurate and interpretable.  Deals with matters such as selecting participants for the research and preparing for data collection.
  62. 62. Purposes of Research Design 1. To provide answers to research questions 2. To control variance
  64. 64. Characteristics of good research design 1. Freedom from bias 2. Freedom from confounding 3. Control of extraneous variables 4. Statistical correctness for testing hypotheses
  65. 65. TYPES OF RESEARCH 1. Experimental research: involves manipulating conditions and studying effects (IPO: Input-Process-Output) 2. Correlational research: involves studying relationships among variables within a single group, and frequently suggests the possibility of cause and effect. 3. Survey research: involves describing the characteristics of a group by means of such instruments as interview schedules, questionnaires, and tests.
  66. 66. • Ethnographic research: concentrates on documenting or portraying the everyday experiences of people using observation and interviews. • Such research may address how well, how much, or how efficiently knowledge, attitudes, opinions, and the like exist. • Case study: a detailed analysis of one or a few individuals • Historical research: involves studying some aspect of the past • Action research: a type of research by practitioners designed to help improve their practice.
  67. 67. GENERAL RESEARCH TYPES It is useful to consider the various research methodologies we have described as falling within one or more general research categories – Descriptive Associational Intervention-type Studies
  68. 68. 1. DESCRIPTIVE STUDIES Descriptive studies describe a given state of affairs as fully and carefully as possible. Examples: - In biology, where each variety of plant and animal species is meticulously described and information is organized into useful taxonomic categories. - In educational research, the most common descriptive methodology is the survey, as when researchers summarize the characteristics (abilities, preferences, behaviors, and so on) of individuals or groups, or of the physical environment (such as a school).
  69. 69. 2. ASSOCIATIONAL RESEARCH  Research that investigates relationships is often referred to as associational research  Correlational and causal-comparative methodologies are the principal examples of associational research.  Example: Studying relationship (a) between achievement and attitude (b) between childhood experiences and adult characteristics
  70. 70. (c) between teacher characteristic and student achievement (d) between methods of instruction & achievement (comparing students who have been taught by each method) (e) between gender and attitude (comparing attitudes of males and females)
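Relationships such as (a) are commonly summarized with a correlation coefficient. The sketch below computes Pearson's r from first principles; the function name `pearson_r` and the scores are hypothetical, and other associational measures may suit other kinds of data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical attitude and achievement scores for six students.
attitude = [55, 62, 70, 48, 81, 66]
achievement = [58, 60, 75, 50, 79, 70]
r = pearson_r(attitude, achievement)
print(round(r, 3))
```

A value of r near +1 or -1 indicates a strong linear relationship, but by itself it does not establish cause and effect.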
  71. 71. • Descriptive research alone is not satisfying, since most researchers want a complete understanding of people and things, which requires further analysis rather than mere description. • Associational studies, too, are ultimately unsatisfying because they do not permit researchers to “do something” to influence or change outcomes. Simply determining the interest or achievement of students does not tell us how to change or improve either.
  72. 72. 3. INTERVENTION STUDIES • To find out whether one thing will have an effect on something else, researchers need to conduct some form of intervention study. • A particular treatment is expected to influence one or more outcomes. • Such studies enable researchers to assess
  73. 73.  For example: - the effectiveness of various teaching methods, - curriculum models, - classroom arrangements - and other efforts to influence the characteristics of individuals or groups.  Experiment is the primary methodology used in intervention research  Some types of research may combine these 3 general types
  74. 74. Quantitative vs. qualitative research • Goals: Quantitative: theory testing, establishing facts, statistical description, prediction, relationship between variables. Qualitative: sensitizing concepts, describing multiple realities, grounded theory, developing understanding. • Design: Quantitative: structured, predetermined, formal, specific detailed plan of operation. Qualitative: evolving, flexible.
  75. 75. • Data: Quantitative: quantitative, quantifiable coding, counts, measures, operationalized variables, statistics. Qualitative: descriptive, personal documents, field notes, photographs, people’s own words, official documents. • Sample: Quantitative: large, stratified, control groups, precise, random, control of extraneous variables. Qualitative: small, non-representative, focused, purposeful, convenient.
  76. 76. • Techniques or methods: Quantitative: experiments, surveys, structured interviewing, structured observation. Qualitative: observation, participant observation, review of documents, open-ended interviewing, first-person accounts. • Relationship with subjects: Quantitative: detached, short term, distant, subject-researcher restricted. Qualitative: empathy, emphasis on trust, democratic.
  77. 77. • Data analysis: Quantitative: deductive, statistical. Qualitative: ongoing, models, themes, concepts, inductive, analytic, constant comparative. • Problems: Quantitative: controlling other variables, validity, reliability. Qualitative: time consuming, data reduction difficulties, procedures not standardized, difficult to study large populations.
  78. 78. Research Types under Quantitative & Qualitative • Quantitative: 1. Experimental Research 2. Single-Subject Research 3. Correlational Research 4. Causal-Comparative Research 5. Survey Research • Qualitative: 1. Ethnographic Research 2. Historical Research
  79. 79. IDENTIFY WHAT TYPE OF RESEARCH • A historical study of college entrance requirements over time that examines the relationship between those requirements and achievement in mathematics. • An ethnographic study that describes in detail the daily activities of an inner-city high school and also finds a relationship between media attention and teacher morale in school • An investigation of the effects of different teaching methods on concept learning and gender
  80. 80. We can classify designs into a simple threefold classification by asking some key questions.
  81. 81. This threefold classification is especially useful for describing the design with respect to internal validity. A randomized experiment generally is the strongest of the three designs when your interest is in establishing a cause-effect relationship. A non-experiment is generally the weakest, but only with respect to internal validity or causal assessment. In fact, the simplest form of non-experiment is a one-shot survey design that consists of nothing but a single observation O. It is among the most common forms of research, especially for descriptive questions.
  82. 82. What research type would be appropriate for these research problems? 1. How do parents feel about the elementary school counseling program? 2. Do students who have high scores on reading tests also have high scores on writing tests? 3. What effect does the gender of a counselor have on how he or she is “received by counselees”? 4. How can Tom Adams be helped to learn to read?
  84. 84. Sampling Methods
  85. 85. What exactly IS a “sample”?
  86. 86. What exactly IS a “sample”? A subset of the population, selected by either “probability” or “non-probability” methods. If you have a “probability sample” you simply know the likelihood of any member of the population being included (not necessarily that it is “random”).
  87. 87. SAMPLING • A sample is “a smaller (but hopefully representative) collection of units from a population used to determine truths about that population” (Field, 2005) • Why sample? • Resources (time, money) and workload • Gives results with known accuracy that can be calculated mathematically • The sampling frame is the list from which the potential respondents are drawn • Registrar’s office • Class rosters • Must assess sampling frame errors
  88. 88. SAMPLING (continued) • 3 factors that influence sample representativeness • Sampling procedure • Sample size • Participation (response) • When might you sample the entire population? • When your population is very small • When you have extensive resources • When you don’t expect a very high response
  89. 89. Assumptions of quantitative sampling We want to generalize to the population. Random events are predictable. Therefore…We can compare random events to our results. Probability sampling is the best approach.
  92. 92. Process • The sampling process comprises several stages: • Defining the population of concern • Specifying a sampling frame, a set of items or events possible to measure • Specifying a sampling method for selecting items or events from the frame • Determining the sample size • Implementing the sampling plan • Sampling and data collecting • Reviewing the sampling process
  93. 93. Assumptions of qualitative sampling Social actors are not predictable like objects. Randomized events are irrelevant to social life. Probability sampling is expensive and inefficient. Therefore… Non-probability sampling is the best approach.
  94. 94. Types of samples
  95. 95. Types of Samples • Probability (Random) Samples • Simple random sample • Systematic random sample • Stratified random sample • Multistage sample • Multiphase sample • Cluster sample • Non-Probability Samples • Convenience sample • Purposive sample • Quota
  96. 96. Simple Random Sample 1. Get a list or “sampling frame” a. This is the hard part! It must not systematically exclude anyone. b. Remember the famous sampling mistake? 2. Generate random numbers 3. Select one person per random number
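The three steps above can be sketched with Python's standard library, which draws without replacement so that every unit in the frame has the same chance of selection. The frame of student IDs and the function name are hypothetical; building a complete frame remains the hard part.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the sampling frame, all equally likely."""
    rng = random.Random(seed)   # a seed makes the draw reproducible
    return rng.sample(frame, n)

# Hypothetical sampling frame: a list of 500 student IDs.
frame = [f"student_{i:03d}" for i in range(500)]
srs = simple_random_sample(frame, n=20, seed=42)
print(srs[:5])
```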
  97. 97. SIMPLE RANDOM SAMPLING (continued) • Estimates are easy to calculate. • Simple random sampling is always an EPS (equal probability of selection) design, but not all EPS designs are simple random sampling. • Disadvantages • If the sampling frame is large, this method is impracticable. • Minority subgroups of interest in the population may not be present in the sample in sufficient numbers for study.
  98. 98. Systematic Random Sample 1. Select a random number, which will be known as k 2. Get a list of people, or observe a flow of people (e.g., pedestrians on a corner) 3. Select every kth person a. Careful that there is no systematic rhythm to the flow or list of people. b. If every 4th person on the list is, say, “rich” or “senior” or some other consistent pattern, avoid this method
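A minimal sketch of systematic selection follows. It assumes the common variant in which k is the fixed sampling interval and the random element is the starting position within the first k units; the slide's wording leaves this detail open, and the names used are hypothetical.

```python
import random

def systematic_sample(frame, k, seed=None):
    """Take every k-th unit after a random start in the first k units."""
    rng = random.Random(seed)
    start = rng.randrange(k)    # random start: 0, 1, ..., k - 1
    return frame[start::k]

# Hypothetical ordered list of 100 people, sampling every 10th.
people = list(range(100))
sys_sample = systematic_sample(people, k=10, seed=1)
print(sys_sample)
```

If the list has a rhythm with period k (for example, every 10th person is a supervisor), this design would pick only that kind of unit, which is the periodicity risk noted above.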
  99. 99. SYSTEMATIC SAMPLING (continued) • ADVANTAGES: • Sample easy to select • Suitable sampling frame can be identified easily • Sample evenly spread over entire reference population • DISADVANTAGES: • Sample may be biased if hidden periodicity in population coincides with that of selection. • Difficult to assess precision of estimate from one survey.
  100. 100. Stratified Random Sample 1. Separate your population into groups or “strata” 2. Do either a simple random sample or systematic random sample within each stratum a. Note that you must be able to identify the “strata” easily before attempting this b. If your sampling frame is sorted by, say, school district, then you’re able to use this method
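As an illustration of these two steps, the sketch below groups units into strata and then draws a simple random sample within each one. Proportional allocation (the same sampling fraction in every stratum) is an assumption made here for simplicity, and all names and data are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, fraction, seed=None):
    """Simple random sample of the same fraction within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:                      # step 1: form the strata
        strata[stratum_of(unit)].append(unit)
    sample = {}
    for name, members in strata.items():    # step 2: sample each stratum
        n = max(1, round(fraction * len(members)))
        sample[name] = rng.sample(members, n)
    return sample

# Hypothetical frame: (school district, student number) pairs.
units = [("A", i) for i in range(60)] + [("B", i) for i in range(40)]
strat = stratified_sample(units, stratum_of=lambda u: u[0], fraction=0.1, seed=7)
print({name: len(chosen) for name, chosen in strat.items()})
```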
  101. 101. STRATIFIED SAMPLING (continued) • Drawbacks to using stratified sampling. • First, the sampling frame of the entire population has to be prepared separately for each stratum. • Second, when examining multiple criteria, stratifying variables may be related to some, but not to others, further complicating the design, and potentially reducing the utility of the strata. • Finally, in some cases (such as designs with a large number of strata, or those with a specified minimum sample size per group), stratified sampling can potentially require a larger sample than would other methods.
  102. 102. Multi-stage Cluster Sample 1. Get a list of “clusters,” e.g., branches of a company 2. Randomly sample clusters from that list 3. Have a list of, say, 10 branches 4. Randomly sample people within those branches a. This method is complex and expensive!
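The procedure above can be sketched as a two-stage draw: first sample clusters, then sample people within the chosen clusters. The branch names, sizes, and function name below are hypothetical, and real multi-stage designs often use unequal selection probabilities that this sketch ignores.

```python
import random

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: sample clusters. Stage 2: sample members within each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {
        name: rng.sample(clusters[name], min(n_per_cluster, len(clusters[name])))
        for name in chosen
    }

# Hypothetical company with 10 branches of 30 employees each.
branches = {
    f"branch_{b}": [f"emp_{b}_{i}" for i in range(30)] for b in range(10)
}
cluster_sample = two_stage_cluster_sample(branches, n_clusters=3, n_per_cluster=5, seed=3)
print(sorted(cluster_sample))
```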
  103. 103. The Convenience Sample 1. Find some people that are easy to find
  104. 104. The Snowball Sample 1. Find a few people that are relevant to your topic. 2. Ask them to refer you to more of them.
  105. 105. The Quota Sample 1. Determine what the population looks like in terms of specific qualities. 2. Create “quotas” based on those qualities. 3. Select people for each quota.
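A quota sample can be sketched as filling fixed counts per category from whoever becomes available, with no random selection involved. The categories, quotas, and function name below are hypothetical.

```python
def quota_sample(stream, category_of, quotas):
    """Accept people in arrival order until every quota is filled."""
    remaining = dict(quotas)
    selected = []
    for person in stream:
        category = category_of(person)
        if remaining.get(category, 0) > 0:
            selected.append(person)
            remaining[category] -= 1
        if not any(remaining.values()):   # all quotas filled
            break
    return selected

# Hypothetical passers-by as (gender, id) pairs, in arrival order.
passers_by = [("F", 1), ("M", 2), ("M", 3), ("F", 4), ("M", 5), ("F", 6)]
quota = quota_sample(passers_by, category_of=lambda p: p[0], quotas={"F": 2, "M": 1})
print(quota)
```

Because selection depends on arrival order rather than chance, inclusion probabilities are unknown, which is what makes this a non-probability design.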
  106. 106. The Theoretical Sample
  107. 107. Types of Research for the STIS Computational Statistics Undergraduate Thesis • Development of statistical information systems: computer-based information systems developed to support activities in the statistical domain. Examples: a Statistical Reference Information System, a Geographic Information System that uses (processed) statistical data, a Statistical Dissemination Information System, and a Data Entry and Monitoring Information System for statistical data collection activities.
  108. 108. Types of Research for the STIS Computational Statistics Undergraduate Thesis • Development of statistical applications: application programs built to support problem solving in statistics. The program must be written by the student, and the problem must not yet be solvable with existing statistical data-processing packages; or the program may be built on existing libraries that do not yet have an interface; or the problem can be solved with a package but the process/procedure is not (yet) efficient, so an integrated application is needed. Examples: Development of a Regression Fitting Application; a Hypothesis Testing Application Using the Permutation Test in Resampling.
  109. 109. Types of Research for the STIS Computational Statistics Undergraduate Thesis • Technology studies in computational statistics: studies conducted across the two disciplines whose results can benefit the development of computer science or statistics. Research themes that do not fall into the first or second type can be placed in this third type if the theme is judged to have originality and innovation and a high level of contribution to the development of computer science or statistics, to Badan Pusat Statistik (Statistics Indonesia), or to society. Examples: Development of a Database-Driven Expert System Inference Engine (Case Study: Determining the Method for Compiling Price and Production Indices); Development of a Statistical Search Engine Based on Supervised Learning and Relevance Feedback.
  110. 110. Methods, Techniques, and Instruments in Research • Research instruments: • Tools for gathering data • Questionnaires • Interviews
  111. 111. Questionnaires • The most common instrument or tool of research for obtaining data beyond the physical reach of the observer • Closed form / Closed-ended • Open form / Open-ended
  112. 112. Questionnaires • Clarity of language • Singleness of purpose • Relevant to the objective of the study • Correct grammar
  113. 113. Questionnaire: Advantages • Facilitates data gathering • Is easy to test data for reliability and validity • Is less time-consuming than interview and observation • Preserves the anonymity and confidentiality of the respondents’ reactions and answers
  114. 114. Questionnaire: Disadvantages • Printing and mailing are costly • Response rate may be low • Respondents may provide only socially acceptable answers • There is less chance to clarify ambiguous answers • Respondents must be literate and have no physical handicaps • Rate of retrieval can be low because retrieval itself is difficult
  115. 115. Interview Purpose:  to verify information gathered from written sources  to clarify points of information  to update information and  to collect data
  116. 116. Interview: Types  Screening interview  Panel or Group Interview  Telephone interview
  117. 117. How to measure the instruments? • Validity: the instrument measures what it intends to measure • External validity: can the results of a study be generalized from a sample to a population? • Content validity: the appropriateness of the content of an instrument. In other words, do the measures (questions, observation logs, etc.) accurately assess what you want to know? • Reliability: stability in maintaining consistent measurement in a test administered twice • Inter-Rater/Observer Reliability: the degree to which different raters/observers give consistent answers or estimates. • Test-Retest Reliability: the consistency of a measure evaluated over time. • Parallel-Forms Reliability: the reliability of two tests constructed the same way, from the same content. • Internal Consistency Reliability: the consistency of results across items, often measured with Cronbach’s Alpha.
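Cronbach's alpha, mentioned above for internal consistency, can be computed directly from item scores as alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The sketch below uses sample variances; the function name and the questionnaire data are hypothetical.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Cronbach's alpha for rows of respondents and columns of items."""
    k = len(item_scores[0])                     # number of items
    columns = list(zip(*item_scores))           # one tuple per item
    sum_item_vars = sum(variance(col) for col in columns)
    total_var = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum_item_vars / total_var)

# Hypothetical answers of five respondents to a 4-item questionnaire.
answers = [
    [4, 5, 4, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [3, 3, 3, 2],
    [1, 2, 2, 1],
]
alpha = cronbach_alpha(answers)
print(round(alpha, 3))
```

Values closer to 1 indicate that the items vary together; what counts as acceptable depends on the field and the purpose of the instrument.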