STATA_BC_PLINK.RJLA.NOV2007.ppt
  • 1. BIOSTATISTICAL/BIOINFORMATIC TOOLS FOR GENETIC DATA: DATA MANAGEMENT AND ANALYSIS
    • RICHARD ANNEY, NEUROPSYCHIATRIC GENETICS RESEARCH GROUP
    • WORKSHEET, TUTORIALS AND SLIDES AVAILABLE ON P:\Personal Folders\anneyr\stata9\talk
    • http://www.medicine.tcd.ie/psychiatry/research/neuropsychiatry/
  • 2. Overview
  • 3. STATA9
    • A STATISTICAL SOFTWARE PACKAGE
    • GUI LESS POLISHED THAN SPSS
    • POWERFUL AND “SCRIPT” FRIENDLY
      • LESS CLICKING AND FEWER DROP-DOWN MENUS; MORE SCRIPTING
  • 4. STATA9: SET UP FOLDER STRUCTURE
    • SET UP FOLDERS TO STORE YOUR:
      • DO-FILES
        • CR FILES (DATA CREATION)
        • AN FILES (ANALYSIS)
      • DTA-FILES
      • LOG-FILES
      • INPUT-FILES (TXT)
      • OUTPUT-FILES
  • 5. PROBLEM 1: BASIC CASE-CONTROL ASSOCIATION STUDY
    • HOW DO I GET FILES INTO STATA?
    • HOW DO I MERGE MY DATA WITH ANOTHER FILE?
    • CAN I GENERATE A FEW BASIC STATISTICS ON MY MARKERS?
    • CAN I PERFORM A CASE-CONTROL STUDY?
    • IS MY QUANTITATIVE VARIABLE ASSOCIATED WITH A GENOTYPE?
  • 6. STATA9: LOOK AT ME!! MAIN WINDOW
  • 7. STATA9: LOOK AT ME!! DO-WINDOW
  • 8. STATA9: LOOK AT ME!! MAIN WINDOW
  • 9. STATA9: LOOK AT ME!! DTA-EDITOR WINDOW
  • 10. PROBLEM 1: BASIC CASE-CONTROL ASSOCIATION STUDY
  • 11. PROBLEM 1: BASIC CASE-CONTROL ASSOCIATION STUDY
    • cr00 genotype_qtlsnp.do
    • IMPORTING TAB-DELIMITED TEXT FILES INTO STATA WITH THE INSHEET COMMAND, SORTING ON THE KEY VARIABLE WITH THE SORT COMMAND AND SAVING AS *.DTA FILES WITH THE SAVE COMMAND
    • CONVERTING “STRINGS” TO NUMERIC VARIABLES USING THE GENERATE AND REPLACE COMMANDS
    • MERGING ON THE KEY VARIABLE USING THE MERGE COMMAND
    • TABULATING THE MERGE RESULT USING THE TABULATE COMMAND AND ORDERING VARIABLES USING THE ORDER COMMAND
  • 12. PROBLEM 1: BASIC CASE-CONTROL ASSOCIATION STUDY
  • 13. PROBLEM 1: BASIC CASE-CONTROL ASSOCIATION STUDY
  • 14. PROBLEM 1: BASIC CASE-CONTROL ASSOCIATION STUDY
  • 15. PROBLEM 1: BASIC CASE-CONTROL ASSOCIATION STUDY
    • THE COMBINED *.DTA FILE
    • THE TABULATE COMMAND ON THE _merge VARIABLE
      • 1 = ONLY IN 1ST FILE
      • 2 = ONLY IN 2ND FILE
      • 3 = IN BOTH 1ST & 2ND FILES
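The cr00 steps (import, sort, recode strings, merge on a key, tabulate the merge) can be mirrored outside Stata to see what they do. Below is a minimal Python sketch, not the worksheet's do-file: the inline data, the `id` key and the `status` coding are hypothetical. It reproduces the 1/2/3 merge codes listed above.

```python
import csv
import io
from collections import Counter

# Hypothetical tab-delimited inputs sharing the key variable "id".
GENOTYPES = "id\trs1_a1\trs1_a2\n1\tA\tG\n2\tG\tG\n4\tA\tA\n"
PHENOTYPES = "id\tstatus\tqt\n1\tcase\t3.2\n2\tcontrol\t2.9\n3\tcase\t4.1\n"


def insheet(text):
    """Read tab-delimited text (like insheet) and index rows by the sorted key."""
    rows = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["id"]: row for row in sorted(rows, key=lambda r: int(r["id"]))}


def encode_status(rows):
    """Replace the string 'status' with a numeric 'affected' (1 = case, 0 = control)."""
    codes = {"case": 1, "control": 0}
    for row in rows.values():
        row["affected"] = codes.get(row.pop("status"))  # unknown labels -> None
    return rows


def merge(master, using):
    """Outer merge on the key with Stata-style codes:
    1 = only in master, 2 = only in using, 3 = in both."""
    merged = {}
    for key in sorted(set(master) | set(using), key=int):
        row = {**master.get(key, {}), **using.get(key, {})}
        in_master, in_using = key in master, key in using
        row["_merge"] = 3 if (in_master and in_using) else (1 if in_master else 2)
        merged[key] = row
    return merged


merged = merge(insheet(GENOTYPES), encode_status(insheet(PHENOTYPES)))
print(Counter(row["_merge"] for row in merged.values()))  # like "tabulate _merge"
```

Keeping every row and recording where it came from, rather than silently dropping unmatched IDs, is what makes the tabulation a useful check on the merge.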
  • 16. PROBLEM 1: BASIC CASE-CONTROL ASSOCIATION STUDY
  • 17. PROBLEM 1: BASIC CASE-CONTROL ASSOCIATION STUDY
    • an00 genotype_qtlsnp.do
    • CREATING THE LOG FILE USING THE LOG COMMAND
    • OPENING THE *.DTA FILE USING THE USE COMMAND
    • CREATING GENOTYPE VARIABLES FROM ALLELE VARIABLES USING THE GTYPE PROTOCOL
    • TABULATING THE GENOTYPE VARIABLES USING THE TABULATE COMMAND
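Building genotype variables from two allele columns is the core of this an00 step. A minimal Python sketch, assuming alleles are coded as letters with "0" or blank for a missing call; it is not the gtype protocol itself, only an illustration of the idea: an unordered genotype label plus a 0/1/2 count of the minor allele.

```python
from collections import Counter

MISSING = {"0", ""}


def make_genotypes(allele1, allele2):
    """Combine two allele columns into unordered genotype labels and minor-allele counts.

    Missing calls ("0" or blank) give None. Assumes at least one complete call;
    for a monomorphic marker the single allele is treated as the 'minor' one.
    """
    called = [(a, b) for a, b in zip(allele1, allele2)
              if a not in MISSING and b not in MISSING]
    freq = Counter(allele for pair in called for allele in pair)
    # Minor allele = rarer allele in this sample (ties broken alphabetically).
    minor = min(freq, key=lambda allele: (freq[allele], allele))
    labels, doses = [], []
    for a, b in zip(allele1, allele2):
        if a in MISSING or b in MISSING:
            labels.append(None)
            doses.append(None)
        else:
            labels.append("/".join(sorted((a, b))))  # A/G and G/A are the same genotype
            doses.append(int(a == minor) + int(b == minor))
    return minor, labels, doses


minor, labels, doses = make_genotypes(["A", "G", "G", "0"], ["G", "G", "G", "A"])
print(minor, Counter(label for label in labels if label))  # like "tabulate genotype"
```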
  • 18. PROBLEM 1: BASIC CASE-CONTROL ASSOCIATION STUDY
    • TEST HWE USING GTAB COMMAND
    • TEST HWE USING GENHW COMMAND
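For reference, the simplest HWE test compares observed genotype counts with those expected from the allele frequency. This Python sketch uses the Pearson chi-square with 1 degree of freedom; gtab and genhw may report different or additional statistics (for example exact tests), so treat it as an illustration rather than a reimplementation of either command.

```python
import math


def hwe_chisq(n_aa, n_ab, n_bb):
    """Pearson chi-square (1 df) test of Hardy-Weinberg equilibrium from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    if p == 0 or q == 0:
        raise ValueError("monomorphic marker: HWE test undefined")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


print(hwe_chisq(25, 50, 25))  # counts exactly as expected under HWE
print(hwe_chisq(50, 0, 50))   # no heterozygotes: strong departure
```

The chi-square approximation is poor when expected genotype counts are small (rare alleles), which is one reason exact HWE tests are often preferred.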
  • 19. PROBLEM 1: BASIC CASE-CONTROL ASSOCIATION STUDY
    • TEST PAIRWISE LINKAGE DISEQUILIBRIUM USING THE PWLD COMMAND
    • TEST ASSOCIATION WITH BINARY TRAIT USING GENCC COMMAND
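The two tests on this slide can be illustrated with small Python functions: an allelic 2x2 chi-square with odds ratio for a binary trait, and D, D' and r² from haplotype counts. This is a sketch under assumptions: gencc also reports genotype-based tests, and pwld estimates haplotype frequencies from unphased genotypes, whereas `ld_stats` here assumes the haplotype counts are already known.

```python
import math


def allelic_test(case_counts, control_counts):
    """Allelic 2x2 chi-square (1 df) and odds ratio for allele A, cases vs controls.

    Each argument is (n_AA, n_AB, n_BB). Counting alleles rather than people
    assumes HWE in the combined sample.
    """
    def alleles(n_aa, n_ab, n_bb):
        return 2 * n_aa + n_ab, n_ab + 2 * n_bb  # (copies of A, copies of B)

    table = (alleles(*case_counts), alleles(*control_counts))
    total = sum(map(sum, table))
    rows = [sum(r) for r in table]
    cols = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = sum(
        (table[i][j] - rows[i] * cols[j] / total) ** 2 / (rows[i] * cols[j] / total)
        for i in range(2) for j in range(2)
    )
    (a_case, b_case), (a_ctrl, b_ctrl) = table
    odds_ratio = (a_case * b_ctrl) / (b_case * a_ctrl)
    return chi2, math.erfc(math.sqrt(chi2 / 2)), odds_ratio


def ld_stats(n_AB, n_Ab, n_aB, n_ab):
    """D, signed D' and r^2 from (assumed known) two-locus haplotype counts."""
    n = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n
    p_B = (n_AB + n_aB) / n
    d = p_AB - p_A * p_B
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    r2 = d * d / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return d, d / d_max, r2


print(allelic_test((30, 50, 20), (20, 50, 30)))
print(ld_stats(50, 0, 0, 50))  # two haplotypes only: complete LD
```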
  • 20. PROBLEM 1: BASIC CASE-CONTROL ASSOCIATION STUDY
    • QTLSNP COMMAND MODELS
      • CODOMINANT (THREE MODELS)
      • DOMINANT
      • RECESSIVE
  • 21. PROBLEM 1: BASIC CASE-CONTROL ASSOCIATION STUDY
    • TEST WHETHER A QUANTITATIVE VARIABLE IS ASSOCIATED WITH DIFFERENT INHERITANCE MODELS USING THE QTLSNP COMMAND - CODOMINANT
  • 22. PROBLEM 1: BASIC CASE-CONTROL ASSOCIATION STUDY
    • TEST WHETHER A QUANTITATIVE VARIABLE IS ASSOCIATED WITH DIFFERENT INHERITANCE MODELS USING THE QTLSNP COMMAND - DOMINANT
    • NOT ASSOCIATED, SO OUTPUT IS MINIMAL
  • 23. PROBLEM 1: BASIC CASE-CONTROL ASSOCIATION STUDY
    • TEST WHETHER A QUANTITATIVE VARIABLE IS ASSOCIATED WITH DIFFERENT INHERITANCE MODELS USING THE QTLSNP COMMAND - RECESSIVE
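The inheritance models behind qtlsnp differ only in how the three genotypes are grouped. A minimal Python sketch, assuming genotypes coded as 0/1/2 copies of the minor allele: the codominant view compares the three genotype means, while dominant and recessive codings reduce to a two-group mean difference (the least-squares slope on a 0/1 indicator). It omits the significance tests that qtlsnp reports.

```python
from statistics import mean

# Genotypes as copies of the minor allele: 0, 1 or 2.
CODINGS = {
    "dominant": lambda dose: int(dose >= 1),   # carriers (1, 2) vs non-carriers (0)
    "recessive": lambda dose: int(dose == 2),  # minor homozygotes (2) vs the rest (0, 1)
}


def codominant_means(doses, trait):
    """Codominant view: mean trait value for each of the three genotypes."""
    groups = {}
    for dose, y in zip(doses, trait):
        groups.setdefault(dose, []).append(y)
    return {dose: mean(values) for dose, values in sorted(groups.items())}


def two_group_effect(doses, trait, model):
    """Least-squares slope of the trait on a 0/1 genotype indicator.

    For a 0/1 predictor this equals mean(trait | 1) - mean(trait | 0).
    Raises ZeroDivisionError if everyone falls in one group.
    """
    x = [CODINGS[model](dose) for dose in doses]
    mx, my = mean(x), mean(trait)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, trait))
    return sxy / sxx


doses = [0, 0, 1, 1, 2, 2]
trait = [1.0, 1.0, 2.0, 2.0, 4.0, 4.0]
print(codominant_means(doses, trait))
print(two_group_effect(doses, trait, "dominant"), two_group_effect(doses, trait, "recessive"))
```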
  • 24. PROBLEM 1: BASIC CASE-CONTROL ASSOCIATION STUDY
  • 25. BC|SNPmax©
    • DATABASE AND ANALYSIS PLATFORM
    • MASTER DATABASE FOR STORING ALL OUR “MASTER” GENETIC AND PHENOTYPE DATASETS
    • ONGOING PROCESS TO UPLOAD AND MANAGE DATA
  • 26. BC|SNPmax: Structure
    • FIVE DOMAINS:
      • GENOTYPES/SNPS
      • MAPS
      • PEDIGREES
      • AFFECTION
      • PHENOTYPES
  • 27. BC|SNPmax: Structure
  • 28. BC|SNPmax: Structure
  • 29. BC|SNPmax: Structure
  • 30. BC|SNPmax: Structure
  • 31. BC|SNPmax: Structure
  • 32. FROM OUTPUT TO GEN-FILE (VIA STATA)
    • TWO EXAMPLES
      • BASIC EXCEL FILE
      • TAQMAN FILE
  • 33. FROM OUTPUT TO GEN-FILE (VIA STATA): BASIC EXCEL FILE
  • 34. FROM OUTPUT TO GEN, PED AND AFF FILES (VIA STATA): BASIC EXCEL FILE
  • 35. FROM OUTPUT TO GEN-FILE (VIA STATA): BASIC EXCEL FILE
  • 36. FROM OUTPUT TO GEN-FILE (VIA STATA): BASIC EXCEL FILE
  • 37. FROM OUTPUT TO GEN-FILE (VIA STATA): BASIC EXCEL FILE
  • 38. FROM OUTPUT TO GEN-FILE (VIA STATA): TAQMAN FILE
  • 39. FROM OUTPUT TO GEN-FILE (VIA STATA): TAQMAN FILE
  • 40. FROM OUTPUT TO GEN-FILE (VIA STATA): TAQMAN FILE
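A sketch of the TaqMan-to-gen-file conversion in Python. The call labels ("Allele X", "Allele Y", "Both", anything else treated as missing), the allele assignments and the tab-separated sample/marker/allele1/allele2 output layout are assumptions for illustration; check them against your own TaqMan export and the BC|SNPmax gen-file specification.

```python
# Assumed TaqMan call labels; anything else (e.g. "Undetermined") is coded missing.
def taqman_to_gen(rows, snp, allele_x, allele_y):
    """Convert (sample, call) rows into tab-separated gen-file lines.

    Hypothetical output layout: sample, marker, allele 1, allele 2, with "0" = missing.
    """
    calls = {
        "Allele X": (allele_x, allele_x),
        "Allele Y": (allele_y, allele_y),
        "Both": (allele_x, allele_y),
    }
    lines = []
    for sample, call in rows:
        a1, a2 = calls.get(call.strip(), ("0", "0"))
        lines.append("\t".join((sample, snp, a1, a2)))
    return lines


plate = [("S1", "Allele X"), ("S2", "Both"), ("S3", "Allele Y"), ("S4", "Undetermined")]
for line in taqman_to_gen(plate, "snp1", "A", "G"):
    print(line)
```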
  • 41. BC|SNPmax
  • 42. BC|SNPmax: Types of Analysis
    • QUALITY
      • PEDCHECK
      • MERLIN
      • BASIC MEASURES (MAF, HWE, CALL RATE)
    • FAMILY-BASED
      • MENDEL
      • MERLIN
      • GENEHUNTER
      • SIMWALK
      • FBAT/PBAT
      • TRANSMIT
      • QTDT
      • PLINK
      • HAPLOVIEW
      • R-PACKAGE
    • CASE-CONTROL
      • ALLELE ASSOCIATION
      • MENDEL
      • PHASE
      • SNPHAP
      • PLINK
      • R-PACKAGE
  • 43. BC|SNPmax: Types of Analysis
    • FOR MOST ANALYSES YOU NEED TO SELECT MATCHED:
      • GEN
      • PED
      • MAP (b128 NOW UPLOADED)
      • AFF
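A quick consistency check before selecting files for analysis: every genotyped individual should appear in the pedigree and affection files, and every genotyped marker should appear in the map. The Python sketch below uses hypothetical in-memory structures (sets of IDs and a dict of genotypes), not the BC|SNPmax export format.

```python
def check_matched(gen, ped_ids, aff_ids, map_snps):
    """Report mismatches between genotype, pedigree, affection and map data.

    gen: dict mapping (individual_id, snp_id) -> genotype (hypothetical layout).
    """
    gen_ids = {ind for ind, _ in gen}
    gen_snps = {snp for _, snp in gen}
    return {
        "genotyped_not_in_ped": sorted(gen_ids - set(ped_ids)),
        "genotyped_not_in_aff": sorted(gen_ids - set(aff_ids)),
        "snps_not_in_map": sorted(gen_snps - set(map_snps)),
    }


gen = {("ID1", "snp1"): "A/G", ("ID2", "snp1"): "G/G", ("ID3", "snp2"): "C/C"}
report = check_matched(gen, ped_ids={"ID1", "ID2"}, aff_ids={"ID1", "ID2", "ID3"}, map_snps={"snp1"})
print(report)
```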
  • 44. BC|SNPmax
  • 45. BC|SNPmax
  • 46. BC|SNPmax
  • 47. BC|SNPmax
  • 48. BC|SNPmax
  • 49. BC|SNPmax
  • 50. BC|SNPmax
  • 51. BC|SNPmax
  • 52. BC|SNPmax
  • 53. PLINK… GETTING STARTED
  • 54. PLINK…
    • RUNNING PLINK FROM YOUR OWN COMPUTER
      • WHY?
        • MULTIPLE ANALYSES
        • KEEP A RECORD OF YOUR WORK IN BAT AND SCRPT FILES
        • EASE OF USE
        • EASE OF REPEATING TASKS
        • SCRIPTS, NOT DROP-DOWN MENUS
        • RUNNING MORE THAN ONE CHROMOSOME (NOW ADDRESSED IN BC|SNPmax)
        • POST-ANALYSIS INTEGRATION USING PERL AND STATA
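The batch-file approach can be generated rather than typed by hand. This Python sketch writes one PLINK command per autosome using the `--bfile`, `--chr`, `--assoc` and `--out` options; the fileset name `mydata` is hypothetical, and the DATASET and OUTPUT folder names follow the PLINK folder-structure slide. Paths are built with `ntpath` so they use Windows separators suitable for a .bat file run from the project root.

```python
import ntpath  # Windows-style path joins, suitable for a .bat file on any OS


def plink_batch(fileset, chromosomes, analysis="--assoc"):
    """One PLINK command per chromosome, reading DATASET\\<fileset> and writing to OUTPUT."""
    commands = []
    for chrom in chromosomes:
        out = ntpath.join("OUTPUT", f"{fileset}_chr{chrom}")
        bfile = ntpath.join("DATASET", fileset)
        commands.append(f"plink --bfile {bfile} --chr {chrom} {analysis} --out {out}")
    return commands


batch = plink_batch("mydata", range(1, 23))
print("\n".join(batch[:2]))
# Saving these lines as a .bat file keeps a record of exactly what was run.
```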
  • 55. PLINK…
    • FOLDER STRUCTURE
      • ANALYSIS
      • DATASET
      • OUTPUT
  • 56. PLINK… DATASETS
    • PED & MAP
    • BINARY FILESET
      • BINARY PED (BED)
      • EXTENDED MAP, PLAIN TEXT (BIM)
      • FAMILY FILE, PLAIN TEXT (FAM)
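The file formats are simple enough to read directly. A minimal Python sketch of the standard PED layout (family ID, individual ID, father, mother, sex, phenotype, then two alleles per marker) and MAP layout (chromosome, SNP ID, genetic distance, base-pair position), plus decoding one byte of a SNP-major .bed file, which follows a 3-byte header (0x6C 0x1B 0x01) and packs four genotypes per byte, two bits each, lowest-order bits first. The example records are made up.

```python
BED_MAGIC = bytes([0x6C, 0x1B, 0x01])  # .bed header; 0x01 = SNP-major order
BED_CODES = {0b00: "hom A1", 0b01: "missing", 0b10: "het", 0b11: "hom A2"}


def parse_map_line(line):
    """MAP: chromosome, SNP ID, genetic distance (cM), base-pair position."""
    chrom, snp, cm, bp = line.split()
    return {"chr": chrom, "snp": snp, "cm": float(cm), "bp": int(bp)}


def parse_ped_line(line):
    """PED: family ID, individual ID, father, mother, sex, phenotype, then 2 alleles per SNP."""
    fields = line.split()
    fid, iid, father, mother, sex, phenotype = fields[:6]
    alleles = fields[6:]
    return {
        "fid": fid, "iid": iid, "father": father, "mother": mother,
        "sex": sex,              # 1 = male, 2 = female, other = unknown
        "phenotype": phenotype,  # e.g. 1 = unaffected, 2 = affected, -9 or 0 = missing
        "genotypes": list(zip(alleles[0::2], alleles[1::2])),
    }


def decode_bed_byte(byte, n_genotypes=4):
    """Decode one .bed byte: four 2-bit genotypes, lowest-order bits first."""
    return [BED_CODES[(byte >> (2 * i)) & 0b11] for i in range(n_genotypes)]


print(parse_map_line("1 snp1 0 752566"))
print(parse_ped_line("FAM1 IND1 0 0 1 2 A G G G"))
print(decode_bed_byte(0b11100100))
```

A1 and A2 refer to the allele columns of the matching .bim line, which is why the binary fileset needs all three files.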
  • 57. PLINK…
  • 58. PLINK…
  • 59. PLINK…
  • 60. PLINK…
  • 61. EXAMPLE ANALYSES IN PLINK…
    • DATA TRANSFORMATION
    • DATA FILTERING AND PRUNING
    • DATA MERGING
    • SUMMARY STATS
      • MISSINGNESS
      • HWE
      • MAF
      • MENDEL ERRORS
    • INCLUSION THRESHOLDS
    • POPULATION STRATIFICATION
    • ASSOCIATION
      • CASE/CONTROL
      • QTL
      • GxE
    • NEW: MULTIPLE-TESTING CORRECTION (--adjust)
    • FAMILY-BASED
      • TDT
      • PARENT-OF-ORIGIN (POO)
    • PERMUTATION
    • EPISTASIS
    • HAPLOTYPE ANALYSIS
    • NEW: PROXY ASSOCIATION (FROM SNP TO HAPLOTYPE)
    • R-PACKAGE
    • NEW: OUTPUT MODIFIERS
      • PLOG10
      • P<x
      • GENOMIC CONTROL
      • QQ-PLOT
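To make the multiple-testing and genomic-control items concrete, here is a Python sketch of three common corrections (Bonferroni, Holm step-down, Benjamini-Hochberg FDR) and the genomic-control inflation factor λ, the median 1-df association chi-square divided by its expected median under the null (about 0.455). PLINK's --adjust reports further variants (for example Sidak and Benjamini-Yekutieli), and its exact conventions may differ slightly from this sketch.

```python
from statistics import median

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square with 1 df


def bonferroni(pvalues):
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def holm(pvalues):
    """Step-down: multiply the k-th smallest P by (m - k), keeping the result monotone."""
    m = len(pvalues)
    adjusted, running = [0.0] * m, 0.0
    for rank, i in enumerate(sorted(range(m), key=lambda i: pvalues[i])):
        running = max(running, min(1.0, (m - rank) * pvalues[i]))
        adjusted[i] = running
    return adjusted


def benjamini_hochberg(pvalues):
    """Step-up FDR: P * m / rank, taking running minima from the largest P down."""
    m = len(pvalues)
    adjusted, running = [0.0] * m, 1.0
    for k, i in enumerate(sorted(range(m), key=lambda i: pvalues[i], reverse=True)):
        rank = m - k  # 1-based rank in ascending order
        running = min(running, pvalues[i] * m / rank)
        adjusted[i] = running
    return adjusted


def genomic_control_lambda(chisq):
    """Inflation factor; dividing each chi-square by lambda gives GC-corrected statistics."""
    return median(chisq) / CHI2_1DF_MEDIAN


pvals = [0.01, 0.04, 0.03, 0.005]
print(bonferroni(pvals), holm(pvals), benjamini_hochberg(pvals))
```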
  • 62. PLINK… : RUNNING TDT IN PLINK
    • CAN BE RUN FROM THE COMMAND LINE OR WITH gPLINK (GUI)
    • BAT AND SCRPT FILES RECOMMENDED
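The TDT statistic itself is small: with T transmissions and U non-transmissions of an allele from heterozygous parents, the test is (T - U)^2 / (T + U) on 1 df and the odds ratio is T/U. A Python sketch; PLINK's .tdt output reports T, U, OR, CHISQ and P columns along these lines.

```python
import math


def tdt(transmitted, untransmitted):
    """McNemar-type TDT from heterozygous-parent transmission counts of one allele."""
    chi2 = (transmitted - untransmitted) ** 2 / (transmitted + untransmitted)
    p_value = math.erfc(math.sqrt(chi2 / 2))  # 1-df upper tail
    odds_ratio = transmitted / untransmitted
    return chi2, p_value, odds_ratio


print(tdt(60, 40))
```

Only heterozygous parents are informative, which is why homozygous parents contribute nothing to T or U.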
  • 63. PLINK… : SUMMARY TABLES IN STATA
    • INSHEET THE TDT.CLEAN FILE
      • ADD GENE NAMES
      • ADD CHROMOSOME POSITION
      • ADJUST OR TO THE RISK ALLELE
      • GENERATE GRAPHS OF DATA
      • GENERATE TABLES BY GENE
      • GENERATE TABLES BY POSITION
      • GENERATE TABLES BY P-VALUE
      • SELECT COLUMNS FOR OTHER ANALYSES (GENMAPP)
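The summary-table steps can also be sketched in Python. The input columns (SNP, A1, A2, OR, P), the gene and position lookups and the gene name are all hypothetical. Here, adjusting the OR to the risk allele is interpreted as flipping any OR below 1 so that it refers to the other allele, which is an assumption about the slide's intent.

```python
def summarise(results, gene_of, position_of, sort_key="P"):
    """Annotate TDT rows with gene and position, express OR for the risk allele, then sort."""
    table = []
    for r in results:
        odds, risk, other = r["OR"], r["A1"], r["A2"]
        if odds < 1:  # re-express relative to the allele that increases risk
            odds, risk, other = 1 / odds, other, risk
        table.append({
            "SNP": r["SNP"],
            "GENE": gene_of.get(r["SNP"], "NA"),
            "BP": position_of.get(r["SNP"]),
            "RISK": risk, "OTHER": other, "OR": odds, "P": r["P"],
        })
    return sorted(table, key=lambda row: row[sort_key])  # e.g. "P", "GENE" or "BP"


results = [
    {"SNP": "rs1", "A1": "A", "A2": "G", "OR": 0.5, "P": 0.01},
    {"SNP": "rs2", "A1": "C", "A2": "T", "OR": 1.25, "P": 0.2},
]
table = summarise(results, gene_of={"rs1": "GENE1"}, position_of={"rs1": 1000, "rs2": 2000})
for row in table:
    print(row)
```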
  • 64. THE END!