Hot Research Tools at Stanford



  1. 1. Hot Research Tools at Stanford<br />Raymond R. Balise, Ph.D.<br />Health Research and Policy/SPECTRM<br />Todd Ferris, MD<br />Stanford Center for Clinical Informatics<br />
  2. 2. Topics<br />Keeping your data secure<br />Finding patient populations<br />Tools for collecting and storing data<br />Tools for analysis<br />
  3. 3. Safety First<br />Virus Scanner<br />Disk and File Encryption<br />Secure Email<br />File Transfer<br />Backup Tools<br />
  4. 4. Getting Security Software<br />Software licensed for the entire University can be found here:<br /><br />You definitely want to have: <br />Sophos Anti-Virus<br />Stanford Desktop tools<br />Security Self-Help Tool<br />BigFix Client<br />You may want AFS & PGP<br />
  5. 5. Sophos Anti-Virus (For both Windows & Mac OS)<br />Watches for suspicious things and stops them until you authorize the software<br />If a file shows up in your quarantine, get help<br />You can submit suspicious files<br />
  6. 6. Stanford Desktop Tools<br />This allows you to install and update BigFix, Security Self-Help, Open AFS, and other tools.<br />BigFix automatically checks for important software updates.<br />Security Self-Help checks for security weaknesses on your machine and allows you to fix them.<br />Open AFS lets you access your UNIX account as if it were just another Windows hard drive.<br />
  7. 7. Stanford Desktop Tools<br />
  8. 8. Your UNIX Account<br />You have a website made for you already:<br /><br />UNIX stuff<br />You can use Stanford Desktop Tools to mount your UNIX drive just like another hard drive. I get stuff on the web quickly with Open AFS<br /><br /><br />If you do not want AFS you can also use SecureFX which you can get from ESS.<br />Do NOT put confidential/HIPAA sensitive stuff out there.<br />
  9. 9. After AFS is Installed<br />
  10. 10. My UNIX Space<br />
  11. 11. SecureFX<br />
  12. 12. WebAFS<br />Only plan to use your AFS space occasionally? Or just want to be able to access your AFS space from any computer?<br />Try WebAFS<br />Log in to:<br /><br />
  13. 13.
  14. 14. BigFix Client<br />Instead of worrying about applying all the patches you need, you can use BigFix.<br />You will not typically notice it, but it will occasionally push a patch onto your machine and tell you to reboot.<br />
  15. 15. Security – Hard Drives<br />Unless it is encrypted, all the files on your computer’s hard drive are easily read.<br />Stanford has licensed PGP whole disk encryption software, and created a service called Stanford Whole Disk Encryption (SWDE)<br />Secures your entire hard drive and can encrypt USB drives.<br />If you have HIPAA sensitive information, you must secure your computer:<br /> <br />
  16. 16. Security - Email<br />Email provides all the confidentiality of a postcard. <br />If you are sending HIPAA sensitive information, you must secure your email:<br /><br />
  17. 17.
  18. 18. Back up your work!<br />Each year, on average, one in five of my students loses all their work. Plan on your computer being destroyed at the worst possible time this year.<br />Coffee, computer worm or virus, small child with refrigerator magnet, physical hard drive failure, theft, bicycle crash, etc.<br />Every day, back up your work to more than one location.<br />
  19. 19. Where to Back Up<br />PLEASE use removable media if you have no network access – <br />Floppy disk, CD, DVD, flash media<br />NEVER back up or share confidential data (HIPAA sensitive protected health information) on mobile media without talking to security experts first.<br />I used a Maxtor BlackArmor disk that has built-in encryption (but it does not work with PGP). <br />Ask your tech support person for recommendations. <br />
  20. 20. Encrypted USB drives<br />USB drives (also called thumb drives) are a very convenient way to keep backups and allow you to move your data around.<br />However, they are very easy to lose! NEVER store unencrypted, restricted data on a USB drive.<br />You can encrypt at the file level (Excel, WinZip) – Good.<br />You can encrypt the whole drive (PGP disk, TrueCrypt) – Better.<br />You can have a hardware encrypted USB drive – BEST!<br />There are many manufacturers; however, most are Windows only.<br />IronKey supports both Windows and Mac and is highly recommended.<br />
  21. 21. Backup Tools<br />There are many options for backup (local drive, server, online service).<br />Properly implemented online backup services provide the most safety by storing the backup away from the original.<br />Many departments use the Iron Mountain backup service.<br />Individuals can get a Stanford discount for the Mozy backup service (<br />Online services work great for < 50 GB, but when you have large amounts of data, you need to consider other options.<br />Why not just buy a 1 TB external USB drive and back up there?<br />
  22. 22. Backup Tools (cont’d)<br />External drives (more than 1) can work well if they are encrypted and you remember to physically remove them from the location of the machine being backed up.<br />Another option is to use software like CrashPlan (<br />Allows backup to another machine (ideally in a different location, but on the same network).<br />Can encrypt the data being backed up.<br />Is free for personal use.<br />NOTE: CrashPlan does not have a contract with Stanford. Consequently, its online service is not already approved for the storage of restricted data.<br />
  23. 23. Building a Cohort<br />Use the STRIDE cohort tool for work before IRB.<br />You can find out if you have enough subjects to continue with your study idea.<br />There are separate tools for looking at the medical records after IRB.<br />
  24. 24.
  25. 25. Post IRB Tools<br />
  26. 26. Collecting and Storing Data<br />Excel<br />REDCap<br />Surveyor<br />
  27. 27. What can a database do?<br />Track who did what to every bit of information in the data capture system and when they did it <br />Is every change logged?<br />Can you roll back mistakes 2 days later?<br />Controls what a user can see and modify<br />Prevents you from entering garbage<br />Can I possibly enter blue for gender?<br />
  28. 28. Excel…<br />I think Excel 2007 or 2008 can, in theory, meet all these requirements if you have an extraordinarily talented (VBA) programmer.<br />I tried and I could not implement a satisfactory database model.<br />Anybody who is good enough to make it work will tell you to use a different tool.<br />Excel is NOT a database but it is not useless.<br />
  29. 29. Excel 2003 vs. 2007<br />Office 2007 file suffixes end with an x (.xlsx vs. .xls)<br />New graphical user interface (ribbon instead of menus)<br />Push F1 to start Excel Help then search for Interactive 2003 to find where they moved stuff.<br />Microsoft Help is no longer an oxymoron… lots of videos.<br />
  30. 30. Setting up a Spreadsheet<br />Use column headings<br />Keep names short but meaningful<br />No spaces<br />No special characters <br />~ ! @ # $ % ^ & * ( ) _ - <br />Use camelcase<br />First letter of each word is capitalized<br />Use verbs<br />
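
The naming rules on this slide can be checked mechanically once the headings are in R. This is only a sketch: `check_headings` is a hypothetical helper, the example headings are made up, and the 20-character cap is my own stand-in for "short", not a number from the talk.

```r
# Hypothetical helper: flags column headings that break the naming rules
# (spaces, special characters, or overly long names).
check_headings <- function(headings, max_length = 20) {
  data.frame(
    heading = headings,
    # TRUE when the heading contains at least one space
    has_space = grepl(" ", headings, fixed = TRUE),
    # TRUE when the heading contains anything other than letters, digits, or spaces
    has_special = grepl("[^A-Za-z0-9 ]", headings),
    # TRUE when the heading is longer than the assumed cap
    too_long = nchar(headings) > max_length,
    stringsAsFactors = FALSE
  )
}

# Made-up headings: two follow the rules, two do not.
report <- check_headings(c("PatientId", "Date of Birth", "Weight(kg)", "RaceDetail"))
print(report)
```

Any row with a TRUE in it is a heading worth renaming before you start entering data.
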
  31. 31. Include a Dummy Record<br />Include a fake first patient <br />Make the width of the character fields as wide as the widest possible value<br />African-American is 16 characters wide so use it for the fake subject’s race <br />X234567890123456 is a nice way to force the width to be 16 characters wide<br />
  32. 32. NO Missing Data<br />You want to have a value in every cell in your spreadsheets. If something is unknown, code it as “missing”, “unknown”, “refused”, “illegible”, “N/A”, etc. <br />You want a blank cell to be a clear indicator that something is wrong. <br />
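
If you want to double-check the widths, or build a placeholder for a different width, a few lines of R will do it. `width_placeholder` is a hypothetical helper name; it just reproduces the X234567890123456 pattern from the slide.

```r
# Both of the slide's values are 16 characters wide.
nchar("African-American")
nchar("X234567890123456")

# Hypothetical helper: builds an "X2345..." placeholder of any width,
# so the position of the last digit tells you how wide the field is.
width_placeholder <- function(width) {
  digits <- strrep("1234567890", ceiling(width / 10))
  paste0("X", substring(digits, 2, width))
}

placeholder <- width_placeholder(16)
print(placeholder)
```
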
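
Once the spreadsheet is in R, a blank cell is easy to hunt down. A minimal sketch, assuming the data have already been imported as a data frame; `Visits` and its values are made up for illustration and `is_blank` is a hypothetical helper.

```r
# Made-up data: one empty string and one NA, both of which should be flagged.
Visits <- data.frame(
  Id = c(1, 2, 3),
  Race = c("unknown", "", "refused"),
  Weight = c(70.5, NA, 82.0),
  stringsAsFactors = FALSE
)

# A cell counts as blank if it is NA or holds only whitespace.
is_blank <- function(x) is.na(x) | trimws(as.character(x)) == ""

# Logical matrix with one column per variable and one row per record.
blank_cells <- sapply(Visits, is_blank)

blank_per_column <- colSums(blank_cells)            # how many blanks in each variable
rows_with_blanks <- which(rowSums(blank_cells) > 0) # which records need attention

print(blank_per_column)
print(rows_with_blanks)
```

Because "unknown" and "refused" are real codes, only the truly empty cells are reported.
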
  33. 33. Make it a Table<br />If you have Excel 2007, convert the values to be a table.<br />Select the header record and the dummy record<br />
  34. 34. The context specific Table tools show up when you have clicked anywhere inside of the table.<br />Pick a color scheme<br />Give the table a name<br />
  35. 35. Data Entry Help<br />Row or column banding helps a LOT with data entry.<br />If you scroll down the table, the column headings are still displayed.<br />
  36. 36. Garbage In, Garbage Out<br />Prevent bad data from getting into your system with validation. <br />In Excel 2003, click on the column then open the Data menu and choose Validation…<br />In Excel 2007, click a cell in the dummy record, then click on the Data tab and choose Data Validation <br />
  37. 37. Custom Validation<br />By default, you can put anything in any cell.<br />Change the IDs to only allow whole numbers starting with 0.<br />Uncheck this<br />
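
Excel's validation only protects cells typed in after the rule exists, so it can be worth re-checking the IDs after import. This sketch applies the same rule (whole numbers, 0 or larger) in R; `is_valid_id` is a hypothetical helper and the IDs are made up.

```r
# Hypothetical helper: TRUE only for whole numbers that are 0 or larger.
is_valid_id <- function(x) {
  # Anything that is not a number becomes NA (the conversion warning is expected)
  value <- suppressWarnings(as.numeric(x))
  !is.na(value) & value >= 0 & value == floor(value)
}

# Made-up IDs as they might arrive from a spreadsheet column read as text.
ids <- c("0", "17", "3.5", "-2", "abc", "")
valid <- is_valid_id(ids)

print(data.frame(id = ids, valid = valid))
```
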
  38. 38. Validate Everything<br />
  39. 39. Validation is Auto-filled<br />The triangles indicate a note<br />The validation is filled-in down the table as you add new records.<br />
  40. 40. Custom Errors<br />You can change and enhance the message. Click the validated cell(s) you want to modify and click Data Validation.<br />
  41. 41. Excel is Still Problematic<br />Even set up properly, Excel has significant issues. Be aware that: <br />It does not always plot data correctly. <br />In some versions, math does not work correctly on very large numbers.<br />Exports into other packages do not work cleanly and do not always generate error messages. (The result is missing data without error/warning messages.)<br />
  42. 42. Excel 2007 … Awesome …<br />
  43. 43. R<br />SAS<br />
  44. 44. You can fix this.<br />Make sure to follow these instructions carefully and/or ask for help from your IT person. If you tweak the wrong thing in the registry you can render your machine unable to reboot!<br />In XP, click the Windows Start menu, choose Run, type regedit in the dialog, and click OK. In Vista, search for and open regedit.<br />Open up the tree to this path<br />HKEY_LOCAL_MACHINE ► SOFTWARE ► Microsoft ► Jet ► 4.0 ► Engines ► Excel<br />Double click TypeGuessRows.<br />Type 0 (zero, not the letter O) in the DWORD editor and click OK.<br />Repeat for this path<br />HKEY_LOCAL_MACHINE ► Software ► Microsoft ► Office ► 12.0 ► Access Connectivity Engine ► Engines ► Excel<br />Microsoft Access will silently change this setting!<br />So watch this setting if you use Access.<br />
  45. 45. Rather than Excel<br />Rather than doing all the hard work of setting up a validated Excel workbook, you can use a tool provided by the School of Medicine, REDCap.<br />You use Excel to make an easy template that describes the data you need to collect, then SCCI does the rest.<br />
  46. 46.<br />Click here.<br />Everyone with permission to use REDCap can see the demo database.<br />Your work will appear here until it is on the final “build”.<br />
  47. 47. Text<br />Date text<br />Notes<br />Dropdown lists<br />Radio buttons<br />
  48. 48. Explore the Excel Tutorial <br />This is the REDCap Data Dictionary Demo File.<br />It is just an Excel file that REDCap uses to build the database (inside of MySQL).<br />
  49. 49. Watch these to learn how to set it up.<br />
  50. 50. Start to Finish <br />Figure out…<br />how to break up the questionnaire into on-screen forms<br />if questions generate multiple answers<br />what to name each question<br />
  51. 51. PHI<br />These are not mutually exclusive. So you need many yes and no variables.<br />These are mutually exclusive so only one variable.<br />Other demographics and medical information<br />This is an extra variable.<br />This is an extra variable.<br />This is 3 variables.<br />
  52. 52. last<br />First<br />middle<br />dob<br />age<br />country<br />raceblack<br />raceasian<br />raceother<br />raceeast<br />racewhite<br />racedetail<br />ishispanic<br />reason<br />Symptommonth<br />Symptomday<br />Symptomyear<br />reasonother<br />
  53. 53.
  54. 54. Screen shot of first build<br />
  55. 55.
  56. 56.
  57. 57.
  58. 58. Surveyor<br /><br />A great tool for collecting data into a safe location<br />
  59. 59. Surveyor<br />
  60. 60. Analysis Tools<br />R/R Commander<br />R is the preferred statistical tool for most statisticians at Stanford. Its help files are user-hostile and the learning curve is a very rough climb.<br />SAS with SAS/Enterprise Guide<br />
  61. 61. R 2.9<br />R is a modern programming language with user-hostile help files….<br />
  62. 62. Learning R<br />Finally, a great introductory book for R. It focuses exclusively on data manipulation and graphics instead of mixing statistics with the language.<br />The index is not great but otherwise, it is ideal.<br />
  63. 63. Slides/Notes on R<br />Notes from my five, two-hour-long introductory talks are here:<br /><br />Notes from a two-hour-long introduction to R for life sciences can be found here:<br /><br />
  64. 64. How it Works<br />R has two main websites. One describes the project: <br /><br />The other has all the stuff you could ever want to download: <br /><br />Because the project has people working all over the globe, the software download site is “mirrored” everywhere. The closest mirror is USA CA1 (aka UC Berkeley). <br />
  65. 65.<br />There is an R installer for all the common operating systems:<br /><br /><br /><br />Each is basically self explanatory.<br />
  66. 66. Tell R to Update Now<br />Update using Berkeley<br />
  67. 67. Rcmdr<br />Get the Rcmdr package. <br />Go to the Packages menu in the R Console window.<br />Click Install packages.<br />Tell it to use a mirror (Berkeley).<br />Click Rcmdr then push OK. <br />It will download Rcmdr plus a lot of other packages that Rcmdr depends on.<br />You only need to do this once but you want to regularly use the Update option on the Packages menu.<br />
  68. 68. Starting Rcmdr<br />Remember: capitalization matters … In the R console, type: <br />library(Rcmdr)<br />Then push Enter to get a new window.<br />
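
If you would rather type than click, the same steps can be sketched from the R console. `missing_packages` is a hypothetical helper name; `install.packages()` and `update.packages()` are the standard R functions behind the Packages menu. The install is guarded so it only runs in an interactive session, where R will ask you to pick a mirror.

```r
# Hypothetical helper: returns the packages from `pkgs` that are not installed yet.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

to_install <- missing_packages("Rcmdr")

# Only install when something is missing and a person is at the keyboard.
# dependencies = TRUE also pulls in the packages that Rcmdr relies on.
if (length(to_install) > 0 && interactive()) {
  install.packages(to_install, dependencies = TRUE)
}

# The console equivalent of the Update option on the Packages menu
# (left commented out because it can take a while):
# update.packages(ask = FALSE)
```
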
  69. 69. R Commander 1.4<br />Rcmdr is a friendly, but incomplete, graphical user interface (GUI) for R.<br />
  70. 70. Importing Data into R<br />You can easily import data into R with Rcmdr by using the Data menu. I will assume that you have data stored in Excel. <br />Excel is NOT a good way to enter and store data but it is what is commonly used. Do not take my use of Excel as an endorsement of the product. <br />
  71. 71. Importing Excel<br />1) Pick Excel from the Data menu<br />2) Type a short name. I suggest you use a capital letter for a dataset name.<br />3) Navigate to where the file is located on your hard drive.<br />4) Click on the table name and push OK.<br />Data from Glenn A. Walker’s Common Statistical Methods for Clinical Research with SAS Examples<br />
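
The menu steps above are the Rcmdr route. As a code alternative, here is a sketch that assumes you have first saved the worksheet from Excel as a CSV file, which base R can read without extra packages. The file contents and column names are made up so the example runs on its own; in practice you would point `read.csv()` at your own file.

```r
# Stand-in for a worksheet saved from Excel as CSV (made-up values).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Id,Weight,Height", "1,70.5,1.75", "2,82.0,1.80"), csv_path)

# Capitalized dataset name, as suggested in step 2.
Dataset <- read.csv(csv_path, stringsAsFactors = FALSE)

# Check that the columns arrived with the types you expected.
str(Dataset)
```
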
  72. 72.
  73. 73.
  74. 74. Adding a New Variable<br />For the first analysis, you want to compare the Body Mass Index of the subjects to a population value (28.4). The formula for BMI is:<br />Compute the BMI and save it to the dataset.<br />
  75. 75. Adding a New Value (2)<br />Type in the formula (you can double click the variable names to save on typos).<br />After hitting OK you can browse the new data set by clicking on the View data set button. <br />
  76. 76. Look at the Data<br />Never, EVER do a statistical test before you have looked at your data graphically and with a numeric summary.<br />Ask for a numeric summary of the entire dataset, not just the variables that are in the analysis. <br />Common sense applies here (summarizing 10,000 variables is not a great idea) but it is a very good idea to look at everything to see if any one variable suggests a problem.<br />
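
In code, adding the variable is one line. This sketch uses the standard metric definition of BMI, weight in kilograms divided by height in meters squared; the units and column names in the talk's dataset are not shown here, so `WeightKg`, `HeightM`, and the values are assumptions. The one-sample t-test at the end is one common way to compare against a population value; the slides do not say which test was used.

```r
# Made-up subjects; column names and units are assumptions.
Dataset <- data.frame(
  Id = 1:4,
  WeightKg = c(70, 85, 60, 95),
  HeightM = c(1.75, 1.80, 1.60, 1.70)
)

# BMI = weight (kg) / height (m)^2, saved back into the dataset.
Dataset$Bmi <- with(Dataset, WeightKg / HeightM^2)
print(Dataset)

# One possible comparison against the population value of 28.4.
bmi_test <- t.test(Dataset$Bmi, mu = 28.4)
print(bmi_test)
```

If your heights are in centimeters or inches, convert them first or the BMI values will be wildly off, which is exactly the kind of thing a quick look at the data will catch.
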
  77. 77. If you want to code….<br />summary() is a smart generic function. If you apply it to a data set, you get summaries of each variable. If you apply it to a model, you get information on each of the predictors.<br />
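
To see both behaviors side by side, here is a short sketch using `mtcars`, a dataset that ships with R. The choice of dataset and the simple regression are mine, just for illustration.

```r
# Applied to a data set: one summary per variable.
data_summary <- summary(mtcars)
print(data_summary)

# Applied to a model: coefficient estimates and tests for each predictor.
model <- lm(mpg ~ wt, data = mtcars)
model_summary <- summary(model)
print(model_summary)

# Same function name, different kinds of result.
class(data_summary)
class(model_summary)
```
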
  78. 78.
  79. 79.
  80. 80.
  81. 81. SAS 9.2 TS2<br />SAS is an old programming language where you type commands and run a bunch of things at once.<br />
  82. 82. Enterprise Guide 4.2<br />EG is a newish programming environment where you type commands or point and click.<br />
  83. 83.
  84. 84.
  85. 85.
  86. 86.
  87. 87.
  88. 88.
  89. 89. Lots of Tools<br />If you have questions about tools for data management and analysis please ask.<br /><br /><br />