Welcoming to incoming bioinformatics students at UCSF
A talk I gave for incoming students to the Biological & Medical Informatics program at UCSF on September 24, 2014.

Published in: Science
  1. 1. Biological & Medical Informatics: the beginning. Daniel Himmelstein, September 24, 2014. Hand Drawn Map of SF by Jenni Sparks. Before the Money Came, Bettye LaVette
  2. 2. challenge. Hand Drawn Map of SF by Jenni Sparks
  3. 3. Review article, The New England Journal of Medicine. Global Health: Measuring the Global Burden of Disease. Christopher J.L. Murray, M.D., D.Phil., and Alan D. Lopez, Ph.D. From the Institute for Health Metrics and Evaluation, University of Washington, Seattle (C.J.L.M.); and the University of Melbourne, School of Population and Global Health, Carlton, VIC, Australia (A.D.L.). Address reprint requests to Dr. Murray at the Institute for Health Metrics and Evaluation, 2301 Fifth Ave., Suite 600, Seattle, WA 98121, or at cjlm@uw.edu. N Engl J Med 2013;369:448-57. DOI: 10.1056/NEJMra1201534 Copyright © 2013 Massachusetts Medical Society.
     It is difficult to deliver effective and high-quality care to patients without knowing their diagnoses; likewise, for health systems to be effective, it is necessary to understand the key challenges in efforts to improve population health and how these challenges are changing. Before the early 1990s, there was no comprehensive and internally consistent source of information on the global burden of diseases, injuries, and risk factors. To close this gap, the World Bank and the World Health Organization launched the Global Burden of Disease (GBD) Study in 1991 [1]. Although assessments of selected diseases, injuries, and risk factors in selected populations are published each year (e.g., the annual assessments of the human immunodeficiency virus [HIV] epidemic [2]), the only comprehensive assessments of the state of health in the world have been the various revisions of the GBD Study for 1990, 1999–2002, and 2004 [1,3-10]. The advantage of the GBD approach is that consistent methods are applied to critically appraise available information on each condition, make this information comparable and systematic, estimate results from countries with incomplete data, and report on the burden of disease with the use of standardized metrics.
     The most recent assessment of the global burden of disease is the 2010 study (GBD 2010), which provides results for 1990, 2005, and 2010. Several hundred investigators collaborated to report summary results for the world and 21 epidemiologic regions in December 2012 [11-18]. Regions based on levels of adult mortality, child mortality, and geographic contiguity were defined. GBD 2010 addressed a number of major limitations of previous analyses, including the need to strengthen the statistical methods used for estimation [11]. The list of causes of the disease burden was broadened to cover 291 diseases and injuries. Data on 1160 sequelae of these causes (e.g., diabetic retinopathy, diabetic neuropathy, amputations due to diabetes, and chronic kidney disease due to diabetes) have been evaluated separately. The mortality and burden attributable to 67 risk factors or clusters of risk factors were also assessed. GBD 2010, which provides critical information for guiding prevention efforts, was based on data from 187 countries for the period from 1990 through 2010. It includes a complete reassessment of the burden of disease for 1990 as well as an estimation for 2005 and 2010 based on the same definitions and methods; this facilitated meaningful comparisons of trends. The prevalence of coexisting conditions was also estimated according to the year, age, sex, and country. Detailed results from global and regional data have been published previously [11-18].
     The internal validity of the results is an important aspect of the GBD approach. For example, demographic data on all-cause mortality according to the year, country, age, and sex were combined with data on cause-specific mortality to ensure that the sum of the number of deaths due to each disease and injury equaled the number of deaths from all causes. Similar internal-validity checks were used for …
     Global Burden of Disease (2010), Disease Years Lost (million): ischemic heart disease 129.8; HIV-AIDS 81.5; Respiratory Cancers 46.9.
     A disability-adjusted life year (DALY) is a measure of overall disease burden, expressed as the number of years lost due to ill-health, disability or early death. Murray et al. NEJM. 2013. DOI: 10.1056/NEJMra1201534
     100 Million Pennies: http://www.kokogiak.com/megapenny
  4. 4. US Life Expectancy. Gregg Easterbrook (September 17, 2014) What Happens When We All Live to 100? The Atlantic. Calico: 500 Million USD
  5. 5. Increasing R&D Spending per New Drug Approval. [Figure: Spending per Drug* by Year, 1950 to 2010, with an exponential model (Td = 9.44); and log10(Spending per Drug*) by Year with a linear model (R2 = 0.95), confidence interval and prediction interval.] *Spending in Billions of 2008 Dollars. Data from doi:10.1038/nrd3681. Himmelstein, Daniel; Baranzini, Sergio (2014): Increasing R&D Spending per New Drug Approval. figshare. http://dx.doi.org/10.6084/m9.figshare.937004
  6. 6. Physarum polycephalum: Slime Mold. Plasmodium: • vegetative state • acellular • multinuclear • protoplasmic veins (tubules) • locomotion by pulsation - surface tension. http://youtu.be/MX2Fo4k6pxE Signature habitat: shady, cool, moist
  7. 7. the present. Hand Drawn Map of SF by Jenni Sparks
  8. 8. The exponential rise of ‘omics’. Andrew Su on Twitter. ‘omics’: collective characterization and quantification of biomolecules
  9. 9. Data Scientist: The Sexiest Job of the 21st Century. Meet the people who can coax treasure out of messy, unstructured data. by Thomas H. Davenport and D.J. Patil. 70 Harvard Business Review October 2012. Artwork: Tamar Cohen, Andrew J Buboltz, 2011
     When Jonathan Goldman arrived for work in June 2006 at LinkedIn, the business networking site, the place still felt like a start-up. The company had just under 8 million accounts, and the number was growing quickly as existing members invited their friends and colleagues to join. But users weren’t seeking out connections with the people who were already on the site at the rate executives had expected. Something was apparently missing in the social experience. As one LinkedIn manager put it, “It was …
     Definition (wikipedia): the study of the generalizable extraction of knowledge from data
  10. 10. The Dawn of Personalized Genomics. comparison with maternal grandmother. NHGRI GWAS Catalog
  11. 11. Open Source Explosion. Audio from: Let’s Talk Bitcoin! #134 Disruptive Leaps, Andreas Antonopoulos & Jeffrey Tucker. Science graphic from: http://nisd.net/academics/elementary-science
  12. 12. the past. Hand Drawn Map of SF by Jenni Sparks
  13. 13. Diversity of the Marine Metagenome. Katie Pollard. Ladau et al. (2013) ISME doi:10.1038/ismej.2013.37 • Aggregate microbial rDNA content of a seawater sample • richness of operational taxonomic units (OTUs) • species distribution modeling
     [Map: sampling locations for the MICROBIS, FUHRMAN2008, POMMIER2007 and GOS data sets.] Figure S1: Sampling locations for data used in constructing maps. Models with zero to eight parameters were fitted using MICROBIS data. Predictive performance of the models was evaluated using both internal measures of model performance (AIC, BIC, and PRESS) and three independent data sets, collected at the locations shown in red, green, and yellow (see Table S1). Analyses were based on 377 samples (234 MICROBIS, 30 GOS, 9 POMMIER2007, 103 FUHRMAN2008) collected from 164 distinct locations.
  14. 14. Diversity in June. Ladau et al. (2013) ISME doi:10.1038/ismej.2013.37 [Map: Log10(OTU Richness), color scale from 2.05 to 2.65.]
  15. 15. Diversity in December. Ladau et al. (2013) ISME doi:10.1038/ismej.2013.37 [Map: Log10(OTU Richness), color scale from 2.05 to 2.65.]
  16. 16. Slime Mold & the Greater Tokyo Rail System. Tero et al (2010) Science DOI: 10.1126/science.1177894 http://youtu.be/GwKuFREOgmo • 17 cm (7 in) agar-filled petri dish • plasmodium for Tokyo • quaker oats for cities • vegetate for a day • decentralized, distributed planning
  17. 17. Tero et al (2010) Science DOI: 10.1126/science.1177894. aftermath: no illumination; aftermath: geographic constraint using illumination
  18. 18. The SlimeNet was comparable or preferable to the RealNet in terms of: • efficiency • fault tolerance • cost. Actual Rail Network; Slime Tubule Network. Tero et al (2010) Science DOI: 10.1126/science.1177894
  19. 19. Human Evolution & Population Genetics. John Novembre, Ryan Hernandez. • 3,192 Europeans • 500,568 SNPs • Reduced to 2d (PCA). Novembre et al (2008) Nature doi:10.1038/nature07331. out-of-Africa bottleneck • Europeans have less genetic diversity than Africans. Veeramah & Hammer (2014) Nat Rev Genet doi:10.1038/nrg3625
  20. 20. Genes mirror geography within Europe. Novembre et al (2008) Nature doi:10.1038/nature07331 • Despite the low diversity in Europeans, 500 thousand common variants discriminate population diversity with high resolution.
  21. 21. Medical Informatics - An invited segment by Antoine Lizée - How to build intelligence around patient medical records. Adriana Karembeu & Antoine Lizée at Sandler Neurosciences Center, UCSF
  22. 22. 4500 visits - 600 patients - 10th year (UCSF EPIC STUDY)
     Images (~200MB/visit). Brain MRI: T1, T2, proton density. Processed MRI: Cortical Thickness, Myelin. Overlays: CT, Myelin, Anatomical labels.
     GWAS: 500,000+ SNPs. HLA: A, B, C, DRB1, DQB1.
     Patient data: Age, sex, history, etc. Clinical data: Clinical Scores, treatments. Patient reported: Quality of Life questionnaires. Processed data: MRI-based. ReferenceData.
     Genotypes: ~1MB/patient. (Para-)Clinical Data: ~250 variables/visit.
  23. 23. Visits
  24. 24. The Friendship Network of Daniel Himmelstein. Labels: hometown, kin, college, debate, camp, Guatemala, UCSF, research, Dartmouth. Learn more online at: http://dhimmel.com • 1,278 nodes • 40,255 edges
  25. 25. Facebook Friends (Highschool, Camp, College, Kin, UCSF, Research, Debate): 1,278 nodes (1 type), 40,255 edges (1 type). http://dhimmel.com
     Complex Diseases (Genes, Diseases, Pathophysiologies, Tissues, Genomic Positions, Perturbations, Canonical Pathways, BioCarta, KEGG, Reactome, miRNA, TFBS, Cancer Hoods, Cancer Modules, GO: BP, GO: MF, GO: CC, Oncogenic, Immunologic): 29,241 nodes (19 types), 1,608,168 edges (20 types). http://het.io
  26. 26. Network, Graph Subset and MetaPaths. Nodes: ITCH, Lung, SUMO1, Multiple Sclerosis, IRF1, Leukocyte, Crohn’s Disease, IL2RA, IRF8, CXCR4, STAT3. Edge types: expression, interaction, association, localization.
     Feature Computation, using metaedge-specific degrees:
     $\mathrm{PDP}(path) = \prod_{d \in D_{path}} d^{-w}$
     $\mathrm{DWPC}(metapath) = \sum_{path \in Paths} \mathrm{PDP}(path)$
     Paths (metaedge-specific degrees; path degree product):
     IRF1, IL2RA, Multiple Sclerosis: 4 1 1 4; 0.25
     IRF1, IRF8, Multiple Sclerosis: 4 1 1 4; 0.25
     IRF1, CXCR4, Multiple Sclerosis: 4 2 1 4; 0.177
     IRF1, Leukocyte, Multiple Sclerosis: 2 1 1 1; 0.707
     Degree weighted path counts: 0.677, 0.707
     Machine Learning. [Figure: Standardized Coefficient per feature, by Method (AUROC): ridge (0.829), lasso (0.823).] Features: {Cancer Hood}, {Positional}, GeTeGaD, GiGeTlD, GeTlD, {GO Function}, {GO Component}, {miRNA Target}, {BioCarta}, {Oncogenic}, {TF Target}, GaD (any gene), {Cancer Module}, GiGiGaD, {GO Process}, GiGaD, {KEGG}, {Immunologic}, {Reactome}, {Perturbation}, GaDmPmD, GaD (any disease), GaDlTlD, GaDaGaD.
     Performance. [Figure: Recall versus False Positive Rate, by Partition (AUROC): Testing (0.829), Training (0.810).]
     Combine Predictions & Statistical Evidence. [Figure: Density of Meta2.5 P-value.]
     Discover Novel Susceptibility Genes. Table 5. Multiple sclerosis gene discovery.
     Gene: Meta2.5, HNLP, WTCCC2
     JAK2: 0.047, 0.102, 0.0015
     REL: 0.001, 0.040, 0.0003
     SH2B3: 0.012, 0.034, 0.0130
     RUNX3: 0.016, 0.025, 0.0073
  27. 27. Interactive Web Browser - http://het.io
  28. 28. Mechanisms of Pathogenesis. Gene-{MSigDB Collection}-Gene-Disease DWPC Model. [Figure: AUROC (0.4 to 1.0) of Lasso and Ridge models for each MSigDB collection: Positional, Cancer Hood, BioCarta, GO Component, miRNA Target, GO Function, Reactome, Oncogenic, TF Target, KEGG, GO Process, Cancer Module, Immunologic, Perturbation.] [Figure: AUROC of Lasso and Ridge models for each feature: GiGaD, GeTeGaD, GeTlD, GiGeTlD, GaDaGaD, GaDmPmD, GiGiGaD, GaDlTlD, GaD (any gene), GaD (any disease).] Pathophysiology: degenerative, immunologic, metabolic, neoplastic, psychiatric, unspecific
  30. 30. Sergio Baranzini, Ryan Hernandez, John Witte, Andrej Sali, Katie Pollard, Patsy Babbitt
  31. 31. Decreased Lung Cancer at High Elevations • Counties of the American West • Lung cancer versus elevation • Publicly-available data
     [Figure, panel A, Bivariate Plot: Lung Cancer Incidence versus Elevation (km); β = −9.167, R2 = 0.202. Panel B, Partial Regression Plot: Lung Cancer Incidence Residual versus Residual Elevation; β = −7.234, R2 = 0.252. Two further fits appear without recoverable labels: β = −7.781, R2 = 0.109 and β = −4.056, R2 = 0.04.]
  32. 32. Association specific to lung cancer. Kamen Simeonov • Inhaled carcinogen • Oxygen concentration decreases by ~11% for every 1000 meter rise in elevation
     [Figure: Standardized Elevation Coefficient (−0.6 to 0.4) by Subset Size (2, 5, 8) for Lung, Breast, Colorectal and Prostate cancer; color scale: Model BIC (500 to 700).]
  33. 33. the future. Hand Drawn Map of SF by Jenni Sparks
  34. 34. Subscription Publishing. Health science journal subscription costs are skyrocketing. http://www.library.ucsf.edu/services/scholpub/journalcosts
     Library budgets are nosediving. [Figure: Library and University Expenditure Trends (Time-Series), 1982 to 2011: Library Expenditure as % of Total University Expenditure and Total University Expenditure (Average of Select US ARL Libraries). % of university budgets for libraries: 3.7% in 1982, 1.7% in 2011.] © Association of Research Libraries, 2013
     • Libraries are canceling subscriptions • Research is paywalled, inaccessible to those who could benefit • Scientists desire their findings to be widely applied • Research funding is public • Very small percentage of individuals have institutional access • Academia doesn’t succeed in a vacuum: innovation grows from diverse and plentiful inputs
     Audio from: Let’s Talk Bitcoin! #134 Disruptive Leaps, Andreas Antonopoulos
  35. 35. Per Article Cost, from "Open Access: Market Size, Share, Forecast, and Trends", Outsell, January 31, 2013. Subscription: $4,000.00; Open Access: $950.00. UCSF Open Access Fund http://www.library.ucsf.edu/services/scholpub/oa/fund/eligibility Fully OA Journal: $2,000; Hybrid OA: $1,000 • PeerJ: Lifetime publishing plan for $99 • eLife: currently no APC, “pain free publication” • PLOS, BMC, Specialty Pubs • F1000 Research, pre-review publication • preprints, arXiv & bioRxiv
  36. 36. Article-level metrics • Alternative to journal impact factor • Citations, downloads, views, social media • Accelerates science: impact factor = rejection • Expands the audience evaluating article importance and quality • Already used: h-index • Grow in importance. Open Access increases Citations: Gargouri et al. PLOS One. 2010. doi:10.1371/journal.pone.0013636.g005
  37. 37. Public Data increases Citations. Piwowar & Vision (2013) DOI: 10.7717/peerj.175 • 10,555 microarray studies • Classified studies by data availability • 8 categories of covariates
  38. 38. Availability & Reuse • only applies to original research articles • journals often withhold the typeset version • does not affect reuse. Creative Commons Attribution Alone. Mandatory Archiving: NIH: PubMed Central; UC: eScholarship • subscription journals require the transfer of article ownership • enforce the article copyright • require licensing for reuse
  39. 39. Tools for Efficiency & Reproducibility Version control: Online code repositories: Interactive programming environments: ipython notebook
  40. 40. Personal Website. Clint Cario: clintcario.com; Daniel Himmelstein: dhimmel.com; Brian O’Donovan: iambrianodonovan.com; Kieran Mace: mace.co; Andrew Sczesnak: andrewsczesnak.com; Kyle Barlow: kylebarlow.com
  41. 41. your beginning. Hand Drawn Map of SF by Jenni Sparks
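
Slide 5 reports Td = 9.44 for the exponential model and a linear fit on log10(spending per drug). A minimal sketch of how the two relate, assuming Td is a doubling time in years (the slide does not say so explicitly): if log10(y) = a + b·year, then y doubles every log10(2)/b years. The function name and the noise-free synthetic series below are hypothetical; the actual data are at doi:10.1038/nrd3681.

```python
import numpy as np


def doubling_time(years, values):
    """Fit log10(values) = a + b * years and return log10(2) / b."""
    years = np.asarray(years, dtype=float)
    # Center the years so the least-squares fit is well conditioned.
    centered = years - years.mean()
    slope, _intercept = np.polyfit(centered, np.log10(values), 1)
    return np.log10(2) / slope


# Synthetic series that doubles every 9.44 years (illustration only,
# not the spending data behind the slide).
years = np.arange(1950, 2011)
values = 0.01 * 2.0 ** ((years - 1950) / 9.44)

td = doubling_time(years, values)
print(f"estimated doubling time: {td:.2f} years")
```

Fitting on the log scale turns the exponential trend into a straight line, which is why the slide can pair an exponential model with a linear model of the log10 values.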
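
The path degree product (PDP) and degree weighted path count (DWPC) on slide 26 can be sketched in a few lines. This is an illustration rather than the het.io implementation: the function names are hypothetical, and the damping exponent w = 0.5 is an assumption, chosen because it matches the values on the slide (0.25, 0.177, 0.707, 0.677). The grouping of paths into two metapaths by first edge type is also inferred from the slide.

```python
from math import prod


def path_degree_product(degrees, w=0.5):
    """PDP: product of each metaedge-specific degree raised to -w."""
    return prod(d ** -w for d in degrees)


def dwpc(paths_degrees, w=0.5):
    """DWPC: sum of the PDPs of all paths that follow one metapath."""
    return sum(path_degree_product(degrees, w) for degrees in paths_degrees)


# Degrees along each IRF1 to Multiple Sclerosis path, as listed on the slide.
# Keys name the intermediate node; grouping by metapath is an assumption.
interaction_paths = {
    "IL2RA": (4, 1, 1, 4),
    "IRF8": (4, 1, 1, 4),
    "CXCR4": (4, 2, 1, 4),
}
expression_paths = {"Leukocyte": (2, 1, 1, 1)}

for name, degrees in {**interaction_paths, **expression_paths}.items():
    print(name, round(path_degree_product(degrees), 3))

print("DWPC via genes:", round(dwpc(interaction_paths.values()), 3))
print("DWPC via tissue:", round(dwpc(expression_paths.values()), 3))
```

Raising degrees to a negative power downweights paths that pass through highly connected nodes, so a path through a hub contributes less than a path through a specific, low-degree node.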
