
AAMC Great/Grand meeting plenary lecture, September 21, 2012, Nashville, TN

Published in: Technology, Health & Medicine


  1. Publishing 3.0, or: Why we will all be disintermediated (and that is a good thing!)
     Anita de Waard, Disruptive Technologies Director, Elsevier Labs, Burlington, VT (= not what the program says :-)!)
     AAMC GREAT/GRAND Meeting, September 21, 2012
  2. What’s the big deal with big data?
     • Decoding the human genome involves analysing 3 billion base pairs; it took ten years the first time it was done, in 2003, but can now be achieved in one week. (Data, Data Everywhere, The Economist, February 25, 2010)
     • Mobile Internet devices will outnumber humans this year, Cisco predicts… Global mobile data traffic is expected to increase 18-fold over the next five years to 10.8 exabytes per month. Cloud traffic is expected to account for 71%, or 7.6 exabytes per month, of total mobile data traffic by 2016.
     • Facebook stores 100 petabytes in Hadoop.
     • ‘Big data’ offers huge challenges for biomedicine in an era of massive data sets… (Francis Collins, Director of NIH, yesterday)
  3. Your funders are telling you to share your data:
     • NSF Data Sharing Policy: Investigators are expected to share with other researchers, at no more than incremental cost and within a reasonable time, the primary data, samples, physical collections and other supporting materials created or gathered in the course of the work under NSF grants.
     • NIH Data Sharing Policy: Final Research Data should be made as widely and freely available as possible while safeguarding the privacy of participants, and protecting confidential and proprietary data. Final Research Data means recorded factual material commonly accepted in the scientific community as necessary to document and support research findings. This does not mean summary statistics or tables; rather, it means the data on which summary statistics and tables are based.
  4. So are you sharing your data? Really?
  5. Creating more data by the minute.
     [Figure: website usage diagram. Each node is a site section (Search, Pols. and docs., Employment law, Emp. law ref. man., Legal guidance, Legal reports, What’s new, Home, People manager, Policies, Statutory rates) annotated with Time (min), Age, Bounce (%) and N.]
  6. This plant tweets!
     • Internet of things: we can interact with ‘objects that blog’, or ‘Blogjects’, that:
       – track where they are and where they’ve been
       – have histories of their encounters and experiences
       – have agency
       – have a voice on the social web
  7. Larry Smarr creates lots of data:
     • He wears:
       – A Fitbit to count his every step
       – A Zeo to track his sleep patterns
       – A Polar WearLink that lets him regulate his maximum heart rate during exercise
     • 23andMe analyzed his DNA for disease susceptibility.
     • Your Future Health analyzed blood and stool samples for 100 biomarkers:
       – At one point, C-reactive protein (CRP) stood out as higher than normal.
       – A blood test showed that his CRP had climbed to 14.5 during the attack.
       – He took antibiotics, the symptoms resolved, and his CRP dropped to 4.9, but that was still unusually high.
       – Lactoferrin, too, rose several times to sky-high levels (200, whereas the normal count is less than 7.3), in tandem with CRP.
       – Smarr now thinks his diverticulitis attack was actually Crohn’s disease, and his gastroenterologist (reluctantly) agreed.
  8. As are lots of other ‘Quantified Selfers’:
     Clearity Foundation: a translational medicine and public service foundation for:
     • Providing doctors access to molecular profiling for their ovarian cancer patients
     • Providing doctors and patients clinical trial options informed by individual tumor biology
     • Providing financial support for the profiling work for patients
     Oprah approved!
  9. But who uses all that data?
  10. does!
     • It knows where you are
     • And who you talked to
     • And what you bought
     • And how much you paid…
     • And whether you need another pair of shoes
     • And when and where you can get them…
  11. Brittany Wenger does!
     Winner of the Google Science Fair 2012: 17-year-old Brittany Wenger developed a cloud-based neural network that is able to seamlessly and accurately assess tissue samples for signs/evidence of breast cancer, to give more credence to the currently used (less reliable) minimally invasive procedure called Fine Needle Aspirates (FNAs). By looking at nine different input features and comparing them to the training examples, Brittany’s cloud-based neural network can detect malignant breast tumors with an accuracy of 99.11%. Because her neural network is deployed in the cloud using Google’s App Engine, it can be accessed from existing medical systems as well as through a web browser or mobile apps.
  12. Mark Wilkinson does!
     Given a protein P in Species X:
     • Find proteins similar to P in Species Y
     • Retrieve interactors in Species Y
     • Sequence-compare Y-interactors with Species X genome (1) → Keep only those with homologue in
     • Find proteins similar to P in Species Z
     • Retrieve interactors in Species Z
     • Sequence-compare Z-interactors with (1) → Putative interactors in Species X
     Using what is known about interactions in fly & yeast, predict new interactions with a human protein, running over data on the web that he neither created nor knew about!
  13. Running the web like an experiment:
     These are different Web services (and neither of them Mark’s)! ...selected at run-time based on the same model
  14. Putting it another way:
  15. Science is becoming distributed: Data, Tools, Thoughts
  16. Science is becoming distributed: Data, Tools, Thoughts
     Data is king!
     • Data needs to say what it’s about
     • Data needs to say where it comes from
     • Data needs to know who owns it
     • Data needs to be sensitive to privacy
     • Data needs to know how it’s used
  17. Science is becoming distributed: Data, Tools, Thoughts
     Tools rule!
     • Tools can be made by everyone
     • Tools are open and free
     • Tools will know where data lives
     • Tools need to know about data:
       – Privacy/ownership
       – Trustworthiness
       – Provenance
  18. Science is becoming distributed: Data, Tools, Thoughts
     If data and tools are ubiquitous, what matters most are the questions you ask:
     • What is interesting?
     • What is important?
     • Who cares?
  19. Science is becoming more distributed: So where does that leave you?
  20. How can you prepare (your students) for this future?
     Well, you can’t - not really. But there are a few habits you can instill (and model):
  21. Habit #1: Be a good data producer
     • Know that you are creating data
     • Be aware of privacy and IPR issues re. your data
     • Assume that someone, some time will be using this data for some purpose you cannot imagine
     • Learn which data repositories exist in your field, how they work, what they need from you
     • Set up your work habits to automatically create (or force you to add) metadata to enable discovery and use of your data.
     • Store your data in the repositories. Every time.
  22. Habit #2: Be a good data consumer.
     • Find out which data exists that might be relevant to your work.
     • Learn how to query available data.
     • Be aware of privacy and IPR licenses.
     • Give credit where it’s due:
       – Cite any data sources that you use
       – Share your knowledge on querying data
       – Deposit any data you’ve derived from other data!
  23. Habit #3: Learn to code.
     • Brittany Wenger was born in 1995!
     • All sorts of people are using technology that was invented after the birth of your oldest grandchild.
     • Use anything at your disposal to learn:
       – Your students
       – Your kids
       – Online forums
       – Video tutorials
       – Etc. etc.
     • E.g. Coursera course on Clinical Research Informatics - see Cynthia Gadd (Vanderbilt)
  24. Habit #4: Expect to keep learning.
     • This will only get worse! (Or: better?)
     • Listen to Douglas Engelbart (he invented the mouse and the cursor, as well as collaborative work):
       “[For] improving the intellectual effectiveness of the individual human being…[o]ne of the tools that shows the greatest immediate promise is the computer…” (1962)
       “The grand challenge is to boost the collective IQ of organizations and of society.” (2000)
     • Expect to keep learning
       – from anyone, and anywhere
       – the only thing that can limit your success is the idea that you can’t/don’t have to learn/change/adapt/evolve
  25. Habit #5: Don’t find what you already know.
     Richard Feynman on Scientific Integrity:
     If you’re doing an experiment, you should report everything that you think might make it invalid - not only what you think is right about it.
     If you make a theory, for example, and advertise it, or put it out, then you must also put down all the facts that disagree with it, as well as those that agree with it.
     When you have put a lot of ideas together to make an elaborate theory, you want to make sure, when explaining what it fits, that those things it fits are not just the things that gave you the idea for the theory; but that the finished theory makes something else come out right, in addition.
  26. Habit #6: Anyone can come up with a great idea.
     • To paraphrase Remy the Rat (Ratatouille): ‘Not everyone can be a great scientist, but a great scientist can come from anywhere’
     • Grand challenges, hackathons, open invitations etc. can offer great solutions to difficult problems (see Cameron for the story of Tim Gowers, who crowdsourced math)
     • See also Collins’ talk yesterday: issues with race/ethnicity need to be overcome; involve students from around the world
     • Involve K-12 students: get more kids excited about science!
  27. Six habits that might help (Data, Tools, Thoughts):
     1. Be a good data producer
     2. Be a good data consumer
     3. Learn to code
     4. Expect to keep learning
     5. Don’t find what you already know
     6. Anyone can come up with a great idea!
  28. Anyway - how are we going to publish all of this?
  29. Not like this!
  30. How are we going to publish all of this?
     We’re not. YOU are.
     (With support from ‘us’ = publishers, libraries, institutions, crowd…)
  31. Maybe as Executable Papers…
  32. Or by linking data to hospital info systems…
     • Step 1: Patient data + diagnosis link to Guideline recommendation
     • Step 2: Guideline recommendation links to evidence in report or data
     [Diagram labels: Electronic Patient Records, Clinical Guideline, Data]
  33. Or by creating Linked Data stores...
     • Step 1: Manually identify DDIs and drug names in wide collection of content sources
     • Step 2: Develop a model of Drug-Drug Interaction and define candidates
     • Step 3: Automate this process and store as Linked Data
     Images from: Luis Tari, Saadat Anwar, Shanshan Liang, James Cai and Chitta Baral, Discovering drug–drug interactions: a text-mining and reasoning approach based on properties of drug metabolism, Vol. 26 ECCB 2010, pages i547–i553, doi:10.1093/bioinformatics/btq382
  34. Or by grafting stories onto your data…
     1. Add metadata to everything
     2. Use a workflow tool
     3. Write in a shared space
     4. Invite reviews
     5. The reviewer approves (or comments, author revises, etc.)
     6. Run nifty apps over all of this.
     [Diagram: metadata attached at every stage; sample paper text: “Rats were subjected to two grueling tests (click on fig 2 to see underlying data). These results suggest that the neurological pain pro-”; workflow labels: Calculate, coordinate… / Review / Revise / Edit / Compile, comment, compare…]
  35. Or by other ways…
     • ‘Future of Research Communications and e-Science’:
       – ‘Society’ for thinking about new ways of communicating science and the humanities
       – Inviting general participation
       – Please join!
  36. In summary:
     • Big data and linked tools are completely changing the face of science by distributing the creation of data, the building of tools, and the intelligent use of both
     • Social media and open education are changing who can do science, and how it is done
     • Publishing all of this will not be a simple act, and not something publishers can do alone.
     • All of this offers tremendous opportunities to expand the practice and promise of science
     • The best thing you can do is prepare to be amazed…
  37. P.S.: Do we have any jobs for your graduates?
     Maybe! Some intriguing ideas:
     • Internships/traineeships?
     • Use cases for classes on informatics, e.g.:
       – Elsevier provides content/ontologies
       – Students develop ways to integrate data and publications
       – Students help user testing/UI, model development
     • Host joint grand challenges?
     • Certainly there will be lots of work in the informatics arena
       – with publishers, digital repositories, startups, etc.…
  38. Questions?
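
Illustration for slide 12: the cross-species workflow attributed to Mark Wilkinson can be sketched in a few lines of Python. This is only a toy under stated assumptions, not the actual workflow or its web services. The lookup tables, protein names and function names below are all hypothetical; real similarity search and sequence comparison are replaced by dictionary lookups; and the truncated step "keep only those with homologue in" is assumed to mean a homologue in Species X.

```python
from typing import Dict, Set, Tuple

# Hypothetical stand-ins for remote web services (toy data, not real proteins).
# (query protein, target species) -> similar proteins in that species
SIMILAR: Dict[Tuple[str, str], Set[str]] = {
    ("P", "fly"): {"fP"},
    ("P", "yeast"): {"yP"},
}
# protein -> known interaction partners within the same species
INTERACTORS: Dict[str, Set[str]] = {
    "fP": {"fA", "fB", "fC"},
    "yP": {"yA", "yB", "yD"},
}
# protein in another species -> its homologue in Species X (assumed lookup)
HOMOLOGUE_IN_X: Dict[str, str] = {
    "fA": "xA", "fB": "xB",
    "yA": "xA", "yD": "xD",
}


def interactors_of_similar(protein: str, species: str) -> Set[str]:
    """Find proteins similar to `protein` in `species`, then collect their interactors."""
    partners: Set[str] = set()
    for hit in SIMILAR.get((protein, species), set()):
        partners |= INTERACTORS.get(hit, set())
    return partners


def homologues_in_x(proteins: Set[str]) -> Set[str]:
    """Keep only proteins with a known homologue in Species X; return those homologues."""
    return {HOMOLOGUE_IN_X[p] for p in proteins if p in HOMOLOGUE_IN_X}


def putative_interactors(protein: str, species_y: str, species_z: str) -> Set[str]:
    """Predict interactors in Species X supported by both Species Y and Species Z."""
    # First half of the slide: Y-interactors filtered against Species X -> set (1)
    set_1 = homologues_in_x(interactors_of_similar(protein, species_y))
    # Second half: Z-interactors, compared against set (1)
    from_z = homologues_in_x(interactors_of_similar(protein, species_z))
    return set_1 & from_z


if __name__ == "__main__":
    print(putative_interactors("P", "fly", "yeast"))
```

In this toy data only `xA` is supported by both fly and yeast, so it is the single putative interactor. The point of the slide survives the simplification: each lookup could be a different web service that the person running the query neither built nor curated.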