Karin Verspoor, NICTA: Bioinformatics and data analytics for next-generation cancer care
Dr. Karin Verspoor, Scientific Director, Health and Life Sciences, NICTA, delivered this presentation at the 2013 Cancer Centres Symposium in Australia. The annual event explores current opportunities and challenges surrounding cancer centre policy, funding, operations, innovations and development. For more information about the annual event, please visit the conference website: http://www.informa.com.au/cancercentressymposium

Published in: Health & Medicine, Technology
Transcript

  • 1. Bioinformatics and data analytics for next-generation cancer care
      Karin Verspoor, PhD
      Principal Researcher
      Scientific Director, Health and Life Sciences, NICTA
      NICTA Copyright 2013. From imagination to impact
  • 2. Challenge
      • Enhancing and supporting biomedical data analysis and interpretation will facilitate
          – Automated surveillance of patients
          – Performance outcomes analysis
          – Improved efficiency in treatment
          – Clinical Decision Support
          – Predictive modeling of disease risk
          – Reduction of human effort in disease research
          – Improved diagnostics for disease
          – Accelerated drug target and lead identification
          – Personalised/precision medicine
  • 3. Data, Data, Everywhere
      • Electronic health records
      • Radiology images: X-ray, MRI and PET scans
      • Radiology and pathology reports
      • Data from sensors
      • Registry data
      • Medicare claim data
      • Published biomedical articles
      • DNA (genetic material) from biopsy samples
  • 4. Making Sense of Biomedical Data
  • 5. Computation for biomedical data
  • 6. The need for automation
      • Yann LeCun, Director of the Center for Data Science at New York University [1]:
          – “much of the knowledge in the world will soon need to be extracted by machines, because there will not be enough brain power to do it.”
      • Russ Altman, Stanford University [2]:
          – “Our entire understanding of biology and medicine is really contained in the published literature. And since people write in natural language, if you can’t get computers to turn that information into databases and computable information, you’re falling behind.”
      [1] http://www.forbes.com/sites/sap/2013/11/14/the-white-house-honors-sap-stanford-and-nct/
      [2] http://biomedicalcomputationreview.org/content/ncbcs-take-stock-and-look-forward-fruitful-centers-face-sunset
  • 7. “Convergence”
      Bringing together clinicians, biologists, engineers, computer scientists, mathematicians, statisticians and physicists
      Biomedical Informatics: Application of knowledge representation and computational infrastructure for biomedical data storage, retrieval, manipulation, and analysis.
      Bioinformatics: Process, analyse and interpret protein and genomics data.
      • Computational methods and algorithms
      • Robust, scalable computation
      • Data mining
      • Predictive analytics
  • 8. Machines to Data to Machines to Knowledge to Action
  • 9. Uncovering Hidden Information
      • About 80% of information is buried in textual form
          – Clinical notes
          – Radiology reports
          – GP and specialist letters
          – Medical articles
      • Text Mining Applications
          – Extracting data from clinical notes
          – Connecting with proteomic and genomic data
          – Linking with biomedical literature
  • 10. Practice-based Evidence
      • EHRs capture health-related data
      • Turning that data into actionable information requires analysis and modeling
          – Data-driven methods
          – Integration of multiple sources of data
          – e.g. combining clinical and genetic indicators in prediction of cancer prognosis
      • Models produced via data mining and predictive analysis profile inherited risks and environmental/behavioral factors associated with patient disorders
      • Utilise to generate predictions about treatment outcomes
  • 11. Pharmacovigilance
      • Mining of clinical records to identify adverse drug events
          – Estimated >90% of adverse events do not appear in coded data
          – Transform patient records into patient-feature matrix encoded using clinical terminologies
      • Detect statistical associations between drugs and adverse events
      LePendu et al. (2013) “Pharmacovigilance Using Clinical Notes”, Clinical Pharmacology & Therapeutics 93(6), 547–555; doi: 10.1038/clpt.2013.47
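A minimal sketch of how a statistical association between a drug and an adverse event might be scored from a patient-feature matrix. It uses the reporting odds ratio, a common disproportionality measure, which is an assumption here and not necessarily the statistic used in the cited paper. The function name, the feature codes and the toy records are all hypothetical.

```python
from math import exp, log, sqrt


def reporting_odds_ratio(records, drug, event):
    """Score the drug/event association over per-patient feature sets.

    records: iterable of sets, one set of feature codes per patient
    (a sparse view of a patient-feature matrix).
    Returns (odds ratio, lower 95% bound, upper 95% bound).
    """
    a = b = c = d = 0
    for features in records:
        has_drug = drug in features
        has_event = event in features
        if has_drug and has_event:
            a += 1  # exposed, event observed
        elif has_drug:
            b += 1  # exposed, no event
        elif has_event:
            c += 1  # not exposed, event observed
        else:
            d += 1  # not exposed, no event
    # Haldane-Anscombe correction: add 0.5 to every cell so that an
    # empty cell does not cause a division by zero.
    a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    ror = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(ror)
    return ror, exp(log(ror) - 1.96 * se), exp(log(ror) + 1.96 * se)


# Toy data: 20 hypothetical patients.
records = ([{"drugX", "eventY"}] * 8 + [{"drugX"}] * 2
           + [{"eventY"}] * 2 + [set()] * 8)
ror, lower, upper = reporting_odds_ratio(records, "drugX", "eventY")
print(f"ROR = {ror:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

A lower bound above 1 is often read as a signal worth reviewing; it does not by itself establish that the drug causes the event.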
  • 12. Text Mining for in-hospital infection
      • Hospital-acquired infection is a major health burden
          – $4.5 billion cost, 98,000 deaths in US annually [1]
          – >$100 million, 1000 deaths in Australia annually for 2 common infections [2]
      • Surveillance as the foundation of prevention and control
          – shown to lower infection rates, improve detection, identify overuse of expensive drugs [3]
          – pervasive surveillance not feasible without automated support
      • Our approach: mining radiology reports and images
          – automate surveillance, leverage hospital information flow
          – side benefit: early detection
          – Joint project: Alfred Health, Melbourne Health, Peter Mac Cancer Institute
  • 13. Text mining and beyond
      • Current text mining performance
          – 94% recall, 90% precision at scan level
          – 98% recall, 88% precision at patient level
          – Effective for surveillance; improvement needed for real-time detection
      • Directly classifying CT images for IFI
          – Matching images being provided by hospital partners
          – Set up as multi-task learning problem: detect <Image,Report> pair as indicative of IFI
      • Mining patient records for risk indicators
          – Mining historical patient data to learn impacting factors
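The scan-level and patient-level figures above can differ because one patient usually has several scans. The sketch below shows one way the two levels might be computed. It assumes a patient counts as positive when any of their scans is positive, which is an illustrative aggregation rule and not something the slide states. All names and data are hypothetical.

```python
def precision_recall(gold, predicted):
    """Return (precision, recall) for two parallel sequences of booleans."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)
    fp = sum(1 for g, p in pairs if not g and p)
    fn = sum(1 for g, p in pairs if g and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def to_patient_level(patient_ids, labels):
    """Collapse scan labels per patient: positive if any scan is positive."""
    by_patient = {}
    for pid, label in zip(patient_ids, labels):
        by_patient[pid] = by_patient.get(pid, False) or bool(label)
    return by_patient


# Toy data: five scans belonging to three patients.
patient_ids = ["p1", "p1", "p2", "p3", "p3"]
gold = [True, False, False, True, True]
predicted = [True, True, False, False, True]

scan_precision, scan_recall = precision_recall(gold, predicted)

gold_by_patient = to_patient_level(patient_ids, gold)
pred_by_patient = to_patient_level(patient_ids, predicted)
patients = sorted(gold_by_patient)
patient_precision, patient_recall = precision_recall(
    [gold_by_patient[p] for p in patients],
    [pred_by_patient[p] for p in patients],
)
print(scan_precision, scan_recall, patient_precision, patient_recall)
```

In this toy example a positive scan that was missed for patient p3 is compensated by another scan of the same patient, so recall is higher at patient level than at scan level.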
  • 14. Searching for Disease-related Genes
      • Large amounts of individual genetic variation
          – SNPs, insertions, deletions
          – Copy number variation, genomic duplications, inversions, translocations
          – DNA methylation, chromatin state, histone modification, RNA binding affinity, etc.
      • Identifying variation is becoming easier, interpreting it remains difficult
      Image credit: Jane Ades, NHGRI, http://www.sciencedaily.com/releases/2008/01/080122101914.htm
  • 15. Single Nucleotide Polymorphisms
      • Haplotypes: Associated SNP alleles
      • Chromosome regions where two groups differ in haplotype frequencies might contain genes affecting the disease
      • Analysis enabled by large-scale genomic data collection, data storage, and statistical frameworks scaled to large data sets
  • 16. Size of Epistasis Search Space
      [Figure: 1M SNPs x 10,000 samples; scale labels "1 mm"]
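A rough worked illustration of why the epistasis search space is large, using the 1M SNPs and 10,000 samples shown on this slide. The only assumption is that an interaction search of order k has to consider every unordered combination of k SNPs; the function name is hypothetical.

```python
from math import comb


def search_space(n_snps, order):
    """Number of unordered SNP combinations of the given order."""
    return comb(n_snps, order)


n_snps = 1_000_000
n_samples = 10_000

pairs = search_space(n_snps, 2)    # 499,999,500,000 pairs
triples = search_space(n_snps, 3)  # about 1.67e17 triples

print(f"2nd order: {pairs:.3e} SNP pairs")
print(f"3rd order: {triples:.3e} SNP triples")
# Each combination is evaluated against every sample.
print(f"2nd order, pair-by-sample evaluations: {pairs * n_samples:.3e}")
```

Going from pairs to triples multiplies the number of combinations by roughly n/3, which is why the step from 2nd-order to 3rd-order search is so much more expensive.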
  • 17. Genome Wide Interaction Search (GWIS)
      Adam Kowalczyk
      [Figure: GWIS – Genome Wide Interaction Search system; label "x 1,000,000"]
      Our Strength: Integration of mathematical, computational, signal processing and bioinformatics expertise resulting in a unique novel solution, Genome Wide Interaction Search (GWIS):
      • Run time improved by up to 3 orders of magnitude, with improved detection rate
  • 18. Timing at a glance
      2nd Order GWIS with Bigger Datasets: The future of GWAS studies implies bigger datasets giving more precision, but longer computing times! We are ready for these future datasets.
      3rd Order GWIS: We are developing even faster techniques, to make 3rd Order GWAS feasible (all combinations of 3 SNPs).
      Chi-squared test
      Order | SNPs x Samples | Standard algorithm (IG*) | GWIS-CPU (4 cores, Intel 3.0 GHz) | GWIS-GPU (1 GTX 470) | GWIS-GPU on MASSIVE GPU cluster (~200 Tesla C2050) | GWIS-GPU on Titan (18,688 Tesla K20)
      2nd | 300K x 3K | 108 days | 39 minutes | 3 minutes | ~0 | ~0
      2nd | 1M x 10K | 11 years | 25 hours | 1.85 hours | ~0.5 minutes | ~0
      2nd | 5M x 10K | 275 years | 26 days | 1.91 days | ~12.24 minutes | ~0
      3rd | 300K x 3K | ~30K years | ~30 years | ~2.3 years | ~5 days | ~38 minutes
      3rd | 1M x 10K | ~3.6M years | ~3.7K years | ~282 years | ~612 days | ~3.2 days
      3rd | 5M x 10K | ~458.3M years | ~453K years | ~34.9K years | ~208 years | ~1.1 years
      * Fastest according to the benchmark paper: Li Chen, Guoqiang Yu, David J. Miller, Lei Song, Carl Langefeld, David Herrington, Yongmei Liu, and Yue Wang, A Ground Truth Based Comparative Study on Detecting Epistatic SNPs, Proceedings (IEEE Int Conf Bioinformatics Biomed). 2009 November 1; 1-4(Nov 2009):
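The standard-algorithm timings on this slide can be sanity-checked with a simple scaling model. The sketch below assumes that run time grows with the number of SNP combinations (approximated as the SNP count raised to the search order) multiplied by the number of samples. That cost model is an assumption made for illustration, not a description of how the slide's figures were produced, and the function name is hypothetical.

```python
def scale_runtime(base_time, base_snps, base_samples, snps, samples, order):
    """Extrapolate a run time, assuming cost ~ snps**order * samples."""
    return base_time * (snps / base_snps) ** order * (samples / base_samples)


DAYS_PER_YEAR = 365.25

# Starting point from the slide: 2nd order, 300K SNPs x 3K samples = 108 days.
days_1m = scale_runtime(108, 300_000, 3_000, 1_000_000, 10_000, order=2)
days_5m = scale_runtime(108, 300_000, 3_000, 5_000_000, 10_000, order=2)

print(f"1M x 10K: about {days_1m / DAYS_PER_YEAR:.1f} years")
print(f"5M x 10K: about {days_5m / DAYS_PER_YEAR:.1f} years")
```

Under this model the extrapolated values come out close to the 11 years and 275 years listed for the standard algorithm, which suggests those entries follow the same kind of scaling.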
  • 19. Application context: Integrated genomics for lethal prostate cancer
      A/Prof Chris Hovens at Royal Melbourne Hospital has:
      • Acquired a unique bio-bank of over 1500 prostate cancer samples
      • Extensive clinical information
      • Demands computational resources and expertise to address complex genomic analysis problems
  • 20. Integrated genomics for lethal prostate cancer
      Sample acquisition
      Unique metastatic samples are harvested by clinical and surgical researchers during the progression of the disease
  • 21. Integrated genomics for lethal prostate cancer
      Molecular analysis
      Samples are profiled using multiple high-resolution, high-throughput platforms generating large amounts of molecular level data
      Heterogeneous
      • DNA sequencing (whole genome)
      • RNA sequencing
      • Methylation profiling
      • Copy-number variation profiling
      = 40TB data (doubling every 3 months)
  • 22. Algorithms for variant interpretation
      Harness the power of the literature
      • Extract information about genes and genetic variants from biomedical research publications
      • Start with the simple hypothesis that any mention of a genetic variant is meaningful
      • Prioritize variants with literature support
      • Provide pointers to the evidence for human interpretation
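The steps on this slide can be sketched as follows: find variant mentions in article text, count how many documents support each variant, and rank candidate variants by that support while keeping pointers back to the documents. This is a simplified illustration, not the NICTA system. The regular expressions cover only a few HGVS-style forms and dbSNP-style identifiers, and the function names, document identifiers and example sentences are all made up.

```python
import re
from collections import defaultdict

# Deliberately narrow patterns; real variant mentions are far more varied.
VARIANT_PATTERNS = [
    re.compile(r"\bc\.\d+[ACGT]>[ACGT]\b"),               # e.g. c.1799T>A
    re.compile(r"\bp\.[A-Z][a-z]{2}\d+[A-Z][a-z]{2}\b"),  # e.g. p.Val600Glu
    re.compile(r"\brs\d+\b"),                             # e.g. rs12345
]


def literature_support(documents):
    """Map each variant mention to the sorted ids of documents containing it.

    documents: dict of document id -> text.
    """
    support = defaultdict(set)
    for doc_id, text in documents.items():
        for pattern in VARIANT_PATTERNS:
            for match in pattern.finditer(text):
                support[match.group(0)].add(doc_id)
    return {mention: sorted(ids) for mention, ids in support.items()}


def prioritise(candidates, support):
    """Order candidate variants by number of supporting documents."""
    return sorted(candidates, key=lambda v: len(support.get(v, [])),
                  reverse=True)


# Made-up example texts.
documents = {
    "doc1": "The c.1799T>A (p.Val600Glu) substitution was found in tumours.",
    "doc2": "We again observed c.1799T>A in the second cohort.",
    "doc3": "No association was seen for rs12345.",
}
support = literature_support(documents)
ranked = prioritise(["c.100A>G", "rs12345", "c.1799T>A"], support)
for variant in ranked:
    print(variant, support.get(variant, []))
```

The document lists returned alongside each variant play the role of the "pointers to the evidence": a curator can open the cited documents and judge whether the mention is actually meaningful.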
  • 23. Curation of Genetic Variant Information from the biomedical literature
      http://opennicta.com/home/health/variome
      • Partnership with InSiGHT database (Human Variome Project)
          – Collect and catalogue mutations in specific genes implicated in gastrointestinal hereditary tumours
          – Collected both by direct deposit of genetic variants, and from curation of the published literature
      • We have developed a text annotation schema and annotated a corpus of relevant literature
          – Variant Annotation Schema
          – covers genes, mutations, diseases, patients, body parts, ethnic group, age, gender, characteristics; also relationships among these
      • In progress: build entity and relation extraction tools to support curation of this information
  • 24. A “Phenotypic code” for complex disease
      • Simple and complex diseases appear to share a genetic architecture
      • Mining of co-morbidities of complex diseases and Mendelian diseases with known genetic cause identifies a ‘code’ for each complex disease in terms of Mendelian genetic loci.
      • Evidence of epistasis among the Mendelian variants (superlinear complex disease risk)
      Blair et al. Cell (2013); 155 (1); 70-80. http://dx.doi.org/10.1016/j.cell.2013.08.030
  • 25. BiomRKRS
      Biomarker Retrieval and Knowledge Reasoning System
      • Knowledge management for biomarker data
      • Using ontologies/controlled vocabularies as backbone for integration and retrieval
      • Integrating information from a range of sources, including the literature
      • Support querying according to various characteristics
  • 26. Searching for information via complex queries
  • 27. Predictive Modeling
      • EHRs capture health-related data
      • Turning that data into actionable information requires analysis and modeling
          – Data-driven methods
          – Integration of multiple sources of data
          – e.g. combining clinical and genetic indicators in prediction of cancer prognosis
      • Models produced via data mining and predictive analysis profile inherited risks and environmental/behavioral factors associated with patient disorders, which can be utilized to generate predictions about treatment outcomes
  • 28. Biomedical informatics @ NICTA
  • 29. We Do Good STUFF
