5. Dan MacLean- Sainsbury Laboratory
A presentation about: "Squeezing big data into a small organisation".

Transcript

  • 1. Squeezing big data into a small organisation. Dan MacLean, The Sainsbury Laboratory
  • 2. The Sainsbury Lab
  • 3. TSL Funding Source: Gatsby Core; Other (BBSRC, EU etc.)
  • 4. TSL Research: “The Sainsbury Laboratory is dedicated to making fundamental discoveries about plants and how they interact with microbes and viruses and favours daring, long-term research over work that could be equally well carried out elsewhere”. Basic and translational research into Plant/Pathogen interactions
  • 5. genomics: HTGS, effector finding, R gene cloning, assembly de novo, resequencing, SNP detection, annotation pipelines; transcriptomics: RNA seq, ChIP seq, arrays; proteomics; high throughput image analysis; pipeline development; image segmentation; statistical methods for object detection; spectrum analysis; algorithm development
  • 6. TSL Tech: Illumina GA II; Opera HCA Imaging; Orbitrap LC-MS. 5 groups, ~80 scientists => 2 bioinformaticians. understand, manage, analyze
  • 7. understand, manage, analyze [bar chart, axis 0 to 120; series: core informatician, project scientist; categories: understanding, analysis, management (biological provenance)]
  • 8. understand, manage, analyze [bar chart, axis 0 to 120; series: core informatician, project scientist; categories: understanding, analysis, management (biological provenance)]
  • 9. Where to focus? ?
  • 10. Where to focus? (“bioinformatics is easy” – do your own &%*@@! BLAST) Bioinformatics is a sub-discipline of molecular biology* (*Not true everywhere, but for our purposes true enough)
  • 11. Support Outline [diagram grouping projects under analysis deficit, management deficit and understanding deficit; projects shown: P. infestans genomics; P. infestans transcriptomics; Albugo transcriptomics; Hpa genomics; A. thaliana proteomics; A. thal HT Microscopy; Albugo genomics; A. thaliana genomics]
  • 12. Training. Workshops: Perl, Ruby, CMD Line, MySQL, R + Stats, Excel VB, Browsers and Desktop tools… Resource provision: Workshop notes, Workshop Podcasts, Quick ‘How do I’ podcasts, Library. Integration of best practice: Lab meetings, Journal Clubs. An open dialogue – most important is to have someone approachable, who wants to do it
  • 13. Systems Development (SOP for common tasks – keeping the house in order). Shared Data: Common data storage – results and raw data. Don’t inconvenience. Make access easy. Make messing it up hard. Need to get PLs on-side…
  • 14. Systems Development: Shared Data. Local validation rules for features and sequence: • Annotation (GFF3) consistent with local specs • Dbxref, IDs, sl_id, species_id. Sequence and feature databases: • version and update tracking – central store
  • 15. Systems Development: Galaxy. Workflows – great for sharing ‘vanilla’ analysis protocols. Customisable – great for running in-house scripts
  • 16. Supplementary Research: Reference-free SNP detection. De Bruijn graph representations of polymorphisms in sequence reads. New method – reduces steps in SNP finding, makes it possible where no reference is available. Collaboration with Mario Caccamo at TGAC
  • 17. Advantage: Time • Get analysis know-how into heads with biology • Reduce workload • Improve reproducibility • These things propagate…
  • 18. Summary • There is a lot a ‘biologist’ can do themselves • Start a dialogue • Get tough, get your house in order • Lower the barriers to access and capability
  • 19. Acknowledgements • Dr Graham Etherington • Michael Burrell • Sophien Kamoun • Jonathan Jones • Silke Robatzek • Eric Ward • Richard Leggett • Mario Caccamo
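
Slide 14 mentions local validation rules that keep GFF3 annotation consistent with in-house specs (Dbxref, IDs, sl_id, species_id). The slides do not show the rules themselves, so the following is only a minimal sketch of what such a check might look like. The list of required attribute keys and the function names are assumptions made for illustration, not the laboratory's actual specification.

```python
from urllib.parse import unquote

# Hypothetical local spec: every feature line must carry these attribute keys.
REQUIRED_KEYS = ("ID", "Dbxref", "sl_id", "species_id")
VALID_STRANDS = {"+", "-", ".", "?"}


def parse_attributes(column9):
    """Split GFF3 column 9 ('key=value;key=value') into a dict."""
    attrs = {}
    for pair in column9.strip().split(";"):
        key, sep, value = pair.partition("=")
        if sep:  # ignore empty or malformed fragments without '='
            attrs[key.strip()] = unquote(value.strip())
    return attrs


def validate_gff3_line(line):
    """Return a list of problems found in one GFF3 line (empty if none)."""
    if not line.strip() or line.startswith("#"):
        return []  # blank lines, comments and directives are not checked
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        return [f"expected 9 tab-separated columns, found {len(fields)}"]
    problems = []
    start, end = fields[3], fields[4]
    if not (start.isdigit() and end.isdigit()) or int(start) < 1:
        problems.append("start and end must be positive integers")
    elif int(start) > int(end):
        problems.append("start is greater than end")
    if fields[6] not in VALID_STRANDS:
        problems.append(f"invalid strand {fields[6]!r}")
    attrs = parse_attributes(fields[8])
    for key in REQUIRED_KEYS:
        if not attrs.get(key):
            problems.append(f"missing attribute {key}")
    return problems
```

A check like this could run when a file is submitted to the shared store, which is one way to "make messing it up hard" without adding work for the person depositing data.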

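Slide 16 describes reference-free SNP detection using De Bruijn graph representations of polymorphisms in sequence reads. The slides give no algorithmic detail, so the following is a toy illustration of the general idea rather than the method developed in that collaboration: a SNP between two sets of reads shows up as a small "bubble" where the graph splits into two equal-length branches and then rejoins. The function names are hypothetical, and the sketch ignores reverse complements, sequencing errors and coverage.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in any read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk(graph, node, max_steps):
    """Follow unbranched edges from node; return the list of nodes visited."""
    path = [node]
    for _ in range(max_steps):
        successors = graph.get(node, ())
        if len(successors) != 1:
            break  # dead end or another branch point
        node = next(iter(successors))
        path.append(node)
    return path


def find_snp_bubbles(graph, k):
    """Return (allele_a, allele_b) sequence pairs for simple two-branch bubbles."""
    bubbles = []
    for node, successors in list(graph.items()):
        if len(successors) != 2:
            continue
        first, second = sorted(successors)
        path_a = walk(graph, first, k)
        path_b = walk(graph, second, k)
        # A SNP gives two equal-length branches that meet at the same node.
        for steps in range(1, min(len(path_a), len(path_b))):
            if path_a[steps] == path_b[steps]:
                seq_a = node + "".join(n[-1] for n in path_a[:steps + 1])
                seq_b = node + "".join(n[-1] for n in path_b[:steps + 1])
                bubbles.append((seq_a, seq_b))
                break
    return bubbles


reads = ["GATTACAGGCT", "GATTCCAGGCT"]  # differ at a single base (A vs C)
print(find_snp_bubbles(build_de_bruijn(reads, 4), 4))
# prints [('ATTACAG', 'ATTCCAG')]
```

No reference genome is needed at any point: the two alleles are recovered directly from the reads, which is what makes this kind of approach usable where no reference is available.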