Millburn - FlyBase community curation


Directly e-mailing authors of newly published papers encourages community curation, by Stephanie Bunt, Gary Grumbling, Helen Field, Steven Marygold, Thom Kaufman, Kathy Matthews, Nick Brown and Gillian Millburn.

Presented at the 5th International Biocuration Conference, hosted by PIR in Washington, DC, April 2-4, 2012.

Published in: Business, Technology


  1. 1. Directly e-mailing authors of newly published papers encourages community curation
       Stephanie Bunt, Gary Grumbling, Helen Field, Steven Marygold, Thom Kaufman, Kathy Matthews, Nick Brown and Gillian Millburn
  2. 2. Overview
       • Background – why choose triaging?
       • Community curation pipeline
       • Results – how successful were we?
       • Future plans
  3. 3. Overview
       • Background – why choose triaging?
       • Community curation pipeline
       • Results – how successful were we?
       • Future plans
  4. 4. Background: why choose triaging of papers?
       • Weekly literature search (semi-automated)
       • Skim curation: flag data-types in paper; record main genes studied
       • Use flags to prioritise
       • Full curation
  5. 5. Background: why choose triaging of papers?
       • Weekly literature search (semi-automated)
       • Skim curation: flag data-types in paper; record main genes studied
       • Use flags to prioritise
       • Full curation
       Examples of data-type flags:
       • new allele
       • new transgenic construct
       • phenotype
       • newly characterised gene
       • expression data
       • gene model data
       • physical interaction data
  6. 6. Background: why choose triaging of papers?
       • Weekly literature search (semi-automated)
       • Skim curation: flag data-types in paper; record main genes studied
       • Use flags to prioritise
       • Full curation
       Notes on skim curation:
       • skimming takes a significant amount of curator effort
       • simple
  7. 7. Overview
       • Background – why choose triaging?
       • Community curation pipeline
       • Results – how successful were we?
       • Future plans
  8. 8. Pipeline: the community curation tool
  9. 9. Pipeline: the community curation tool
  10. 10. Pipeline: the community curation tool
  11. 11. Pipeline: the community curation tool
  12. 12. Pipeline: integrating community curation
       • Weekly literature search (semi-automated)
       • Skim curation: flag data-types in paper; record main genes studied
       • Community curation tool: community skim curation
       • Use flags to prioritise
       • Full curation
  13. 13. Pipeline: integrating community curation
       • Weekly literature search (semi-automated)
       • Community curation tool: community skim curation
       • Use flags to prioritise
       • Full curation
  14. 14. Pipeline: integrating community curation
       • Weekly literature search (semi-automated)
       • Community curation tool: community skim curation
       • Use flags to prioritise
       • Full curation
  15. 15. Pipeline: integrating community curation
       • Weekly literature search (semi-automated)
       • Download PDF files (semi-automated)
       • E-mail authors (automated)
       • Community curation tool: community skim curation
       • Use flags to prioritise
       • Full curation
  16. 16. Pipeline: integrating community curation
       • Weekly literature search (semi-automated)
       • Download PDF files (semi-automated)
       • E-mail authors (automated)
       • Community curation tool: community skim curation
       • Use flags to prioritise
       • Full curation
       About the e-mail:
       • E-mail contains personalised hyperlink
       • Takes author to part filled-in tool
  17. 17. Overview
       • Background – why choose triaging?
       • Community curation pipeline
       • Results – how successful were we?
       • Future plans
  18. 18. Results: response rate
       First year's results (Oct 2010 – Oct 2011):
       • 1857 e-mails sent
       • 815 completed responses
       • = 44% response rate
       • ~68/month = 7.5x rate prior to e-mailing
  19. 19. Results: does the age of the paper matter?
       • Weekly e-mailing (paper in PubMed for <2 weeks): pie chart of author skim curation (44%) vs no response
  20. 20. Results: does the age of the paper matter?
       • Weekly e-mailing (paper in PubMed for <2 weeks): pie chart of author skim curation (44%) vs no response
       • One-off e-mailing Dec 2010 (paper in PubMed for 2-13 months)
  21. 21. Results: does the age of the paper matter?
       • Weekly e-mailing (paper in PubMed for <2 weeks): pie chart of author skim curation (44%) vs no response
       • One-off e-mailing Dec 2010 (paper in PubMed for 2-13 months): 36%
  22. 22. Results: has e-mailing increased volunteer submissions?
       Before e-mailing:
       • ~9 submissions/month
  23. 23. Results: has e-mailing increased volunteer submissions?
       Before e-mailing:
       • ~9 submissions/month
       Since started e-mailing:
       • ~8 submissions/month
  24. 24. Results: targeting authors to a specific paper helps
       [Chart "Tool usage": y-axis 0 to 60, x-axis dates 11-Oct-10 to 17-Oct-10; series "Author skim curation" and "Paper already curated"; annotation "General e-mail sent"]
  25. 25. Results: accuracy
       Analysed 1134 author skim-curated papers that have subsequently been fully curated
       Gene data:
       • only had to remove gene(s) from 4.8% of papers
  26. 26. Results: accuracy of author-curated flags
       [Bar chart: number of papers (x-axis, 0 to 1200) per data-type flag, each bar split into Correct, False positive, False negative and Not present]
       Data-type flags shown:
       • New allele or aberration
       • New transgene
       • Initial characterization
       • Merge of gene reports
       • Gene rename
       • Expression in a wild-type background
       • Expression in a mutant background
       • Phenotypic analysis
       • Physical interaction
       • Changes to D. melanogaster gene model
       • Changes to non-D. melanogaster gene model
       • Mapping of features to genome
       • Cis-regulatory elements defined
  27. 27. Results: over-flagging
       [Same bar chart and data-type flags as slide 26]
  28. 28. Results: over-flagging
       [Same bar chart and data-type flags as slide 26]
  29. 29. Results: under-flagging
       [Same bar chart and data-type flags as slide 26]
  30. 30. Results: under-flagging
       [Same bar chart and data-type flags as slide 26]
  31. 31. Results: under-flagging
       [Same bar chart and data-type flags as slide 26]
  32. 32. Overview
       • Background – why choose triaging?
       • Community curation pipeline
       • Results – how successful were we?
       • Future plans
  33. 33. Future plans: improving the response rate
       • First year's results: pie chart of author skim curation (44%) vs no response
  34. 34. Future plans: improving the response rate
       • First year's results: pie chart of author skim curation (44%) vs no response
       • Sending a reminder e-mail (since mid-Nov 2011)
  35. 35. Future plans: improving the response rate
       • First year's results: pie chart of author skim curation (44%) vs no response
       • Sending a reminder e-mail (since mid-Nov 2011): 55%
  36. 36. Future plans: triaging the remaining papers
       • Text mining to assign data-type flags
       • See poster #P.109: "Integration of an automatic triaging step into FlyBase Literature Curation through the use of SVM text-mining methods."
  37. 37. Future plans: expanding scope of community curation
       • Existing pipeline
       • reviews
       • Wiki pages
       • See poster #P.12: "Expanding community curation at FlyBase through the design and implementation of a gene-centric semantic wiki."
  38. 38. Acknowledgements
       • FB community curation committee - for helping improve design of tool
       • FB-Cambridge curators - for helping to fully curate the papers analysed for accuracy
       • All the authors who have filled in the tool!
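
The response-rate figures quoted on slide 18 can be checked with a few lines of arithmetic. The sketch below is not part of the presentation. It assumes the Oct 2010 – Oct 2011 window counts as 12 months and takes the approximate pre-e-mailing rate of 9 submissions/month from slide 22 as the baseline. All variable names are illustrative.

```python
# Figures quoted on slide 18 (first year of e-mailing authors).
emails_sent = 1857
completed_responses = 815

# Assumption: the Oct 2010 - Oct 2011 window is treated as 12 months.
months = 12

# Assumption: baseline is the approximate pre-e-mailing rate on slide 22.
baseline_per_month = 9

# Fraction of e-mailed authors who completed the tool.
response_rate = completed_responses / emails_sent

# Average number of completed responses per month.
responses_per_month = completed_responses / months

# How many times higher the monthly rate is than the baseline.
fold_increase = responses_per_month / baseline_per_month

print(f"response rate: {response_rate:.0%}")
print(f"responses per month: {responses_per_month:.0f}")
print(f"fold increase: {fold_increase:.1f}x")
```

Under these assumptions the rounded values agree with the slide: a 44% response rate, about 68 responses per month, and roughly 7.5 times the earlier rate.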
