Engaging Scientific Communities in Contributing to a Biological Database

http://www.eresearch.org.nz/event/eresearch-nz-2013

  1. Engaging a Scientific Community in Contributing to a Biological Database. Paul Gardner, June 21, 2013. Slide footer: Paul Gardner, Engaging Scientists.
  2. What is RNA? RNA is a fundamental biological molecule, essential for untold biological processes. My aim is to build an analog to the Periodic Table for classifying RNA families and motifs, enabling researchers to predict function. New technologies are accelerating the rate of RNA discovery. [Figure: consensus sequences and secondary structures for RNA motifs (including GNRA, UNCG, T-loop and U-turn) and RNA families (including IRE, SECIS, mir-30, lin-4, let-7, 6S, 5S, tRNA, RNaseP, Purine and Ribosome), with a key marking "base" and "basepair".]
  3. What is Rfam? A database of ncRNA alignments and structures. Used for annotating RNAs in genome sequences, bioinformatic algorithm development and molecular evolutionary analyses. Gardner et al. (2008) Rfam: updates to the RNA families database. Nucleic Acids Research.
  4. How can we keep textual descriptions of RNAs up to date?

       AC   RF00005
       ID   tRNA
       CC   Transfer RNA (tRNA) molecules are approximately 80 nucleotides in
       CC   length. Their secondary structure includes four short
       CC   double-helical elements and three loops (D, anti-codon, and T
       CC   loops). Further hydrogen bonds mediate the characteristic
       CC   L-shaped molecular structure. tRNAs have two regions of
       CC   fundamental functional importance: the anti-codon, which is
       CC   responsible for specific mRNA codon recognition, and the 3' end,
       CC   to which the tRNAs corresponding amino acid is attached (by
       CC   aminoacyl-tRNA synthetases). tRNAs cope with the degeneracy of
       CC   the genetic code in two manners: having more than one tRNA (with
       CC   a specific anti-codon) for a particular amino acid; and 'wobble'
       CC   base-pairing, i.e. permitting non-standard base-pairing at the
       CC   3rd anti-codon position.
       RN   [1]
       RM   8256282
       RT   The tertiary structure of tRNA and the development of the genetic
       RT   code.
       RA   Hou YM;
       RL   Trends Biochem Sci 1993;18:362-364.
       RN   [2]
       RM   9023104
       RT   tRNAscan-SE: a program for improved detection of transfer RNA genes
       RT   in genomic sequence.
       RA   Lowe TM, Eddy SR;
       RL   Nucleic Acids Res 1997;25:955-964.
  5. This Wikipedia thing looks pretty good!
  6. WikiProject RNA. The WikiProjects are social corners of Wikipedia for interested parties to discuss themed articles. Involved in reviewing, ranking and rating articles. Now rolled into the larger WikiProject Molecular and Cellular Biology.
  7. How has the Wikipedia experiment gone? [Figure: "Number of Rfam pages edited"; x-axis: Year (2007 to 2011); y-axis: Number of edits (0 to 10000); series: Total edits and Vandalism; annotated values: 9089 and 106.] Gardner et al. (2011) Rfam: Wikipedia, clans and the "decimal" release. Nucleic Acids Research.
  8. Who are these Wikipedians donating their time? [Figure: "Top 20 Rfam wikiproject editors"; bar chart of Number of edits (0 to 1000) per editor, with editors grouped as Bots, Proof Readers and Scientists. Editor labels: Rfambot, Ppgardne, Citationbot1, WillowW, SmackBot, DOI_bot, Addbot, Alexbateman, Jebus989, JenniferRfm, Zashaw, Rjwilmsi, Qwyrxian, Yobot, RE73, Narayanese, RichFarmbrough, Addshore, Wgscott, MiRroar, RjwilmsiBot, Arcadian, DO11.10, Gortonk, Banus, Drmed36, FrescoBot, Boghog.]
  9. What incentives can we give to Academics? Academics love publishing articles. Introducing the "families track" at RNA Biology. Publication requirements are an alignment & a Wikipedia article. 100s of new families have been added thanks to this track.
  10. Who else is now using this model? Finn, Gardner, Bateman (2012) Making your database available through Wikipedia: the pros and cons. Nucleic Acids Research.
  11. Wikipedia needs you! What is the highest impact contribution academics can make? Rule 1: Register an Account. Rule 2: Learn the Five Pillars (ENCYC, NPOV, FREE, RESPECT, NORULES). Rule 3: Be Bold, but Not Reckless. Rule 4: Know Your Audience. Rule 5: Do Not Infringe Copyright. ...
  12. Who might be reading about your field?
  13. Thanks! The Rfam Consortium. Wikipedians & the longtail! PPG is supported by a Rutherford Discovery Fellowship from Government funding, administered by the Royal Society of New Zealand.
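
The tRNA entry shown on slide 4 uses a line-oriented layout in which each line starts with a two-letter tag (AC, ID, CC, then RN/RM/RT/RA/RL for each reference). As a rough illustration of how such a record could be read programmatically, here is a minimal Python sketch. It is not Rfam's own tooling: the function name `parse_record`, the assumption that whitespace separates tag from value, and the grouping of every `R*` tag under the preceding `RN` line are all choices made for this example. The sample record repeats lines from the slide, with the CC text truncated after its second line.

```python
from collections import defaultdict

# Sample record in the two-letter-tag layout shown on slide 4.
# The CC description is truncated here for brevity.
RECORD = """\
AC   RF00005
ID   tRNA
CC   Transfer RNA (tRNA) molecules are approximately 80 nucleotides in
CC   length. Their secondary structure includes four short
RN   [1]
RM   8256282
RT   The tertiary structure of tRNA and the development of the genetic
RT   code.
RA   Hou YM;
RL   Trends Biochem Sci 1993;18:362-364.
RN   [2]
RM   9023104
RT   tRNAscan-SE: a program for improved detection of transfer RNA genes
RT   in genomic sequence.
RA   Lowe TM, Eddy SR;
RL   Nucleic Acids Res 1997;25:955-964.
"""


def parse_record(text):
    """Hypothetical helper: split a tagged record into fields and references.

    Assumes each non-empty line is "<TAG><whitespace><value>" and that an
    RN line opens a new reference whose R* lines follow it.
    """
    fields = defaultdict(list)
    references = []
    for line in text.splitlines():
        if not line.strip():
            continue
        tag, _, value = line.partition(" ")
        value = value.strip()
        if tag == "RN":
            # Start a new reference block.
            references.append(defaultdict(list))
        elif tag.startswith("R") and references:
            # RM, RT, RA, RL belong to the most recent reference.
            references[-1][tag].append(value)
        else:
            fields[tag].append(value)
    # Continuation lines with the same tag are joined with single spaces.
    record = {tag: " ".join(parts) for tag, parts in fields.items()}
    record["refs"] = [
        {tag: " ".join(parts) for tag, parts in ref.items()}
        for ref in references
    ]
    return record
```

Calling `parse_record(RECORD)` returns a dictionary whose `"ID"` entry is `"tRNA"` and whose `"refs"` list holds one dictionary per reference, so the free-text `CC` description can be handled separately from the structured citation fields.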
