daTAA server
1. Server daTAA: http://toolkit.tuebingen.mpg.de/dataa
Paweł Szczęsny
MPI for Developmental Biology, Tuebingen, Germany
Institute of Biochemistry and Biophysics PAS, Warsaw, Poland
2. Internal complexity of TAAs
[Figure: amino acid sequence of a TAA laid out so that its internal repeats are aligned against each other]
3. Automated vs manual annotation

Coverage of annotation

Domain type            PFAM    Manual
Present in PFAM        28%     35%
Not present in PFAM    -       18%
Coiled coils           -       3%
Total                  28%     56%

Present in PFAM        26%     31%
Not present in PFAM    -       36%
Coiled coils           -       25%
Total                  26%     92%
4. Automated vs manual annotation

Coverage of annotation

Domain type            PFAM    daTAA   Manual
Present in PFAM        28%     32%     35%
Not present in PFAM    -       13%     18%
Coiled coils           -       5%      3%
Total                  28%     50%     56%

Present in PFAM        26%     28%     31%
Not present in PFAM    -       27%     36%
Coiled coils           -       11%     25%
Total                  26%     66%     92%
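
The arithmetic behind the coverage table can be sketched in a few lines of Python. This is an illustration under assumptions, not the daTAA implementation: the slides do not define "coverage", so the sketch assumes it is the percentage of residues that fall inside at least one annotated domain. The function `coverage_percent`, its 1-based inclusive intervals, and the reading of "-" as 0% are all hypothetical choices made for this example. The second part only checks that each Total row of the first block is the sum of the three domain-type rows.

```python
def coverage_percent(seq_len, domains):
    """Percent of residues covered by at least one domain.

    Assumption: domains are (start, end) pairs, 1-based and inclusive.
    Overlapping domains are counted once.
    """
    covered = 0
    last_end = 0  # highest residue position already counted
    for start, end in sorted(domains):
        start = max(start, last_end + 1)  # skip residues already counted
        end = min(end, seq_len)           # clip to the sequence length
        if end >= start:
            covered += end - start + 1
            last_end = end
    return 100.0 * covered / seq_len


# Coverage values (percent) copied from the first block of the table;
# "-" is read as 0 here, which is an assumption.
by_type = {
    "Present in PFAM":     {"PFAM": 28, "daTAA": 32, "Manual": 35},
    "Not present in PFAM": {"PFAM": 0,  "daTAA": 13, "Manual": 18},
    "Coiled coils":        {"PFAM": 0,  "daTAA": 5,  "Manual": 3},
}

# Each Total row is the column sum over the three domain types.
totals = {
    method: sum(row[method] for row in by_type.values())
    for method in ("PFAM", "daTAA", "Manual")
}

if __name__ == "__main__":
    print(totals)
    # Hypothetical 100-residue protein with three annotated domains.
    print(coverage_percent(100, [(1, 10), (5, 20), (50, 59)]))
```

Summing the rows this way only makes sense if the three domain types do not overlap within a sequence, which the matching totals in the table suggest but the slides do not state.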