Outline
1 Review
2 Problem statement
3 Sequences
4 Structures
5 A few precautions
6 Conclusion
7 References & image credits
PP Université Paris Diderot - Paris 7 12
Sequences
nucleic acid, protein
FASTA format
The simplest one
>gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]
LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV
EWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLG
LLILILLLLLLALLSPDMLGDPDNHMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLALFLSIVIL
GLMPFLHTSKHRSMMLRPLSQALFWTLTMDLLTLTWIGSQPVEYPYTIIGQMASILYFSIILAFLPIAGX
IENY
FASTA
>header
sequence, at most 80 characters per line
sequence, at most 80 characters per line
sequence, at most 80 characters per line
sequence, at most 80 characters per line
sequence, at most 80 char
Remarks
> is immediately followed by the header (no space)
fixed line length (the last line may be shorter)
extensions: .fasta, .seq, .fas, .fna, .faa
Python: strings + lists
+ (Biopython)
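As a rough illustration of the "strings + lists" approach, the sketch below reads one FASTA record from a list of lines and writes it back with a fixed line width. The helper names `read_fasta_record` and `format_fasta` are hypothetical, and the default width of 80 is simply the maximum mentioned above.

```python
def read_fasta_record(lines):
    """Return (header, sequence) from the lines of a single-record FASTA file."""
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            header = line[1:]  # drop the leading '>'
        else:
            chunks.append(line)
    return header, "".join(chunks)


def format_fasta(header, sequence, width=80):
    """Return the record as text, wrapping the sequence at `width` characters."""
    wrapped = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([">" + header] + wrapped) + "\n"


# First and last lines of the cytochrome b record shown earlier (shortened).
example = [
    ">gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]",
    "LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV",
    "IENY",
]
header, sequence = read_fasta_record(example)
print(header)
print(len(sequence))
print(format_fasta(header, sequence, width=60))
```

Joining the lines into one string first makes it easy to re-wrap the sequence at any width when writing it back.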
Multi-FASTA
>gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]
LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV
EWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLG
LLILILLLLLLALLSPDMLGDPDNHMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLALFLSIVIL
GLMPFLHTSKHRSMMLRPLSQALFWTLTMDLLTLTWIGSQPVEYPYTIIGQMASILYFSIILAFLPIAGX
IENY
>gi|134252438|gb|ABO64984.1| cytochrome b [Elephantulus rupestris]
TAFSSVTHICRDVNYGWLIRYLHANGASLFFICLFIHVGRGIYYGSYLYFETWNIGVILLFITMATAFMG
YVLPWGQMSFWGATVITNLLSAIPYIGTTLVEWIWGGFSVDKATLTRFFAFHFILPFIIAALAMVHLLFL
HETGSNNPLGLVSDSDKIPFHPYYTIKDLLGVFAILILHLSLVLFSPDLLGDPDNYTPANPLNTPPHIKP
EWYFLFAYAILRSIPNKLGGVLALVLSILILIIFPLLHTSKQRSLMFRPISQCLFWVLVADLLTLTWIGG
QPVEHPYIIIGQLASILYFTIILVLMPIAGVIENHIIKL
>gi|157367467|gb|ABV45600.1| cytochrome b [Mammuthus primigenius]
MTHIRKSHPLLKIINKSFIDLPTPSNISTWWNFGSLLGACLITQILTGLFLAMHYTPDTMTAFSSMSHIC
RDVNYGWIIRQLHSNGASIFFLCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSF
WGATVITNLFSAIPYIGTDLVEWIWGGFSVDKATLNRFFALHFILPFTMIALAGVHLTFLHETGSNNPLG
LTSDSDKIPFHPYYTIKDFLGLLILILLLLLLALLSPDMLGDPDNYMPADPLNTPLHIKPEWYFLFAYAI
LRSVPNKLGGILALLLSILILGMMPLLHTSKHRSMMLRPLSQVLFWTLATDLLMLTWIGSQPVEHPYIII
GQMASILYFSIILAFLPIAGMIENYLIK
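A multi-FASTA file can be handled the same way: each line starting with `>` opens a new record. The sketch below is one possible implementation with plain strings and lists; `parse_multifasta` is a hypothetical helper and the two records in `text` are made-up toy data, not real sequences.

```python
def parse_multifasta(text):
    """Return a list of (header, sequence) tuples from multi-FASTA text."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # a new header closes the previous record, if any
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    # do not forget the last record
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Toy input (made up for the example).
text = """>seq1 first toy record
ACDEFGHIKL
MNPQ
>seq2 second toy record
RSTVWY
"""
records = parse_multifasta(text)
for header, sequence in records:
    print(header, len(sequence))
```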
Primary sequence databases
GenBank – EMBL – DDBJ
Example (GenBank record)
LOCUS NM_001001317 940 bp mRNA linear PRI 27-DEC-2010
DEFINITION Homo sapiens trypsin X3 (TRYX3), mRNA.
ACCESSION NM_001001317
VERSION NM_001001317.2 GI:170650697
[...]
FEATURES Location/Qualifiers
source 1..940
/organism="Homo sapiens"
/mol_type="mRNA"
/db_xref="taxon:9606"
/chromosome="7"
/map="7q34"
gene 1..940
/gene="TRYX3"
/gene_synonym="FLJ16649; MGC35022; PRSS1; TRY1; UNQ2540"
/note="trypsin X3"
/db_xref="GeneID:136541"
/db_xref="HPRD:15572"
[...]
ORIGIN
1 aaggctggca aaaaggagac cagacaggag gcgtctgtag agatatcatg aacttcaact
61 tagctttgtt ttccagagac tggagctaaa ctgggctttc aacatcatca tgaagtttat
[...]
781 tgccaaaatt ttttactata taccctggat tgaaaatgta atccaaaata actgagctgt
841 ggcagttgtg gaccatatga cacagcttgt ccccatcgtt cacctttaga attaaatata
901 aattaactcc tcaaaaaaaa aaaaaaaaaa aaaaaaaaaa
//
Example: the three parts of a GenBank record
header: the LOCUS, DEFINITION, ACCESSION and VERSION lines
features: the FEATURES table (source, gene, ...)
sequence: the lines from ORIGIN up to the final //
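The sequence part can be recovered with simple string operations: everything between the ORIGIN line and the closing `//` is sequence, once the position number at the start of each line is dropped. The sketch below assumes that layout; `genbank_sequence` is a hypothetical helper and `record` only contains a few lines of the entry shown above.

```python
def genbank_sequence(lines):
    """Return the sequence found between the ORIGIN line and the closing //."""
    in_sequence = False
    chunks = []
    for line in lines:
        if line.startswith("ORIGIN"):
            in_sequence = True
        elif line.startswith("//"):
            break  # end of the record
        elif in_sequence:
            # first token is the position number, the rest are blocks of bases
            chunks.extend(line.split()[1:])
    return "".join(chunks)


# A few lines of NM_001001317 (shortened record).
record = [
    "LOCUS NM_001001317 940 bp mRNA linear PRI 27-DEC-2010",
    "ORIGIN",
    "1 aaggctggca aaaaggagac cagacaggag gcgtctgtag agatatcatg aacttcaact",
    "61 tagctttgtt ttccagagac tggagctaaa ctgggctttc aacatcatca tgaagtttat",
    "//",
]
sequence = genbank_sequence(record)
print(len(sequence))
print(sequence[:10])
```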
Header
LOCUS NM_001001317 940 bp mRNA linear PRI 27-DEC-2010
| | | | |
name length molecule type division modification date
ACCESSION NM_001001317
|
accession number (unique and stable)
SOURCE Homo sapiens (human)
|
organism name
ORGANISM Homo sapiens
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
Catarrhini; Hominidae; Homo.
|
taxonomy
REFERENCE 1 (bases 1 to 940)
AUTHORS Bubb,K.L., Bovee,D., Buckley,D., Haugen,E., Kibukawa,M.,
Paddock,M., Palmieri,A., Subramanian,S., Zhou,Y., Kaul,R., Green,P.
and Olson,M.V.
TITLE Scan of human genome reveals no new Loci under ancient balancing
selection
JOURNAL Genetics 173 (4), 2165-2177 (2006)
PUBMED 16751668
|
bibliographic reference
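Because the LOCUS fields are separated by whitespace, `str.split` is enough to name them. The sketch below assumes all eight tokens are present, as in the example above; `parse_locus` and the dictionary keys are hypothetical names chosen for the illustration.

```python
def parse_locus(line):
    """Split a LOCUS line into named fields (whitespace-separated layout assumed)."""
    fields = line.split()
    return {
        "name": fields[1],
        "length": int(fields[2]),
        "unit": fields[3],           # bp for nucleotides
        "molecule_type": fields[4],
        "topology": fields[5],       # linear or circular
        "division": fields[6],
        "date": fields[7],           # date of last modification
    }


locus = parse_locus("LOCUS NM_001001317 940 bp mRNA linear PRI 27-DEC-2010")
print(locus["name"], locus["length"], locus["division"])
```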
Features
gene 1..940
|
start and end of the gene
/gene="TRYX3"
|
gene name
/gene_synonym="FLJ16649; MGC35022; PRSS1; TRY1; UNQ2540"
/note="trypsin X3"
/db_xref="GeneID:136541"
/db_xref="HPRD:15572"
|
identifiers in other databases
CDS 110..835
| |
coding sequence start and end
/gene="TRYX3"
/gene_synonym="FLJ16649; MGC35022; PRSS1; TRY1; UNQ2540"
/EC_number="3.4.21.4"
/note="trypsin-X3"
/codon_start=1
/product="trypsin-X3 precursor"
|
name of the protein produced
/protein_id="NP_001001317.1"
/db_xref="GI:48255915"
/db_xref="CCDS:CCDS5871.1"
/db_xref="GeneID:136541"
/db_xref="HPRD:15572"
/translation="MKFILLWALLNLTVALAFNPDYTVSSTPPYLVYLKSDYLPCAGV
LIHPLWVITAAHCNLPKLRVILGVTIPADSNEKHLQVIGYEKMIHHPHFSVTSIDHDI
MLIKLKTEAELNDYVKLANLPYQTISENTMCSVSTWSYNVCDIYKEPDSLQTVNISVI
SKPQCRDAYKTYNITENMLCVGIVPGRRQPCKEVSAAPAICNGMLQGILSFADGCVLR
ADVGIYAKIFYYIPWIENVIQNN"
|
protein sequence
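The CDS location can be pulled out of the features table with a regular expression and then used to slice the nucleotide sequence. This is only a sketch for simple `start..end` locations (it does not handle `join(...)`, `complement(...)` or partial locations such as `<1..11`); `cds_location` and `extract_region` are hypothetical helpers and `toy` is a made-up sequence.

```python
import re


def cds_location(features_text):
    """Return (start, end) of the first simple CDS location, or None."""
    match = re.search(r"^\s*CDS\s+(\d+)\.\.(\d+)", features_text, re.MULTILINE)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def extract_region(sequence, start, end):
    """GenBank positions are 1-based and inclusive, hence start - 1."""
    return sequence[start - 1:end]


features = '''gene 1..940
/gene="TRYX3"
CDS 110..835
/gene="TRYX3"
/codon_start=1
'''
start, end = cds_location(features)
print(start, end, end - start + 1)

toy = "aaacccgggttt"  # made-up sequence
print(extract_region(toy, 4, 6))
```

For the entry above the CDS spans 726 bases, that is 242 codons: the 241 residues of the /translation qualifier plus the stop codon.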
Example (UniProtKB/Swiss-Prot entry)
ID TRY3_HUMAN Reviewed; 304 AA.
AC P35030; A9Z1Y4; P15951; Q15665; Q5VXV0; Q9UQV3;
DT 01-FEB-1994, integrated into UniProtKB/Swiss-Prot.
DT 14-OCT-2008, sequence version 2.
DT 11-JAN-2011, entry version 111.
DE RecName: Full=Trypsin-3;
DE EC=3.4.21.4;
DE AltName: Full=Brain trypsinogen;
DE AltName: Full=Mesotrypsinogen;
[...]
CC -!- FUNCTION: Digestive protease specialized for the degradation of
CC trypsin inhibitors.
CC -!- CATALYTIC ACTIVITY: Preferential cleavage: Arg-|-Xaa, Lys-|-Xaa.
CC -!- COFACTOR: Binds 1 calcium ion per subunit.
[...]
DR PIR; S33496; S33496.
DR RefSeq; NP_002762.2; NM_002771.3.
DR UniGene; Hs.654513; -.
DR PDB; 1H4W; X-ray; 1.70 A; A=81-304.
[...]
FT DISULFID 196 263
FT DISULFID 228 242
FT DISULFID 253 277
[...]
SQ SEQUENCE 304 AA; 32529 MW; 4C4303C310B7BFFC CRC64;
MCGPDDRCPA RWPGPGRAVK CGKGLAAARP GRVERGGAQR GGAGLELHPL LGGRTWRAAR
DADGCEALGT VAVPFDDDDK IVGGYTCEEN SLPYQVSLNS GSHFCGGSLI SEQWVVSAAH
CYKTRIQVRL GEHNIKVLEG NEQFINAAKI IRHPKYNRDT LDNDIMLIKL SSPAVINARV
STISLPTTPP AAGTECLISG WGNTLSFGAD YPDELKCLDA PVLTQAECKA SYPGKITNSM
FCVGFLEGGK DSCQRDSGGP VVCNGQLQGV VSWGHGCAWK NRPGVYTKVY NYVDWIKDTI
AANS
//
Details
ID TRY3_HUMAN Reviewed; 304 AA.
| | |
name source: Swiss-Prot length
DT 01-FEB-1994, integrated into UniProtKB/Swiss-Prot.
DT 14-OCT-2008, sequence version 2.
DT 11-JAN-2011, entry version 111.
|
dates of integration into UniProt, of the last sequence update, of the last entry update
DE RecName: Full=Trypsin-3;
|
protein name
DE AltName: Full=Brain trypsinogen;
DE AltName: Full=Mesotrypsinogen;
DE AltName: Full=Serine protease 3;
DE AltName: Full=Serine protease 4;
DE AltName: Full=Trypsin III;
|
alternative names
OS Homo sapiens (Human).
|
organism
OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
OC Catarrhini; Hominidae; Homo.
|
taxonomy
Details (2)
RN [1]
RP NUCLEOTIDE SEQUENCE [MRNA] (ISOFORMS A AND B), AND VARIANT ALA-188.
RC TISSUE=Brain;
RX MEDLINE=94123994; PubMed=8294000; DOI=10.1016/0378-1119(93)90460-K;
RA Wiegand U., Corbach S., Minn A., Kang J., Mueller-Hill B.;
RT "Cloning of the cDNA encoding human brain trypsinogen and
RT characterization of its product.";
RL Gene 136:167-175(1993).
|
bibliographic reference
CC -!- FUNCTION: Digestive protease specialized for the degradation of
CC trypsin inhibitors.
CC -!- CATALYTIC ACTIVITY: Preferential cleavage: Arg-|-Xaa, Lys-|-Xaa.
CC -!- COFACTOR: Binds 1 calcium ion per subunit.
CC -!- SUBCELLULAR LOCATION: Secreted.
|
annotations (function, location)
DR PIR; S12764; S12764.
DR PIR; S33496; S33496.
DR RefSeq; NP_002762.2; NM_002771.3.
DR UniGene; Hs.654513; -.
|
identifiers in other databases
PE 1: Evidence at protein level;
|
level of evidence for the existence (expression) of the protein
Details (3)
FT MOD_RES 211 211 Sulfotyrosine (By similarity).
FT DISULFID 87 217
FT DISULFID 105 121
[...]
FT STRAND 111 117
FT HELIX 119 121
|
sequence annotations
SQ SEQUENCE 304 AA; 32529 MW; 4C4303C310B7BFFC CRC64;
MCGPDDRCPA RWPGPGRAVK CGKGLAAARP GRVERGGAQR GGAGLELHPL LGGRTWRAAR
DADGCEALGT VAVPFDDDDK IVGGYTCEEN SLPYQVSLNS GSHFCGGSLI SEQWVVSAAH
CYKTRIQVRL GEHNIKVLEG NEQFINAAKI IRHPKYNRDT LDNDIMLIKL SSPAVINARV
STISLPTTPP AAGTECLISG WGNTLSFGAD YPDELKCLDA PVLTQAECKA SYPGKITNSM
FCVGFLEGGK DSCQRDSGGP VVCNGQLQGV VSWGHGCAWK NRPGVYTKVY NYVDWIKDTI
AANS
|
protein sequence
//
|
end of the entry
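Since every annotation line starts with a two-letter code, an entry can be read into a dictionary of lists keyed by that code, with the sequence collected separately after the SQ line. The sketch below is one way to do it; `parse_uniprot` is a hypothetical helper and `entry` is a heavily shortened version of the record above, so its sequence is not the real one.

```python
def parse_uniprot(lines):
    """Return (fields, sequence); fields maps each line code to its lines."""
    fields = {}
    chunks = []
    in_sequence = False
    for line in lines:
        if line.startswith("//"):
            break  # end of the entry
        if in_sequence:
            # sequence lines: remove the spaces between blocks of residues
            chunks.append("".join(line.split()))
            continue
        code, _, content = line.partition(" ")
        fields.setdefault(code, []).append(content.strip())
        if code == "SQ":
            in_sequence = True  # everything after SQ is sequence
    return fields, "".join(chunks)


# Shortened entry: only a few lines and two fragments of the sequence.
entry = [
    "ID TRY3_HUMAN Reviewed; 304 AA.",
    "AC P35030; A9Z1Y4; P15951; Q15665; Q5VXV0; Q9UQV3;",
    "DE RecName: Full=Trypsin-3;",
    "DE AltName: Full=Brain trypsinogen;",
    "SQ SEQUENCE 304 AA; 32529 MW; 4C4303C310B7BFFC CRC64;",
    "MCGPDDRCPA RWPGPGRAVK",
    "AANS",
    "//",
]
fields, sequence = parse_uniprot(entry)
print(fields["DE"])
print(sequence)
```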
Remarks
.txt extension
also .xml
Python: strings/lists
+ regular expressions
(+ xml module)
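Regular expressions are convenient for lines with a regular internal structure, such as the ID line or the DISULFID features. The patterns below are a sketch written for the layout shown on these slides and may need adjusting for other releases of the flat-file format; the names `ID_PATTERN`, `DISULFID_PATTERN`, `parse_id_line` and `disulfide_bonds` are hypothetical.

```python
import re

# Patterns written for the line layout shown above (an assumption).
ID_PATTERN = re.compile(r"^ID\s+(\S+)\s+(\w+);\s+(\d+) AA\.")
DISULFID_PATTERN = re.compile(r"^FT\s+DISULFID\s+(\d+)\s+(\d+)")


def parse_id_line(line):
    """Return (entry name, status, length) or None if the line does not match."""
    match = ID_PATTERN.match(line)
    if match is None:
        return None
    name, status, length = match.groups()
    return name, status, int(length)


def disulfide_bonds(lines):
    """Return the list of (position 1, position 2) pairs of DISULFID features."""
    bonds = []
    for line in lines:
        match = DISULFID_PATTERN.match(line)
        if match:
            bonds.append((int(match.group(1)), int(match.group(2))))
    return bonds


id_fields = parse_id_line("ID TRY3_HUMAN Reviewed; 304 AA.")
bonds = disulfide_bonds([
    "FT DISULFID 196 263",
    "FT DISULFID 228 242",
    "FT HELIX 119 121",
])
print(id_fields)
print(bonds)
```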
Protein Data Bank (PDB)
structures: DNA, RNA, proteins, viruses...
X-ray, NMR, cryo-electron microscopy
Example
HEADER HYDROLASE (SERINE PROTEINASE) 26-OCT-81 2PTN
TITLE ON THE DISORDERED ACTIVATION DOMAIN IN TRYPSINOGEN.
TITLE 2 CHEMICAL LABELLING AND LOW-TEMPERATURE CRYSTALLOGRAPHY
COMPND MOL_ID: 1;
COMPND 2 MOLECULE: TRYPSIN;
COMPND 3 CHAIN: A;
COMPND 4 EC: 3.4.21.4;
COMPND 5 ENGINEERED: YES
SOURCE MOL_ID: 1;
SOURCE 2 ORGANISM_SCIENTIFIC: BOS TAURUS;
SOURCE 3 ORGANISM_COMMON: CATTLE;
SOURCE 4 ORGANISM_TAXID: 9913
KEYWDS HYDROLASE (SERINE PROTEINASE)
EXPDTA X-RAY DIFFRACTION
[...]
REMARK 2 RESOLUTION. 1.55 ANGSTROMS.
[...]
ATOM 273 N ALA A 55 6.294 11.611 25.982 1.00 9.30 N
ATOM 274 CA ALA A 55 6.778 12.670 25.099 1.00 9.30 C
ATOM 275 C ALA A 55 7.329 13.864 25.883 1.00 9.30 C
ATOM 276 O ALA A 55 6.747 14.218 26.934 1.00 9.30 O
ATOM 277 CB ALA A 55 5.636 13.154 24.190 1.00 9.30 C
ATOM 278 N ALA A 56 8.461 14.383 25.454 1.00 7.97 N
ATOM 279 CA ALA A 56 9.069 15.522 26.129 1.00 7.97 C
ATOM 280 C ALA A 56 8.143 16.740 26.167 1.00 7.97 C
ATOM 281 O ALA A 56 8.162 17.496 27.169 1.00 7.97 O
ATOM 282 CB ALA A 56 10.414 15.918 25.506 1.00 7.97 C
[...]
Coordinates
PyMOL
Rasmol
VMD
...
Python
Coordinates
ATOM 601 N LEU A 99 10.007 19.687 17.536 1.00 12.25 N
ATOM 602 CA LEU A 99 9.599 18.429 18.188 1.00 12.25 C
ATOM 603 C LEU A 99 10.565 17.281 17.914 1.00 12.25 C
ATOM 604 O LEU A 99 10.256 16.101 18.215 1.00 12.25 O
ATOM 605 CB LEU A 99 8.149 18.040 17.853 1.00 12.25 C
ATOM 606 CG LEU A 99 7.125 19.029 18.438 1.00 18.18 C
ATOM 607 CD1 LEU A 99 5.695 18.554 18.168 1.00 18.18 C
ATOM 608 CD2 LEU A 99 7.323 19.236 19.952 1.00 18.18 C
Remarks
several chains
several structures (NMR)
gaps (X-ray)
Python: strings (slices) + lists
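Slices are the natural tool here because PDB files use fixed columns: neighbouring values can touch each other, so splitting on whitespace is not reliable. The sketch below assumes the standard column positions of the ATOM record; `parse_atom_line` and the dictionary keys are hypothetical names, and the sample line is written out with its fixed-width spacing.

```python
def parse_atom_line(line):
    """Slice one ATOM line using the fixed columns of the PDB format."""
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21],
        "res_seq": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
        "occupancy": float(line[54:60]),
        "b_factor": float(line[60:66]),
    }


# The x and y values touch each other: line.split() would not separate them.
line = "ATOM   7568  CB  LYS B  72     -59.462-109.221 -72.440  1.00 31.64           C"
atom = parse_atom_line(line)
print(atom["name"], atom["res_name"], atom["chain"], atom["res_seq"])
print(atom["x"], atom["y"], atom["z"])
```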
Several chains
ATOM 955 CD2 TYR A 117 28.547 16.730 59.818 1.00 34.54 C
ATOM 956 CE1 TYR A 117 26.512 14.828 59.696 1.00 34.81 C
ATOM 957 CE2 TYR A 117 28.117 16.089 60.985 1.00 35.96 C
ATOM 958 CZ TYR A 117 27.100 15.139 60.917 1.00 35.42 C
ATOM 959 OH TYR A 117 26.673 14.515 62.069 1.00 37.14 O
ATOM 960 OXT TYR A 117 25.735 19.061 58.351 1.00 32.81 O
TER 961 TYR A 117
ATOM 962 N ARG B 3 42.047 55.053 18.876 1.00 34.90 N
ATOM 963 CA ARG B 3 42.680 56.307 19.383 1.00 35.03 C
ATOM 964 C ARG B 3 43.365 56.041 20.722 1.00 33.56 C
ATOM 965 O ARG B 3 42.720 55.647 21.691 1.00 33.47 O
ATOM 966 CB ARG B 3 41.614 57.395 19.562 1.00 37.48 C
ATOM 967 CG ARG B 3 40.638 57.499 18.394 1.00 41.05 C
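To work on one chain at a time, the ATOM lines can be grouped by their chain identifier, which sits in a fixed column. A minimal sketch, assuming the standard fixed-width layout (chain identifier at index 21); `atoms_by_chain` is a hypothetical helper and the sample lines are a few of the records above with their spacing restored.

```python
def atoms_by_chain(lines):
    """Return a dictionary mapping each chain identifier to its ATOM lines."""
    chains = {}
    for line in lines:
        if line.startswith("ATOM"):
            chain_id = line[21]  # column 22 of the PDB format
            chains.setdefault(chain_id, []).append(line)
    return chains


lines = [
    "ATOM    960  OXT TYR A 117      25.735  19.061  58.351  1.00 32.81           O",
    "TER     961      TYR A 117",
    "ATOM    962  N   ARG B   3      42.047  55.053  18.876  1.00 34.90           N",
    "ATOM    963  CA  ARG B   3      42.680  56.307  19.383  1.00 35.03           C",
]
chains = atoms_by_chain(lines)
for chain_id in sorted(chains):
    print(chain_id, len(chains[chain_id]))
```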
Several structures
MODEL 1
ATOM 1 N GLY A 1 11.935 -10.938 0.352 1.00 0.00 N
ATOM 2 CA GLY A 1 13.344 -10.643 0.600 1.00 0.00 C
ATOM 3 C GLY A 1 13.861 -9.576 -0.330 1.00 0.00 C
ATOM 4 O GLY A 1 14.929 -9.728 -0.931 1.00 0.00 O
[...]
ATOM 934 HB2 GLU A 60 9.981 7.744 1.905 1.00 0.00 H
ATOM 935 HB3 GLU A 60 10.321 6.103 2.451 1.00 0.00 H
ATOM 936 HG2 GLU A 60 12.152 6.972 3.824 1.00 0.00 H
ATOM 937 HG3 GLU A 60 11.700 8.597 3.310 1.00 0.00 H
TER 938 GLU A 60
ENDMDL
MODEL 2
ATOM 1 N GLY A 1 19.334 -6.988 0.864 1.00 0.00 N
ATOM 2 CA GLY A 1 18.296 -6.813 1.874 1.00 0.00 C
ATOM 3 C GLY A 1 18.000 -5.370 2.142 1.00 0.00 C
ATOM 4 O GLY A 1 18.677 -4.724 2.959 1.00 0.00 O
[...]
ATOM 934 HB2 GLU A 60 11.353 9.615 -0.439 1.00 0.00 H
ATOM 935 HB3 GLU A 60 13.095 9.643 -0.204 1.00 0.00 H
ATOM 936 HG2 GLU A 60 13.380 10.930 -2.203 1.00 0.00 H
ATOM 937 HG3 GLU A 60 11.654 10.817 -2.534 1.00 0.00 H
TER 938 GLU A 60
ENDMDL
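In a file with several structures, atom numbers repeat from one model to the next, so the atoms have to be separated using the MODEL and ENDMDL lines. The sketch below shows one way to do it; `split_models` is a hypothetical helper and the sample lines are the first atoms of the two models above.

```python
def split_models(lines):
    """Return a dictionary mapping each model number to its ATOM lines."""
    models = {}
    current = None
    for line in lines:
        if line.startswith("MODEL"):
            current = int(line.split()[1])
            models[current] = []
        elif line.startswith("ENDMDL"):
            current = None
        elif line.startswith("ATOM") and current is not None:
            models[current].append(line)
    return models


lines = [
    "MODEL        1",
    "ATOM      1  N   GLY A   1      11.935 -10.938   0.352  1.00  0.00           N",
    "ATOM      2  CA  GLY A   1      13.344 -10.643   0.600  1.00  0.00           C",
    "ENDMDL",
    "MODEL        2",
    "ATOM      1  N   GLY A   1      19.334  -6.988   0.864  1.00  0.00           N",
    "ATOM      2  CA  GLY A   1      18.296  -6.813   1.874  1.00  0.00           C",
    "ENDMDL",
]
models = split_models(lines)
for number in sorted(models):
    print(number, len(models[number]))
```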
Gaps
[...]
ATOM 7568 CB LYS B 72 -59.462-109.221 -72.440 1.00 31.64 C
ATOM 7569 CG LYS B 72 -58.524-109.915 -73.424 1.00 31.85 C
ATOM 7570 CD LYS B 72 -58.889-109.602 -74.868 1.00 32.02 C
ATOM 7571 CE LYS B 72 -58.174-110.533 -75.837 1.00 31.61 C
ATOM 7572 NZ LYS B 72 -58.629-110.335 -77.242 1.00 31.27 N
ATOM 7573 N GLY B 73 -61.309-106.416 -72.158 1.00 31.85 N
ATOM 7574 CA GLY B 73 -62.485-105.832 -71.510 1.00 30.84 C
ATOM 7575 C GLY B 73 -63.598-106.848 -71.303 1.00 29.65 C
ATOM 7576 O GLY B 73 -64.660-106.750 -71.920 1.00 28.85 O
ATOM 7577 N SER B 74 -63.354-107.820 -70.425 1.00 28.53 N
ATOM 7578 CA SER B 74 -64.301-108.911 -70.179 1.00 27.75 C
ATOM 7579 C SER B 74 -64.180-109.438 -68.754 1.00 26.72 C
ATOM 7580 O SER B 74 -65.113-110.041 -68.227 1.00 24.48 O
ATOM 7581 CB SER B 74 -64.070-110.058 -71.166 1.00 26.32 C
ATOM 7582 OG SER B 74 -64.505-109.716 -72.470 1.00 25.54 O
ATOM 7583 N GLN B 79 -62.682-105.888 -62.336 1.00 42.85 N
ATOM 7584 CA GLN B 79 -63.246-104.902 -63.248 1.00 42.57 C
ATOM 7585 C GLN B 79 -62.146-104.278 -64.103 1.00 42.60 C
ATOM 7586 O GLN B 79 -60.992-104.191 -63.681 1.00 42.45 O
ATOM 7587 CB GLN B 79 -63.996-103.819 -62.464 1.00 42.46 C
ATOM 7588 CG GLN B 79 -64.950-102.964 -63.300 1.00 42.30 C
ATOM 7589 CD GLN B 79 -66.093-103.764 -63.905 1.00 42.15 C
ATOM 7590 OE1 GLN B 79 -66.388-104.879 -63.472 1.00 42.18 O
ATOM 7591 NE2 GLN B 79 -66.743-103.194 -64.911 1.00 41.70 N
ATOM 7592 N VAL B 80 -62.514-103.846 -65.305 1.00 42.30 N
ATOM 7593 CA VAL B 80 -61.549-103.342 -66.275 1.00 42.03 C
ATOM 7594 C VAL B 80 -60.882-102.055 -65.796 1.00 42.42 C
ATOM 7595 O VAL B 80 -61.544-101.165 -65.260 1.00 43.09 O
[...]
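In the excerpt above, chain B jumps from residue 74 to residue 79. Such jumps can be detected by comparing consecutive residue numbers within each chain. The sketch below is a simple heuristic under stated assumptions: a single model, the standard fixed-width layout, and no insertion codes; a jump in numbering is a hint, not proof, that residues are missing. `find_gaps` is a hypothetical helper and the sample lines are the CA atoms of four residues from the excerpt.

```python
def find_gaps(lines):
    """Return (chain, last residue before the gap, first residue after it) tuples."""
    gaps = []
    previous = {}  # chain identifier -> last residue number seen
    for line in lines:
        if not line.startswith("ATOM"):
            continue
        chain_id = line[21]
        res_seq = int(line[22:26])
        last = previous.get(chain_id)
        if last is not None and res_seq > last + 1:
            gaps.append((chain_id, last, res_seq))
        previous[chain_id] = res_seq
    return gaps


lines = [
    "ATOM   7574  CA  GLY B  73     -62.485-105.832 -71.510  1.00 30.84           C",
    "ATOM   7578  CA  SER B  74     -64.301-108.911 -70.179  1.00 27.75           C",
    "ATOM   7584  CA  GLN B  79     -63.246-104.902 -63.248  1.00 42.57           C",
    "ATOM   7593  CA  VAL B  80     -61.549-103.342 -66.275  1.00 42.03           C",
]
gaps = find_gaps(lines)
print(gaps)
```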
A few precautions
stay cautious with the data
GenBank Z71230
LOCUS Z71230 124 bp DNA linear PLN 14-NOV-2006
DEFINITION Nicotiana tabacum chloroplast JLA region, sequence 2.
ACCESSION Z71230
VERSION Z71230.1 GI:1279604
KEYWORDS rpl2 gene; transfer RNA-His; trnH gene.
SOURCE chloroplast Nicotiana tabacum (common tobacco)
ORGANISM Nicotiana tabacum
Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
Spermatophyta; Magnoliophyta; eudicotyledons; core eudicotyledons;
asterids; lamiids; Solanales; Solanaceae; Nicotianoideae;
Nicotianeae; Nicotiana.
REFERENCE 1 (bases 1 to 124)
AUTHORS Goulding,S.E., Olmstead,R.G., Morden,C.W. and Wolfe,K.H.
TITLE Ebb and flow of the chloroplast inverted repeat
JOURNAL Mol. Gen. Genet. 252 (1-2), 195-206 (1996)
PUBMED 8804393
[...]
FEATURES Location/Qualifiers
source 1..124
/organism="Nicotiana tabacum"
/organelle="plastid:chloroplast"
/mol_type="genomic DNA"
/isolate="Cuban cahibo cigar, gift from President Fidel
Castro"
/db_xref="taxon:4097"
gene <1..11
/gene="rpl2"
GenBank NC_001610
LOCUS NC_001610 17084 bp DNA circular MAM 14-APR-2009
DEFINITION Didelphis virginiana mitochondrion, complete genome.
ACCESSION NC_001610
VERSION NC_001610.1 GI:5835037
DBLINK Project: 11806
KEYWORDS .
SOURCE mitochondrion Didelphis virginiana (North American opossum)
ORGANISM Didelphis virginiana
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Metatheria; Didelphimorphia; Didelphidae; Didelphis.
REFERENCE 1 (bases 1 to 17084)
AUTHORS Janke,A., Feldmaier-Fuchs,G., Thomas,W.K., von Haeseler,A. and
Paabo,S.
TITLE The marsupial mitochondrial genome and the evolution of placental
mammals
JOURNAL Genetics 137 (1), 243-256 (1994)
PUBMED 8056314
[...]
FEATURES Location/Qualifiers
source 1..17084
/organism="Didelphis virginiana"
/organelle="mitochondrion"
/mol_type="genomic DNA"
/isolate="fresh road killed individual"
/db_xref="taxon:9267"
/tissue_type="liver"
/dev_stage="adult"
References
J.-C. Gelly's course, Bases de données en biologie
Bioinformatics for Dummies, by J.-M. Claverie and C. Notredame
BioStar
Incorrect / unusual entries in main databases (GenBank, UniProt, PDB)?
http://biostar.stackexchange.com/questions/10869/incorrect-unusual-entries-in-main-databases-genbank-uniprot-pdb
References (2)
FASTA format – http://en.wikipedia.org/wiki/FASTA_format
GenBank – http://www.ncbi.nlm.nih.gov/
format: http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html
UniProt – http://www.uniprot.org/
format: http://www.uniprot.org/manual/
PDB – http://www.rcsb.org/pdb/home/home.do
format: http://www.wwpdb.org/documentation/format23/v2.3.html