This document discusses data formats used in biology. It begins by reviewing the central dogma of biology relating DNA, RNA and proteins. It then discusses common sequence file formats such as FASTA, which store nucleotide and protein sequences. Key points are that FASTA is one of the simplest formats, with a header line beginning with ">" followed by the sequence on lines of fixed maximum length, and that multi-FASTA files can contain several sequences. GenBank, EMBL and DDBJ are introduced as the primary public sequence databases, followed by annotated UniProt entries and PDB structure files.
12. Menu
1 Reminders
2 Problem statement
3 Sequences
4 Structures
5 A few precautions
6 Conclusion
7 References & image credits
PP Université Paris Diderot - Paris 7 12
13. Sequences
nucleic acid, protein
14. FASTA format
The simplest one
>gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]
LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV
EWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLG
LLILILLLLLLALLSPDMLGDPDNHMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLALFLSIVIL
GLMPFLHTSKHRSMMLRPLSQALFWTLTMDLLTLTWIGSQPVEYPYTIIGQMASILYFSIILAFLPIAGX
IENY
15. FASTA
>header
sequence with at most 80 characters per line
sequence with at most 80 characters per line
sequence with at most 80 characters per line
sequence with at most 80 characters per line
sequence with at most 80 charac
16. Remarks
">" is attached directly to the header
fixed length for each line
extensions .fasta, .seq, .fas, .fna, .faa
Python: strings + lists
+ (Biopython)
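The remarks above (a ">" attached to the header, a fixed line length, plain strings and lists) can be sketched in Python as follows. This is only an illustration, not code from the course: the function name `format_fasta`, the demo header and the default width of 80 (taken from the 80-character limit on the previous slide) are assumptions.

```python
def format_fasta(header, sequence, width=80):
    """Return one FASTA record as text, wrapping the sequence at `width`."""
    # Cut the sequence into slices of at most `width` characters (a list of strings).
    chunks = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    # The ">" is attached directly to the header, with nothing in between.
    return "\n".join([">" + header] + chunks) + "\n"


# Hypothetical demo data: a 200-character sequence gives lines of 80, 80 and 40.
print(format_fasta("demo sequence", "ACGT" * 50), end="")
```

Only the last sequence line may be shorter than the chosen width, which is what the truncated last line of the previous slide illustrates.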
17. Multifasta
>gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]
LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV
EWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLG
LLILILLLLLLALLSPDMLGDPDNHMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLALFLSIVIL
GLMPFLHTSKHRSMMLRPLSQALFWTLTMDLLTLTWIGSQPVEYPYTIIGQMASILYFSIILAFLPIAGX
IENY
>gi|134252438|gb|ABO64984.1| cytochrome b [Elephantulus rupestris]
TAFSSVTHICRDVNYGWLIRYLHANGASLFFICLFIHVGRGIYYGSYLYFETWNIGVILLFITMATAFMG
YVLPWGQMSFWGATVITNLLSAIPYIGTTLVEWIWGGFSVDKATLTRFFAFHFILPFIIAALAMVHLLFL
HETGSNNPLGLVSDSDKIPFHPYYTIKDLLGVFAILILHLSLVLFSPDLLGDPDNYTPANPLNTPPHIKP
EWYFLFAYAILRSIPNKLGGVLALVLSILILIIFPLLHTSKQRSLMFRPISQCLFWVLVADLLTLTWIGG
QPVEHPYIIIGQLASILYFTIILVLMPIAGVIENHIIKL
>gi|157367467|gb|ABV45600.1| cytochrome b [Mammuthus primigenius]
MTHIRKSHPLLKIINKSFIDLPTPSNISTWWNFGSLLGACLITQILTGLFLAMHYTPDTMTAFSSMSHIC
RDVNYGWIIRQLHSNGASIFFLCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSF
WGATVITNLFSAIPYIGTDLVEWIWGGFSVDKATLNRFFALHFILPFTMIALAGVHLTFLHETGSNNPLG
LTSDSDKIPFHPYYTIKDFLGLLILILLLLLLALLSPDMLGDPDNYMPADPLNTPLHIKPEWYFLFAYAI
LRSVPNKLGGILALLLSILILGMMPLLHTSKHRSMMLRPLSQVLFWTLATDLLMLTWIGSQPVEHPYIII
GQMASILYFSIILAFLPIAGMIENYLIK
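A multi-FASTA file like the one above can be read with nothing more than strings and lists. The sketch below is one possible approach, with assumed names (`parse_multifasta`) and toy records rather than the cytochrome b entries: every line starting with ">" opens a new record, and all following lines are joined into its sequence.

```python
def parse_multifasta(text):
    """Return a list of (header, sequence) tuples from multi-FASTA text."""
    records = []
    header, parts = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there is one.
            if header is not None:
                records.append((header, "".join(parts)))
            header, parts = line[1:], []
        elif header is not None:
            parts.append(line)
    # Do not forget the last record of the file.
    if header is not None:
        records.append((header, "".join(parts)))
    return records


# Toy input (hypothetical identifiers and sequences).
toy = ">seq1 first toy record\nMTHIRK\nSHPLL\n>seq2 second toy record\nLCLYTH\n"
for header, sequence in parse_multifasta(toy):
    print(header, len(sequence))
```

For real work, Biopython offers the same service through `Bio.SeqIO.parse(handle, "fasta")`, which is the library hinted at in the remarks slide.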
18. Primary sequence databases
GenBank – EMBL – DDBJ
22. Example
LOCUS NM_001001317 940 bp mRNA linear PRI 27-DEC-2010
DEFINITION Homo sapiens trypsin X3 (TRYX3), mRNA.
ACCESSION NM_001001317
VERSION NM_001001317.2 GI:170650697
[...]
FEATURES Location/Qualifiers
source 1..940
/organism="Homo sapiens"
/mol_type="mRNA"
/db_xref="taxon:9606"
/chromosome="7"
/map="7q34"
gene 1..940
/gene="TRYX3"
/gene_synonym="FLJ16649; MGC35022; PRSS1; TRY1; UNQ2540"
/note="trypsin X3"
/db_xref="GeneID:136541"
/db_xref="HPRD:15572"
[...]
ORIGIN
1 aaggctggca aaaaggagac cagacaggag gcgtctgtag agatatcatg aacttcaact
61 tagctttgtt ttccagagac tggagctaaa ctgggctttc aacatcatca tgaagtttat
[...]
781 tgccaaaatt ttttactata taccctggat tgaaaatgta atccaaaata actgagctgt
841 ggcagttgtg gaccatatga cacagcttgt ccccatcgtt cacctttaga attaaatata
901 aattaactcc tcaaaaaaaa aaaaaaaaaa aaaaaaaaaa
//
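In a record like this one, the sequence sits between the ORIGIN line and the closing "//", in blocks of 10 bases preceded by a position number. The following sketch (assumed function name `origin_sequence`, toy input) shows one way to recover it as a single string; it assumes the elided "[...]" parts are not present in a real file.

```python
def origin_sequence(record_text):
    """Return the sequence found between the ORIGIN line and the // line."""
    in_origin = False
    parts = []
    for line in record_text.splitlines():
        if line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):
            break  # end of the record
        elif in_origin:
            # The first field is the position number, the rest are 10-base blocks.
            parts.extend(line.split()[1:])
    return "".join(parts)


# Toy record (hypothetical): 25 bases spread over two lines.
toy = "LOCUS TOY 25 bp\nORIGIN\n        1 aaggctggca aaaaggagac\n       21 cagac\n//\n"
print(origin_sequence(toy))
```

A useful sanity check is to compare the length of the returned string with the size announced on the LOCUS line (940 bp in the example above, whose last line starts at position 901 and holds 40 bases).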
23. Example
LOCUS NM_001001317 940 bp mRNA linear PRI 27-DEC-2010
DEFINITION Homo sapiens trypsin X3 (TRYX3), mRNA.
ACCESSION NM_001001317
VERSION NM_001001317.2 GI:170650697
|
header
[...]
FEATURES Location/Qualifiers
source 1..940
/organism="Homo sapiens"
/mol_type="mRNA"
/db_xref="taxon:9606"
/chromosome="7"
/map="7q34"
gene 1..940
/gene="TRYX3"
/gene_synonym="FLJ16649; MGC35022; PRSS1; TRY1; UNQ2540"
/note="trypsin X3"
/db_xref="GeneID:136541"
/db_xref="HPRD:15572"
|
features
[...]
ORIGIN
1 aaggctggca aaaaggagac cagacaggag gcgtctgtag agatatcatg aacttcaact
61 tagctttgtt ttccagagac tggagctaaa ctgggctttc aacatcatca tgaagtttat
[...]
781 tgccaaaatt ttttactata taccctggat tgaaaatgta atccaaaata actgagctgt
841 ggcagttgtg gaccatatga cacagcttgt ccccatcgtt cacctttaga attaaatata
901 aattaactcc tcaaaaaaaa aaaaaaaaaa aaaaaaaaaa
|
sequence
//
24. Header
LOCUS NM_001001317 940 bp mRNA linear PRI 27-DEC-2010
| | | | |
name, size, molecule type, division, modification date
ACCESSION NM_001001317
|
accession number (unique and stable)
SOURCE Homo sapiens (human)
|
organism name
ORGANISM Homo sapiens
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
Catarrhini; Hominidae; Homo.
|
taxonomy
REFERENCE 1 (bases 1 to 940)
AUTHORS Bubb,K.L., Bovee,D., Buckley,D., Haugen,E., Kibukawa,M.,
Paddock,M., Palmieri,A., Subramanian,S., Zhou,Y., Kaul,R., Green,P.
and Olson,M.V.
TITLE Scan of human genome reveals no new Loci under ancient balancing
selection
JOURNAL Genetics 173 (4), 2165-2177 (2006)
PUBMED 16751668
|
bibliographic reference
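The LOCUS line annotated above lends itself to a simple `split()`. The sketch below assumes the eight whitespace-separated fields seen in this example (the word "linear", not labelled on the slide, is the topology); the function name `parse_locus` and the dictionary keys are illustrative choices, and records laid out differently would need extra cases.

```python
def parse_locus(line):
    """Split a GenBank LOCUS line into named fields (nucleotide records only)."""
    fields = line.split()
    if len(fields) != 8 or fields[0] != "LOCUS" or fields[3] != "bp":
        raise ValueError("unexpected LOCUS line: %r" % line)
    return {
        "name": fields[1],
        "length": int(fields[2]),   # size in base pairs
        "molecule_type": fields[4],
        "topology": fields[5],      # linear or circular
        "division": fields[6],
        "date": fields[7],          # date of last modification
    }


locus = parse_locus("LOCUS NM_001001317 940 bp mRNA linear PRI 27-DEC-2010")
print(locus["name"], locus["length"], locus["division"], locus["date"])
```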
25. Features
gene start and end
|
gene 1..940
gene name
|
/gene="TRYX3"
/gene_synonym="FLJ16649; MGC35022; PRSS1; TRY1; UNQ2540"
/note="trypsin X3"
/db_xref="GeneID:136541"
/db_xref="HPRD:15572"
|
identifiers in other databases
coding sequence, start and end
| |
CDS 110..835
/gene="TRYX3"
/gene_synonym="FLJ16649; MGC35022; PRSS1; TRY1; UNQ2540"
/EC_number="3.4.21.4"
/note="trypsin-X3"
/codon_start=1
name of the protein produced
|
/product="trypsin-X3 precursor"
/protein_id="NP_001001317.1"
/db_xref="GI:48255915"
/db_xref="CCDS:CCDS5871.1"
/db_xref="GeneID:136541"
/db_xref="HPRD:15572"
/translation="MKFILLWALLNLTVALAFNPDYTVSSTPPYLVYLKSDYLPCAGV
LIHPLWVITAAHCNLPKLRVILGVTIPADSNEKHLQVIGYEKMIHHPHFSVTSIDHDI
MLIKLKTEAELNDYVKLANLPYQTISENTMCSVSTWSYNVCDIYKEPDSLQTVNISVI
SKPQCRDAYKTYNITENMLCVGIVPGRRQPCKEVSAAPAICNGMLQGILSFADGCVLR
ADVGIYAKIFYYIPWIENVIQNN"
|
protein sequence
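Feature locations such as `1..940` or `110..835` are 1-based and inclusive, which matters when converting them to lengths or to Python slices. The sketch below handles only simple `start..end` locations, optionally marked as partial with "<" or ">"; compound forms such as `join(...)` or `complement(...)` are deliberately left out, and the function names are assumptions.

```python
import re


def parse_location(location):
    """Parse a simple 'start..end' location into 1-based inclusive integers."""
    match = re.fullmatch(r"<?(\d+)\.\.>?(\d+)", location)
    if match is None:
        raise ValueError("unsupported location: %r" % location)
    return int(match.group(1)), int(match.group(2))


def feature_length(location):
    """Number of bases covered by a simple location (both ends included)."""
    start, end = parse_location(location)
    return end - start + 1


cds_length = feature_length("110..835")
print(cds_length, cds_length // 3, cds_length % 3)
```

For the CDS above this gives 726 bases, that is 242 codons: the 241 residues listed in `/translation` plus the stop codon. To cut the corresponding bases out of a Python string holding the full sequence, the slice would be `sequence[start - 1:end]`.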
33. Example
ID TRY3_HUMAN Reviewed; 304 AA.
AC P35030; A9Z1Y4; P15951; Q15665; Q5VXV0; Q9UQV3;
DT 01-FEB-1994, integrated into UniProtKB/Swiss-Prot.
DT 14-OCT-2008, sequence version 2.
DT 11-JAN-2011, entry version 111.
DE RecName: Full=Trypsin-3;
DE EC=3.4.21.4;
DE AltName: Full=Brain trypsinogen;
DE AltName: Full=Mesotrypsinogen;
[...]
CC -!- FUNCTION: Digestive protease specialized for the degradation of
CC trypsin inhibitors.
CC -!- CATALYTIC ACTIVITY: Preferential cleavage: Arg-|-Xaa, Lys-|-Xaa.
CC -!- COFACTOR: Binds 1 calcium ion per subunit.
[...]
DR PIR; S33496; S33496.
DR RefSeq; NP_002762.2; NM_002771.3.
DR UniGene; Hs.654513; -.
DR PDB; 1H4W; X-ray; 1.70 A; A=81-304.
[...]
FT DISULFID 196 263
FT DISULFID 228 242
FT DISULFID 253 277
[...]
SQ SEQUENCE 304 AA; 32529 MW; 4C4303C310B7BFFC CRC64;
MCGPDDRCPA RWPGPGRAVK CGKGLAAARP GRVERGGAQR GGAGLELHPL LGGRTWRAAR
DADGCEALGT VAVPFDDDDK IVGGYTCEEN SLPYQVSLNS GSHFCGGSLI SEQWVVSAAH
CYKTRIQVRL GEHNIKVLEG NEQFINAAKI IRHPKYNRDT LDNDIMLIKL SSPAVINARV
STISLPTTPP AAGTECLISG WGNTLSFGAD YPDELKCLDA PVLTQAECKA SYPGKITNSM
FCVGFLEGGK DSCQRDSGGP VVCNGQLQGV VSWGHGCAWK NRPGVYTKVY NYVDWIKDTI
AANS
//
34. Details
ID TRY3_HUMAN Reviewed; 304 AA.
| | |
name, source: Swiss-Prot, size
DT 01-FEB-1994, integrated into UniProtKB/Swiss-Prot.
DT 14-OCT-2008, sequence version 2.
DT 11-JAN-2011, entry version 111.
|
dates of integration into UniProt, of sequence modification, and of entry modification
DE RecName: Full=Trypsin-3;
|
protein name
DE AltName: Full=Brain trypsinogen;
DE AltName: Full=Mesotrypsinogen;
DE AltName: Full=Serine protease 3;
DE AltName: Full=Serine protease 4;
DE AltName: Full=Trypsin III;
|
alternative names
OS Homo sapiens (Human).
|
organism
OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
OC Catarrhini; Hominidae; Homo.
|
taxonomy
35. Details (2)
RN [1]
RP NUCLEOTIDE SEQUENCE [MRNA] (ISOFORMS A AND B), AND VARIANT ALA-188.
RC TISSUE=Brain;
RX MEDLINE=94123994; PubMed=8294000; DOI=10.1016/0378-1119(93)90460-K;
RA Wiegand U., Corbach S., Minn A., Kang J., Mueller-Hill B.;
RT "Cloning of the cDNA encoding human brain trypsinogen and
RT characterization of its product.";
RL Gene 136:167-175(1993).
|
bibliographic reference
CC -!- FUNCTION: Digestive protease specialized for the degradation of
CC trypsin inhibitors.
CC -!- CATALYTIC ACTIVITY: Preferential cleavage: Arg-|-Xaa, Lys-|-Xaa.
CC -!- COFACTOR: Binds 1 calcium ion per subunit.
CC -!- SUBCELLULAR LOCATION: Secreted.
|
annotations (function, location)
DR PIR; S12764; S12764.
DR PIR; S33496; S33496.
DR RefSeq; NP_002762.2; NM_002771.3.
DR UniGene; Hs.654513; -.
|
identifiers in other databases
PE 1: Evidence at protein level;
|
level of confidence in the existence (expression) of the protein
36. Details (3)
FT MOD_RES 211 211 Sulfotyrosine (By similarity).
FT DISULFID 87 217
FT DISULFID 105 121
[...]
FT STRAND 111 117
FT HELIX 119 121
|
sequence annotations
SQ SEQUENCE 304 AA; 32529 MW; 4C4303C310B7BFFC CRC64;
MCGPDDRCPA RWPGPGRAVK CGKGLAAARP GRVERGGAQR GGAGLELHPL LGGRTWRAAR
DADGCEALGT VAVPFDDDDK IVGGYTCEEN SLPYQVSLNS GSHFCGGSLI SEQWVVSAAH
CYKTRIQVRL GEHNIKVLEG NEQFINAAKI IRHPKYNRDT LDNDIMLIKL SSPAVINARV
STISLPTTPP AAGTECLISG WGNTLSFGAD YPDELKCLDA PVLTQAECKA SYPGKITNSM
FCVGFLEGGK DSCQRDSGGP VVCNGQLQGV VSWGHGCAWK NRPGVYTKVY NYVDWIKDTI
AANS
|
protein sequence
//
|
end of the entry
37. Remarks
.txt extension
also .xml
Python: strings/lists
+ regular expressions
(+ xml module)
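As a sketch of the "strings/lists + regular expressions" approach for the UniProt text format: each line of an entry starts with a two-letter line code (ID, AC, DT, DE, ...), so lines can be grouped by that code, and a regular expression can pick apart the ID line. The names `ID_PATTERN`, `parse_id_line` and `group_by_line_code` are assumptions, and the short entry is a trimmed copy of the example above with its original column padding restored.

```python
import re

# ID line: entry name, review status, length in amino acids.
ID_PATTERN = re.compile(r"^ID\s+(\S+)\s+(Reviewed|Unreviewed);\s+(\d+) AA\.$")


def parse_id_line(line):
    """Return (entry_name, status, length) from a UniProt ID line."""
    match = ID_PATTERN.match(line)
    if match is None:
        raise ValueError("unexpected ID line: %r" % line)
    name, status, length = match.groups()
    return name, status, int(length)


def group_by_line_code(entry_text):
    """Group the annotation lines of one entry by their two-letter line code."""
    groups = {}
    for line in entry_text.splitlines():
        code, content = line[:2], line[2:].strip()
        if code == "//":
            break  # end of the entry
        if code.strip():  # sequence lines start with spaces and are skipped
            groups.setdefault(code, []).append(content)
    return groups


entry = (
    "ID   TRY3_HUMAN              Reviewed;         304 AA.\n"
    "AC   P35030; A9Z1Y4; P15951; Q15665; Q5VXV0; Q9UQV3;\n"
    "DE   RecName: Full=Trypsin-3;\n"
    "DE   AltName: Full=Mesotrypsinogen;\n"
    "//\n"
)
print(parse_id_line(entry.splitlines()[0]))
print(group_by_line_code(entry)["DE"])
```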
39. Menu
1 Reminders
2 Problem statement
3 Sequences
4 Structures
5 A few precautions
6 Conclusion
7 References & image credits
40. Protein Data Bank (PDB)
structures: DNA, RNA, proteins, viruses...
X-ray, NMR, cryo-electron microscopy
44. Example
HEADER HYDROLASE (SERINE PROTEINASE) 26-OCT-81 2PTN
TITLE ON THE DISORDERED ACTIVATION DOMAIN IN TRYPSINOGEN.
TITLE 2 CHEMICAL LABELLING AND LOW-TEMPERATURE CRYSTALLOGRAPHY
COMPND MOL_ID: 1;
COMPND 2 MOLECULE: TRYPSIN;
COMPND 3 CHAIN: A;
COMPND 4 EC: 3.4.21.4;
COMPND 5 ENGINEERED: YES
SOURCE MOL_ID: 1;
SOURCE 2 ORGANISM_SCIENTIFIC: BOS TAURUS;
SOURCE 3 ORGANISM_COMMON: CATTLE;
SOURCE 4 ORGANISM_TAXID: 9913
KEYWDS HYDROLASE (SERINE PROTEINASE)
EXPDTA X-RAY DIFFRACTION
[...]
REMARK 2 RESOLUTION. 1.55 ANGSTROMS.
[...]
[...]
ATOM 273 N ALA A 55 6.294 11.611 25.982 1.00 9.30 N
ATOM 274 CA ALA A 55 6.778 12.670 25.099 1.00 9.30 C
ATOM 275 C ALA A 55 7.329 13.864 25.883 1.00 9.30 C
ATOM 276 O ALA A 55 6.747 14.218 26.934 1.00 9.30 O
ATOM 277 CB ALA A 55 5.636 13.154 24.190 1.00 9.30 C
ATOM 278 N ALA A 56 8.461 14.383 25.454 1.00 7.97 N
ATOM 279 CA ALA A 56 9.069 15.522 26.129 1.00 7.97 C
ATOM 280 C ALA A 56 8.143 16.740 26.167 1.00 7.97 C
ATOM 281 O ALA A 56 8.162 17.496 27.169 1.00 7.97 O
ATOM 282 CB ALA A 56 10.414 15.918 25.506 1.00 7.97 C
[...]
46. Coordinates
PyMOL
Rasmol
VMD
...
Python
47. Coordinates
ATOM 601 N LEU A 99 10.007 19.687 17.536 1.00 12.25 N
ATOM 602 CA LEU A 99 9.599 18.429 18.188 1.00 12.25 C
ATOM 603 C LEU A 99 10.565 17.281 17.914 1.00 12.25 C
ATOM 604 O LEU A 99 10.256 16.101 18.215 1.00 12.25 O
ATOM 605 CB LEU A 99 8.149 18.040 17.853 1.00 12.25 C
ATOM 606 CG LEU A 99 7.125 19.029 18.438 1.00 18.18 C
ATOM 607 CD1 LEU A 99 5.695 18.554 18.168 1.00 18.18 C
ATOM 608 CD2 LEU A 99 7.323 19.236 19.952 1.00 18.18 C
49. Remarks
several chains
several structures (NMR)
gaps (X-ray)
Python: strings (slices) + lists
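The mention of slices matters because PDB ATOM records are fixed-column: each field lives at a set position, and neighbouring numbers may touch. The sketch below reads a few fields by position, using the column ranges of the PDB format (serial 7-11, atom name 13-16, residue name 18-20, chain 22, residue number 23-26, x/y/z 31-38, 39-46 and 47-54, counted from 1). `ATOM_FORMAT` is a hypothetical helper used only to rebuild one record with its padding, since the transcript on these slides has lost the original alignment.

```python
def parse_atom(line):
    """Extract a few fields from a fixed-column PDB ATOM line using slices."""
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21],
        "res_seq": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
    }


# Hypothetical helper: rebuild the first 54 columns of an ATOM record.
ATOM_FORMAT = "ATOM  {:5d} {:<4s} {:3s} {:1s}{:4d}    {:8.3f}{:8.3f}{:8.3f}"
line = ATOM_FORMAT.format(7568, " CB", "LYS", "B", 72, -59.462, -109.221, -72.440)

atom = parse_atom(line)
print(atom["name"], atom["chain"], atom["res_seq"], atom["x"], atom["y"], atom["z"])
# With split(), x and y come out fused into one token because they touch.
print(line.split())
```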
50. Several chains
ATOM 955 CD2 TYR A 117 28.547 16.730 59.818 1.00 34.54 C
ATOM 956 CE1 TYR A 117 26.512 14.828 59.696 1.00 34.81 C
ATOM 957 CE2 TYR A 117 28.117 16.089 60.985 1.00 35.96 C
ATOM 958 CZ TYR A 117 27.100 15.139 60.917 1.00 35.42 C
ATOM 959 OH TYR A 117 26.673 14.515 62.069 1.00 37.14 O
ATOM 960 OXT TYR A 117 25.735 19.061 58.351 1.00 32.81 O
TER 961 TYR A 117
ATOM 962 N ARG B 3 42.047 55.053 18.876 1.00 34.90 N
ATOM 963 CA ARG B 3 42.680 56.307 19.383 1.00 35.03 C
ATOM 964 C ARG B 3 43.365 56.041 20.722 1.00 33.56 C
ATOM 965 O ARG B 3 42.720 55.647 21.691 1.00 33.47 O
ATOM 966 CB ARG B 3 41.614 57.395 19.562 1.00 37.48 C
ATOM 967 CG ARG B 3 40.638 57.499 18.394 1.00 41.05 C
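When a file holds several chains, as here with chain A ending at a TER record and chain B starting right after, the chain identifier in column 22 is enough to tell atoms apart. A minimal sketch, assuming properly aligned lines; `ATOM_PREFIX` is a hypothetical helper that rebuilds only the first 26 columns (coordinates omitted) of a few records from this slide.

```python
def atoms_per_chain(pdb_lines):
    """Count ATOM records per chain identifier (column 22, index 21)."""
    counts = {}
    for line in pdb_lines:
        if line.startswith("ATOM"):
            chain = line[21]
            counts[chain] = counts.get(chain, 0) + 1
    return counts


# Hypothetical helper: first 26 columns of an ATOM record.
ATOM_PREFIX = "ATOM  {:5d} {:<4s} {:3s} {:1s}{:4d}"
lines = [
    ATOM_PREFIX.format(959, " OH", "TYR", "A", 117),
    ATOM_PREFIX.format(960, " OXT", "TYR", "A", 117),
    "TER     961      TYR A 117",
    ATOM_PREFIX.format(962, " N", "ARG", "B", 3),
    ATOM_PREFIX.format(963, " CA", "ARG", "B", 3),
    ATOM_PREFIX.format(964, " C", "ARG", "B", 3),
]
print(atoms_per_chain(lines))
```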
51. Several structures
MODEL 1
ATOM 1 N GLY A 1 11.935 -10.938 0.352 1.00 0.00 N
ATOM 2 CA GLY A 1 13.344 -10.643 0.600 1.00 0.00 C
ATOM 3 C GLY A 1 13.861 -9.576 -0.330 1.00 0.00 C
ATOM 4 O GLY A 1 14.929 -9.728 -0.931 1.00 0.00 O
[...]
ATOM 934 HB2 GLU A 60 9.981 7.744 1.905 1.00 0.00 H
ATOM 935 HB3 GLU A 60 10.321 6.103 2.451 1.00 0.00 H
ATOM 936 HG2 GLU A 60 12.152 6.972 3.824 1.00 0.00 H
ATOM 937 HG3 GLU A 60 11.700 8.597 3.310 1.00 0.00 H
TER 938 GLU A 60
ENDMDL
MODEL 2
ATOM 1 N GLY A 1 19.334 -6.988 0.864 1.00 0.00 N
ATOM 2 CA GLY A 1 18.296 -6.813 1.874 1.00 0.00 C
ATOM 3 C GLY A 1 18.000 -5.370 2.142 1.00 0.00 C
ATOM 4 O GLY A 1 18.677 -4.724 2.959 1.00 0.00 O
[...]
ATOM 934 HB2 GLU A 60 11.353 9.615 -0.439 1.00 0.00 H
ATOM 935 HB3 GLU A 60 13.095 9.643 -0.204 1.00 0.00 H
ATOM 936 HG2 GLU A 60 13.380 10.930 -2.203 1.00 0.00 H
ATOM 937 HG3 GLU A 60 11.654 10.817 -2.534 1.00 0.00 H
TER 938 GLU A 60
ENDMDL
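NMR files such as this one repeat the same atoms in every model, so atoms should be collected per MODEL / ENDMDL block rather than pooled together. The following sketch uses an assumed function name (`split_models`) and shortened toy lines without coordinates.

```python
def split_models(pdb_lines):
    """Return {model_number: [ATOM lines]} using MODEL / ENDMDL records."""
    models = {}
    current = None
    for line in pdb_lines:
        record = line[:6].strip()  # record name occupies the first 6 columns
        if record == "MODEL":
            current = int(line[6:].split()[0])
            models[current] = []
        elif record == "ENDMDL":
            current = None
        elif record == "ATOM" and current is not None:
            models[current].append(line)
    return models


# Toy input (hypothetical, lines cut after the residue number).
pdb_lines = [
    "MODEL        1",
    "ATOM      1  N   GLY A   1",
    "TER     938      GLU A  60",
    "ENDMDL",
    "MODEL        2",
    "ATOM      1  N   GLY A   1",
    "ATOM      2  CA  GLY A   1",
    "ENDMDL",
]
models = split_models(pdb_lines)
print(sorted(models), [len(models[n]) for n in sorted(models)])
```

Note that atom serial numbers restart at 1 in each model, so a serial number alone does not identify an atom in such a file.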
52. Gaps
[...]
ATOM 7568 CB LYS B 72 -59.462-109.221 -72.440 1.00 31.64 C
ATOM 7569 CG LYS B 72 -58.524-109.915 -73.424 1.00 31.85 C
ATOM 7570 CD LYS B 72 -58.889-109.602 -74.868 1.00 32.02 C
ATOM 7571 CE LYS B 72 -58.174-110.533 -75.837 1.00 31.61 C
ATOM 7572 NZ LYS B 72 -58.629-110.335 -77.242 1.00 31.27 N
ATOM 7573 N GLY B 73 -61.309-106.416 -72.158 1.00 31.85 N
ATOM 7574 CA GLY B 73 -62.485-105.832 -71.510 1.00 30.84 C
ATOM 7575 C GLY B 73 -63.598-106.848 -71.303 1.00 29.65 C
ATOM 7576 O GLY B 73 -64.660-106.750 -71.920 1.00 28.85 O
ATOM 7577 N SER B 74 -63.354-107.820 -70.425 1.00 28.53 N
ATOM 7578 CA SER B 74 -64.301-108.911 -70.179 1.00 27.75 C
ATOM 7579 C SER B 74 -64.180-109.438 -68.754 1.00 26.72 C
ATOM 7580 O SER B 74 -65.113-110.041 -68.227 1.00 24.48 O
ATOM 7581 CB SER B 74 -64.070-110.058 -71.166 1.00 26.32 C
ATOM 7582 OG SER B 74 -64.505-109.716 -72.470 1.00 25.54 O
ATOM 7583 N GLN B 79 -62.682-105.888 -62.336 1.00 42.85 N
ATOM 7584 CA GLN B 79 -63.246-104.902 -63.248 1.00 42.57 C
ATOM 7585 C GLN B 79 -62.146-104.278 -64.103 1.00 42.60 C
ATOM 7586 O GLN B 79 -60.992-104.191 -63.681 1.00 42.45 O
ATOM 7587 CB GLN B 79 -63.996-103.819 -62.464 1.00 42.46 C
ATOM 7588 CG GLN B 79 -64.950-102.964 -63.300 1.00 42.30 C
ATOM 7589 CD GLN B 79 -66.093-103.764 -63.905 1.00 42.15 C
ATOM 7590 OE1 GLN B 79 -66.388-104.879 -63.472 1.00 42.18 O
ATOM 7591 NE2 GLN B 79 -66.743-103.194 -64.911 1.00 41.70 N
ATOM 7592 N VAL B 80 -62.514-103.846 -65.305 1.00 42.30 N
ATOM 7593 CA VAL B 80 -61.549-103.342 -66.275 1.00 42.03 C
ATOM 7594 C VAL B 80 -60.882-102.055 -65.796 1.00 42.42 C
ATOM 7595 O VAL B 80 -61.544-101.165 -65.260 1.00 43.09 O
[...]
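In the excerpt above, chain B jumps from residue 74 (SER) straight to residue 79 (GLN): residues 75 to 78 have no coordinates. A jump in residue numbering can be spotted as sketched below; the function name and the `ATOM_PREFIX` helper (first 26 columns only) are assumptions, and a jump is only a hint, since numbering schemes and insertion codes can also produce irregular numbers.

```python
def find_gaps(pdb_lines, chain):
    """Return (previous, next) residue numbers wherever the numbering jumps."""
    gaps = []
    previous = None
    for line in pdb_lines:
        if not line.startswith("ATOM") or line[21] != chain:
            continue
        res_seq = int(line[22:26])
        if previous is not None and res_seq > previous + 1:
            gaps.append((previous, res_seq))
        previous = res_seq
    return gaps


# Hypothetical helper: first 26 columns of a few records from this slide.
ATOM_PREFIX = "ATOM  {:5d} {:<4s} {:3s} {:1s}{:4d}"
lines = [
    ATOM_PREFIX.format(7572, " NZ", "LYS", "B", 72),
    ATOM_PREFIX.format(7573, " N", "GLY", "B", 73),
    ATOM_PREFIX.format(7577, " N", "SER", "B", 74),
    ATOM_PREFIX.format(7582, " OG", "SER", "B", 74),
    ATOM_PREFIX.format(7583, " N", "GLN", "B", 79),
    ATOM_PREFIX.format(7592, " N", "VAL", "B", 80),
]
print(find_gaps(lines, "B"))
```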
53. Menu
1 Reminders
2 Problem statement
3 Sequences
4 Structures
5 A few precautions
6 Conclusion
7 References & image credits
54. A few precautions
stay cautious about the data
55. GenBank Z71230
LOCUS Z71230 124 bp DNA linear PLN 14-NOV-2006
DEFINITION Nicotiana tabacum chloroplast JLA region, sequence 2.
ACCESSION Z71230
VERSION Z71230.1 GI:1279604
KEYWORDS rpl2 gene; transfer RNA-His; trnH gene.
SOURCE chloroplast Nicotiana tabacum (common tobacco)
ORGANISM Nicotiana tabacum
Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
Spermatophyta; Magnoliophyta; eudicotyledons; core eudicotyledons;
asterids; lamiids; Solanales; Solanaceae; Nicotianoideae;
Nicotianeae; Nicotiana.
REFERENCE 1 (bases 1 to 124)
AUTHORS Goulding,S.E., Olmstead,R.G., Morden,C.W. and Wolfe,K.H.
TITLE Ebb and flow of the chloroplast inverted repeat
JOURNAL Mol. Gen. Genet. 252 (1-2), 195-206 (1996)
PUBMED 8804393
[...]
FEATURES Location/Qualifiers
source 1..124
/organism="Nicotiana tabacum"
/organelle="plastid:chloroplast"
/mol_type="genomic DNA"
/isolate="Cuban cahibo cigar, gift from President Fidel
Castro"
/db_xref="taxon:4097"
gene <1..11
/gene="rpl2"
56. GenBank NC_001610
LOCUS NC_001610 17084 bp DNA circular MAM 14-APR-2009
DEFINITION Didelphis virginiana mitochondrion, complete genome.
ACCESSION NC_001610
VERSION NC_001610.1 GI:5835037
DBLINK Project: 11806
KEYWORDS .
SOURCE mitochondrion Didelphis virginiana (North American opossum)
ORGANISM Didelphis virginiana
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Metatheria; Didelphimorphia; Didelphidae; Didelphis.
REFERENCE 1 (bases 1 to 17084)
AUTHORS Janke,A., Feldmaier-Fuchs,G., Thomas,W.K., von Haeseler,A. and
Paabo,S.
TITLE The marsupial mitochondrial genome and the evolution of placental
mammals
JOURNAL Genetics 137 (1), 243-256 (1994)
PUBMED 8056314
[...]
FEATURES Location/Qualifiers
source 1..17084
/organism="Didelphis virginiana"
/organelle="mitochondrion"
/mol_type="genomic DNA"
/isolate="fresh road killed individual"
/db_xref="taxon:9267"
/tissue_type="liver"
/dev_stage="adult"
65. Menu
1 Reminders
2 Problem statement
3 Sequences
4 Structures
5 A few precautions
6 Conclusion
7 References & image credits
66. References
Course by J.-C. Gelly, "Bases de données en biologie" (Databases in biology)
Bioinformatics for Dummies, by J.-M. Claverie and C. Notredame
BioStar
Incorrect / unusual entries in main databases (GenBank, UniProt, PDB)?
http://biostar.stackexchange.com/questions/10869/incorrect-unusual-entries-in-main-databases-genbank-uniprot-pdb
67. References (2)
FASTA format – http://en.wikipedia.org/wiki/FASTA_format
GenBank – http://www.ncbi.nlm.nih.gov/
format: http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html
UniProt – http://www.uniprot.org/
format: http://www.uniprot.org/manual/
PDB – http://www.rcsb.org/pdb/home/home.do
format: http://www.wwpdb.org/documentation/format23/v2.3.html