SlideShare a Scribd company logo
1 of 16
FBW
08-11-2016
Wim Van Criekinge
Bioinformatics.be
Recap
if condition:
statements
[elif condition:
statements] ...
else:
statements
while condition:
statements
for var in sequence:
statements
break
continue
Strings
REGULAR EXPRESSIONS
Warm Up
• Random_Sequence.py
6
Reading Files
name = open("filename")
– opens the given file for reading, and returns a file object
name.read() - file's entire contents as a string
name.readline() - next line from file as a string
name.readlines() - file's contents as a list of lines
– the lines from a file object can also be read using a for loop
>>> f = open("hours.txt")
>>> f.read()
'123 Susan 12.5 8.1 7.6 3.2n
456 Brad 4.0 11.6 6.5 2.7 12n
789 Jenn 8.0 8.0 8.0 8.0 7.5n'
7
File Input Template
• A template for reading files in Python:
name = open("filename")
for line in name:
statements
>>> input = open("hours.txt")
>>> for line in input:
... print(line.strip()) # strip() removes n
123 Susan 12.5 8.1 7.6 3.2
456 Brad 4.0 11.6 6.5 2.7 12
789 Jenn 8.0 8.0 8.0 8.0 7.5
8
Writing Files
name = open("filename", "w")
name = open("filename", "a")
– opens file for write (deletes previous contents), or
– opens file for append (new data goes after previous data)
name.write(str) - writes the given string to the file
name.close() - saves file once writing is done
>>> out = open("output.txt", "w")
>>> out.write("Hello, world!n")
>>> out.write("How are you?")
>>> out.close()
>>> open("output.txt").read()
'Hello, world!nHow are you?'
Swiss-Knife.py
• Using a database as input ! Parse
the entire Swiss Prot collection
– How many entries are there ?
– Average Protein Length (in aa and
MW)
– Relative frequency of amino acids
• Compare to the ones used to construct
the PAM scoring matrixes from 1978 –
1991
Question 3: Getting the database
Uniprot_sprot.dat.gz – 528Mb
(on Github onder Files)
Unzipped 2.92 Gb !
http://www.ebi.ac.uk/uniprot/download-center
Amino acid frequencies
1978 1991
L 0.085 0.091
A 0.087 0.077
G 0.089 0.074
S 0.070 0.069
V 0.065 0.066
E 0.050 0.062
T 0.058 0.059
K 0.081 0.059
I 0.037 0.053
D 0.047 0.052
R 0.041 0.051
P 0.051 0.051
N 0.040 0.043
Q 0.038 0.041
F 0.040 0.040
Y 0.030 0.032
M 0.015 0.024
H 0.034 0.023
C 0.033 0.020
W 0.010 0.014
Second step: Frequencies of Occurence
Extra
• Program your own prosite parser !
• Download prosite pattern database
(prosite.dat)
• Automatically generate >2000 search
patterns, and search in sequence set
from question 1
Oefening 1
1. Which of following 4 sequences
(seq1/2/3/4)
a) contains a “Galactokinase signature”
b) How many of them?
http://us.expasy.org/prosite/
>SEQ1
MGNLFENCTHRYSFEYIYENCTNTTNQCGLIRNVASSIDVFHWLDVYISTTIFVISGILNFYCLFIALYT
YYFLDNETRKHYVFVLSRFLSSILVIISLLVLESTLFSESLSPTFAYYAVAFSIYDFSMDTLFFSYIMIS
LITYFGVVHYNFYRRHVSLRSLYIILISMWTFSLAIAIPLGLYEAASNSQGPIKCDLSYCGKVVEWITCS
LQGCDSFYNANELLVQSIISSVETLVGSLVFLTDPLINIFFDKNISKMVKLQLTLGKWFIALYRFLFQMT
NIFENCSTHYSFEKNLQKCVNASNPCQLLQKMNTAHSLMIWMGFYIPSAMCFLAVLVDTYCLLVTISILK
SLKKQSRKQYIFGRANIIGEHNDYVVVRLSAAILIALCIIIIQSTYFIDIPFRDTFAFFAVLFIIYDFSILSLLGSFTGVA
M MTYFGVMRPLVYRDKFTLKTIYIIAFAIVLFSVCVAIPFGLFQAADEIDGPIKCDSESCELIVKWLLFCI
ACLILMGCTGTLLFVTVSLHWHSYKSKKMGNVSSSAFNHGKSRLTWTTTILVILCCVELIPTGLLAAFGK
SESISDDCYDFYNANSLIFPAIVSSLETFLGSITFLLDPIINFSFDKRISKVFSSQVSMFSIFFCGKR
>SEQ2
MLDDRARMEA AKKEKVEQIL AEFQLQEEDL KKVMRRMQKE MDRGLRLETH EEASVKMLPT YVRSTPEGSE
VGDFLSLDLG GTNFRVMLVK VGEGEEGQWS VKTKHQMYSI PEDAMTGTAE MLFDYISECI SDFLDKHQMK
HKKLPLGFTF SFPVRHEDID KGILLNWTKG FKASGAEGNN VVGLLRDAIK RRGDFEMDVV AMVNDTVATM
ISCYYEDHQC EVGMIVGTGC NACYMEEMQN VELVEGDEGR MCVNTEWGAF GDSGELDEFL LEYDRLVDES
SANPGQQLYE KLIGGKYMGE LVRLVLLRLV DENLLFHGEA SEQLRTRGAF ETRFVSQVES DTGDRKQIYN
ILSTLGLRPS TTDCDIVRRA CESVSTRAAH MCSAGLAGVI NRMRESRSED VMRITVGVDG SVYKLHPSFK
ERFHASVRRL TPSCEITFIE SEEGSGRGAA LVSAVACKKA CMLGQ
>SEQ3
MESDSFEDFLKGEDFSNYSYSSDLPPFLLDAAPCEPESLEINKYFVVIIYVLVFLLSLLGNSLVMLVILY
SRVGRSGRDNVIGDHVDYVTDVYLLNLALADLLFALTLPIWAASKVTGWIFGTFLCKVVSLLKEVNFYSGILLLA
CISVDRY
LAIVHATRTLTQKRYLVKFICLSIWGLSLLLALPVLIFRKTIYPPYVSPVCYEDMGNNTANWRMLLRILP
QSFGFIVPLLIMLFCYGFTLRTLFKAHMGQKHRAMRVIFAVVLIFLLCWLPYNLVLLADTLMRTWVIQET
CERRNDIDRALEATEILGILGRVNLIGEHWDYHSCLNPLIYAFIGQKFRHGLLKILAIHGLISKDSLPKDSRPSFVGS
SSGH TSTTL
>SEQ4
MEANFQQAVK KLVNDFEYPT ESLREAVKEF DELRQKGLQK NGEVLAMAPA FISTLPTGAE TGDFLALDFG
GTNLRVCWIQ LLGDGKYEMK HSKSVLPREC VRNESVKPII DFMSDHVELF IKEHFPSKFG CPEEEYLPMG
FTFSYPANQV SITESYLLRW TKGLNIPEAI NKDFAQFLTE GFKARNLPIR IEAVINDTVG TLVTRAYTSK
ESDTFMGIIF GTGTNGAYVE QMNQIPKLAG KCTGDHMLIN MEWGATDFSC LHSTRYDLLL DHDTPNAGRQ
IFEKRVGGMY LGELFRRALF HLIKVYNFNE GIFPPSITDA WSLETSVLSR MMVERSAENV RNVLSTFKFR
FRSDEEALYL WDAAHAIGRR AARMSAVPIA SLYLSTGRAG KKSDVGVDGS LVEHYPHFVD MLREALRELI
GDNEKLISIG IAKDGSGIGA ALCALQAVKE KKGLA MEANFQQAVK KLVNDFEYPT ESLREAVKEF
DELRQKGLQK NGEVLAMAPA FISTLPTGAE TGDFLALDFG GTNLRVCWIQ LLGDGKYEMK HSKSVLPREC
VRNESVKPII DFMSDHVELF IKEHFPSKFG CPEEEYLPMG FTFSYPANQV SITESYLLRW TKGLNIPEAI
NKDFAQFLTE GFKARNLPIR IEAVINDTVG TLVTRAYTSK ESDTFMGIIF GTGTNGAYVE QMNQIPKLAG
KCTGDHMLIN MEWGATDFSC LHSTRYDLLL DHDTPNAGRQ IFEKRVGGMY LGELFRRALF HLIKVYNFNE
GIFPPSITDA WSLETSVLSR MMVERSAENV RNVLSTFKFR FRSDEEALYL WDAAHAIGRR AARMSAVPIA
SLYLSTGRAG KKSDVGVDGS LVEHYPHFVD MLREALRELI GDNEKLISIG IAKDGSGIGA ALCALQAVKE
KKGLA
Oefening 1
• Prosite_Scanner.py
Extra Questions
• How many records have a sequence of length 260?
• What are the first 20 residues of 143X_MAIZE?
• What is the identifier for the record with the
shortest sequence? Is there more than one record
with that length?
• What is the identifier for the record with the
longest sequence? Is there more than one record
with that length?
• How many contain the subsequence "ARRA"?
• How many contain the substring "KCIP-1" in the
description?

More Related Content

What's hot

Introduction to PostgreSQL
Introduction to PostgreSQLIntroduction to PostgreSQL
Introduction to PostgreSQL
Mark Wong
 

What's hot (20)

2015 bioinformatics python_strings_wim_vancriekinge
2015 bioinformatics python_strings_wim_vancriekinge2015 bioinformatics python_strings_wim_vancriekinge
2015 bioinformatics python_strings_wim_vancriekinge
 
What is the best full text search engine for Python?
What is the best full text search engine for Python?What is the best full text search engine for Python?
What is the best full text search engine for Python?
 
Embedded R Execution using SQL
Embedded R Execution using SQLEmbedded R Execution using SQL
Embedded R Execution using SQL
 
Bioinformatics t2-databases wim-vancriekinge_v2013
Bioinformatics t2-databases wim-vancriekinge_v2013Bioinformatics t2-databases wim-vancriekinge_v2013
Bioinformatics t2-databases wim-vancriekinge_v2013
 
2016 bioinformatics i_bio_python_ii_wimvancriekinge
2016 bioinformatics i_bio_python_ii_wimvancriekinge2016 bioinformatics i_bio_python_ii_wimvancriekinge
2016 bioinformatics i_bio_python_ii_wimvancriekinge
 
NoSQL Couchbase Lite & BigData HPCC Systems
NoSQL Couchbase Lite & BigData HPCC SystemsNoSQL Couchbase Lite & BigData HPCC Systems
NoSQL Couchbase Lite & BigData HPCC Systems
 
Cassandra introduction apache con 2014 budapest
Cassandra introduction apache con 2014 budapestCassandra introduction apache con 2014 budapest
Cassandra introduction apache con 2014 budapest
 
Cassandra introduction @ NantesJUG
Cassandra introduction @ NantesJUGCassandra introduction @ NantesJUG
Cassandra introduction @ NantesJUG
 
Big Data - Load CSV File & Query the EZ way - HPCC Systems
Big Data - Load CSV File & Query the EZ way - HPCC SystemsBig Data - Load CSV File & Query the EZ way - HPCC Systems
Big Data - Load CSV File & Query the EZ way - HPCC Systems
 
PEARC17: Designsafe: Using Elasticsearch to Share and Search Data on a Scienc...
PEARC17: Designsafe: Using Elasticsearch to Share and Search Data on a Scienc...PEARC17: Designsafe: Using Elasticsearch to Share and Search Data on a Scienc...
PEARC17: Designsafe: Using Elasticsearch to Share and Search Data on a Scienc...
 
Introduction to PostgreSQL
Introduction to PostgreSQLIntroduction to PostgreSQL
Introduction to PostgreSQL
 
Bio2 Rdf Presentation V3
Bio2 Rdf Presentation V3Bio2 Rdf Presentation V3
Bio2 Rdf Presentation V3
 
HPCC Systems vs Hadoop
HPCC Systems vs HadoopHPCC Systems vs Hadoop
HPCC Systems vs Hadoop
 
Big Data - In-Memory Index / Sub Second Query engine - Roxie - HPCC Systems
Big Data - In-Memory Index / Sub Second Query engine - Roxie - HPCC SystemsBig Data - In-Memory Index / Sub Second Query engine - Roxie - HPCC Systems
Big Data - In-Memory Index / Sub Second Query engine - Roxie - HPCC Systems
 
Big Data - Load, Index & Query the EZ way - HPCC Systems
Big Data - Load, Index & Query the EZ way - HPCC SystemsBig Data - Load, Index & Query the EZ way - HPCC Systems
Big Data - Load, Index & Query the EZ way - HPCC Systems
 
Hadoop Essential for Oracle Professionals
Hadoop Essential for Oracle ProfessionalsHadoop Essential for Oracle Professionals
Hadoop Essential for Oracle Professionals
 
Postgresql search demystified
Postgresql search demystifiedPostgresql search demystified
Postgresql search demystified
 
21st Athens Big Data Meetup - 3rd Talk - Dive into ClickHouse query execution
21st Athens Big Data Meetup - 3rd Talk - Dive into ClickHouse query execution21st Athens Big Data Meetup - 3rd Talk - Dive into ClickHouse query execution
21st Athens Big Data Meetup - 3rd Talk - Dive into ClickHouse query execution
 
P6 2018 biopython2b
P6 2018 biopython2bP6 2018 biopython2b
P6 2018 biopython2b
 
Evaluation of Data Reliability on Linux File Systems
Evaluation of Data Reliability on Linux File SystemsEvaluation of Data Reliability on Linux File Systems
Evaluation of Data Reliability on Linux File Systems
 

Viewers also liked

Viewers also liked (17)

2016 bioinformatics i_proteins_wim_vancriekinge
2016 bioinformatics i_proteins_wim_vancriekinge2016 bioinformatics i_proteins_wim_vancriekinge
2016 bioinformatics i_proteins_wim_vancriekinge
 
2016 bioinformatics i_phylogenetics_wim_vancriekinge
2016 bioinformatics i_phylogenetics_wim_vancriekinge2016 bioinformatics i_phylogenetics_wim_vancriekinge
2016 bioinformatics i_phylogenetics_wim_vancriekinge
 
2016 bioinformatics i_bio_cheminformatics_wimvancriekinge
2016 bioinformatics i_bio_cheminformatics_wimvancriekinge2016 bioinformatics i_bio_cheminformatics_wimvancriekinge
2016 bioinformatics i_bio_cheminformatics_wimvancriekinge
 
2017 biological databases_part1_vupload
2017 biological databases_part1_vupload2017 biological databases_part1_vupload
2017 biological databases_part1_vupload
 
2017 biological databasespart2
2017 biological databasespart22017 biological databasespart2
2017 biological databasespart2
 
Mysql introduction
Mysql introduction Mysql introduction
Mysql introduction
 
2016 bioinformatics i_python_part_1_wim_vancriekinge
2016 bioinformatics i_python_part_1_wim_vancriekinge2016 bioinformatics i_python_part_1_wim_vancriekinge
2016 bioinformatics i_python_part_1_wim_vancriekinge
 
2016 bioinformatics i_python_part_2_strings_wim_vancriekinge
2016 bioinformatics i_python_part_2_strings_wim_vancriekinge2016 bioinformatics i_python_part_2_strings_wim_vancriekinge
2016 bioinformatics i_python_part_2_strings_wim_vancriekinge
 
2016 bioinformatics i_python_part_3_io_and_strings_wim_vancriekinge
2016 bioinformatics i_python_part_3_io_and_strings_wim_vancriekinge2016 bioinformatics i_python_part_3_io_and_strings_wim_vancriekinge
2016 bioinformatics i_python_part_3_io_and_strings_wim_vancriekinge
 
2016 bioinformatics i_score_matrices_wim_vancriekinge
2016 bioinformatics i_score_matrices_wim_vancriekinge2016 bioinformatics i_score_matrices_wim_vancriekinge
2016 bioinformatics i_score_matrices_wim_vancriekinge
 
Introducing SMCR from an HR perspective
Introducing SMCR from an HR perspectiveIntroducing SMCR from an HR perspective
Introducing SMCR from an HR perspective
 
2016 bioinformatics i_wim_vancriekinge_vupload
2016 bioinformatics i_wim_vancriekinge_vupload2016 bioinformatics i_wim_vancriekinge_vupload
2016 bioinformatics i_wim_vancriekinge_vupload
 
نمونه آثار آقای علی عابدینی
نمونه آثار آقای علی عابدینینمونه آثار آقای علی عابدینی
نمونه آثار آقای علی عابدینی
 
بازدید از شهرک علمی تحقیقاتی
بازدید از شهرک علمی تحقیقاتیبازدید از شهرک علمی تحقیقاتی
بازدید از شهرک علمی تحقیقاتی
 
Beyond the Gig Economy
Beyond the Gig EconomyBeyond the Gig Economy
Beyond the Gig Economy
 
Recovery: Job Growth and Education Requirements Through 2020
Recovery: Job Growth and Education Requirements Through 2020Recovery: Job Growth and Education Requirements Through 2020
Recovery: Job Growth and Education Requirements Through 2020
 
3 hard facts shaping higher education thinking and behavior
3 hard facts shaping higher education thinking and behavior3 hard facts shaping higher education thinking and behavior
3 hard facts shaping higher education thinking and behavior
 

Similar to 2016 bioinformatics i_io_wim_vancriekinge

Kyle Hailey Oracle Performance IO Waits
Kyle Hailey  Oracle Performance IO WaitsKyle Hailey  Oracle Performance IO Waits
Kyle Hailey Oracle Performance IO Waits
cookie1969
 
Oracle forensics 101
Oracle forensics 101Oracle forensics 101
Oracle forensics 101
fangjiafu
 

Similar to 2016 bioinformatics i_io_wim_vancriekinge (20)

2015 bioinformatics bio_python
2015 bioinformatics bio_python2015 bioinformatics bio_python
2015 bioinformatics bio_python
 
Systemcall1
Systemcall1Systemcall1
Systemcall1
 
Kyle Hailey Oracle Performance IO Waits
Kyle Hailey  Oracle Performance IO WaitsKyle Hailey  Oracle Performance IO Waits
Kyle Hailey Oracle Performance IO Waits
 
SANS Windows Artifact Analysis 2012
SANS Windows Artifact Analysis 2012SANS Windows Artifact Analysis 2012
SANS Windows Artifact Analysis 2012
 
Oracle Tracing
Oracle TracingOracle Tracing
Oracle Tracing
 
Db forensics for sql rally
Db forensics for sql rallyDb forensics for sql rally
Db forensics for sql rally
 
Biopython
BiopythonBiopython
Biopython
 
Oracle forensics 101
Oracle forensics 101Oracle forensics 101
Oracle forensics 101
 
Linux intro 4 awk + makefile
Linux intro 4  awk + makefileLinux intro 4  awk + makefile
Linux intro 4 awk + makefile
 
The Ring programming language version 1.6 book - Part 42 of 189
The Ring programming language version 1.6 book - Part 42 of 189The Ring programming language version 1.6 book - Part 42 of 189
The Ring programming language version 1.6 book - Part 42 of 189
 
Introduction to UNIX
Introduction to UNIXIntroduction to UNIX
Introduction to UNIX
 
Java 7 Features and Enhancements
Java 7 Features and EnhancementsJava 7 Features and Enhancements
Java 7 Features and Enhancements
 
Uop pos 433 week 4 file processing commands worksheet new
Uop pos 433 week 4 file processing commands worksheet newUop pos 433 week 4 file processing commands worksheet new
Uop pos 433 week 4 file processing commands worksheet new
 
Uop pos 433 week 4 file processing commands worksheet new
Uop pos 433 week 4 file processing commands worksheet newUop pos 433 week 4 file processing commands worksheet new
Uop pos 433 week 4 file processing commands worksheet new
 
Uop pos 433 week 4 file processing commands worksheet new
Uop pos 433 week 4 file processing commands worksheet newUop pos 433 week 4 file processing commands worksheet new
Uop pos 433 week 4 file processing commands worksheet new
 
Uop pos 433 week 4 file processing commands worksheet new
Uop pos 433 week 4 file processing commands worksheet newUop pos 433 week 4 file processing commands worksheet new
Uop pos 433 week 4 file processing commands worksheet new
 
Bc0056 unix operating system
Bc0056   unix operating systemBc0056   unix operating system
Bc0056 unix operating system
 
#WeSpeakLinux Session
#WeSpeakLinux Session#WeSpeakLinux Session
#WeSpeakLinux Session
 
JProfiler8 @ OVIRT
JProfiler8 @ OVIRTJProfiler8 @ OVIRT
JProfiler8 @ OVIRT
 
06-files.ppt
06-files.ppt06-files.ppt
06-files.ppt
 

More from Prof. Wim Van Criekinge

More from Prof. Wim Van Criekinge (20)

2020 02 11_biological_databases_part1
2020 02 11_biological_databases_part12020 02 11_biological_databases_part1
2020 02 11_biological_databases_part1
 
2019 03 05_biological_databases_part5_v_upload
2019 03 05_biological_databases_part5_v_upload2019 03 05_biological_databases_part5_v_upload
2019 03 05_biological_databases_part5_v_upload
 
2019 03 05_biological_databases_part4_v_upload
2019 03 05_biological_databases_part4_v_upload2019 03 05_biological_databases_part4_v_upload
2019 03 05_biological_databases_part4_v_upload
 
2019 03 05_biological_databases_part3_v_upload
2019 03 05_biological_databases_part3_v_upload2019 03 05_biological_databases_part3_v_upload
2019 03 05_biological_databases_part3_v_upload
 
2019 02 21_biological_databases_part2_v_upload
2019 02 21_biological_databases_part2_v_upload2019 02 21_biological_databases_part2_v_upload
2019 02 21_biological_databases_part2_v_upload
 
2019 02 12_biological_databases_part1_v_upload
2019 02 12_biological_databases_part1_v_upload2019 02 12_biological_databases_part1_v_upload
2019 02 12_biological_databases_part1_v_upload
 
P7 2018 biopython3
P7 2018 biopython3P7 2018 biopython3
P7 2018 biopython3
 
P4 2018 io_functions
P4 2018 io_functionsP4 2018 io_functions
P4 2018 io_functions
 
P3 2018 python_regexes
P3 2018 python_regexesP3 2018 python_regexes
P3 2018 python_regexes
 
T1 2018 bioinformatics
T1 2018 bioinformaticsT1 2018 bioinformatics
T1 2018 bioinformatics
 
P1 2018 python
P1 2018 pythonP1 2018 python
P1 2018 python
 
Bio ontologies and semantic technologies[2]
Bio ontologies and semantic technologies[2]Bio ontologies and semantic technologies[2]
Bio ontologies and semantic technologies[2]
 
2018 05 08_biological_databases_no_sql
2018 05 08_biological_databases_no_sql2018 05 08_biological_databases_no_sql
2018 05 08_biological_databases_no_sql
 
2018 03 27_biological_databases_part4_v_upload
2018 03 27_biological_databases_part4_v_upload2018 03 27_biological_databases_part4_v_upload
2018 03 27_biological_databases_part4_v_upload
 
2018 03 20_biological_databases_part3
2018 03 20_biological_databases_part32018 03 20_biological_databases_part3
2018 03 20_biological_databases_part3
 
2018 02 20_biological_databases_part2_v_upload
2018 02 20_biological_databases_part2_v_upload2018 02 20_biological_databases_part2_v_upload
2018 02 20_biological_databases_part2_v_upload
 
2018 02 20_biological_databases_part1_v_upload
2018 02 20_biological_databases_part1_v_upload2018 02 20_biological_databases_part1_v_upload
2018 02 20_biological_databases_part1_v_upload
 
P7 2017 biopython3
P7 2017 biopython3P7 2017 biopython3
P7 2017 biopython3
 
P6 2017 biopython2
P6 2017 biopython2P6 2017 biopython2
P6 2017 biopython2
 
Van criekinge 2017_11_13_rodebiotech
Van criekinge 2017_11_13_rodebiotechVan criekinge 2017_11_13_rodebiotech
Van criekinge 2017_11_13_rodebiotech
 

Recently uploaded

Gardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch LetterGardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch Letter
MateoGardella
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
QucHHunhnh
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdf
Chris Hunter
 

Recently uploaded (20)

Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdf
 
Gardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch LetterGardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch Letter
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
PROCESS RECORDING FORMAT.docx
PROCESS      RECORDING        FORMAT.docxPROCESS      RECORDING        FORMAT.docx
PROCESS RECORDING FORMAT.docx
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptx
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptx
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docx
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdf
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 

2016 bioinformatics i_io_wim_vancriekinge

  • 1.
  • 4. Recap if condition: statements [elif condition: statements] ... else: statements while condition: statements for var in sequence: statements break continue Strings REGULAR EXPRESSIONS
  • 6. 6 Reading Files name = open("filename") – opens the given file for reading, and returns a file object name.read() - file's entire contents as a string name.readline() - next line from file as a string name.readlines() - file's contents as a list of lines – the lines from a file object can also be read using a for loop >>> f = open("hours.txt") >>> f.read() '123 Susan 12.5 8.1 7.6 3.2n 456 Brad 4.0 11.6 6.5 2.7 12n 789 Jenn 8.0 8.0 8.0 8.0 7.5n'
  • 7. 7 File Input Template • A template for reading files in Python: name = open("filename") for line in name: statements >>> input = open("hours.txt") >>> for line in input: ... print(line.strip()) # strip() removes n 123 Susan 12.5 8.1 7.6 3.2 456 Brad 4.0 11.6 6.5 2.7 12 789 Jenn 8.0 8.0 8.0 8.0 7.5
  • 8. 8 Writing Files name = open("filename", "w") name = open("filename", "a") – opens file for write (deletes previous contents), or – opens file for append (new data goes after previous data) name.write(str) - writes the given string to the file name.close() - saves file once writing is done >>> out = open("output.txt", "w") >>> out.write("Hello, world!n") >>> out.write("How are you?") >>> out.close() >>> open("output.txt").read() 'Hello, world!nHow are you?'
  • 9. Swiss-Knife.py • Using a database as input ! Parse the entire Swiss Prot collection – How many entries are there ? – Average Protein Length (in aa and MW) – Relative frequency of amino acids • Compare to the ones used to construct the PAM scoring matrixes from 1978 – 1991
  • 10. Question 3: Getting the database Uniprot_sprot.dat.gz – 528Mb (on Github onder Files) Unzipped 2.92 Gb ! http://www.ebi.ac.uk/uniprot/download-center
  • 11. Amino acid frequencies 1978 1991 L 0.085 0.091 A 0.087 0.077 G 0.089 0.074 S 0.070 0.069 V 0.065 0.066 E 0.050 0.062 T 0.058 0.059 K 0.081 0.059 I 0.037 0.053 D 0.047 0.052 R 0.041 0.051 P 0.051 0.051 N 0.040 0.043 Q 0.038 0.041 F 0.040 0.040 Y 0.030 0.032 M 0.015 0.024 H 0.034 0.023 C 0.033 0.020 W 0.010 0.014 Second step: Frequencies of Occurence
  • 12. Extra • Program your own prosite parser ! • Download prosite pattern database (prosite.dat) • Automatically generate >2000 search patterns, and search in sequence set from question 1
  • 13. Oefening 1 1. Which of following 4 sequences (seq1/2/3/4) a) contains a “Galactokinase signature” b) How many of them? http://us.expasy.org/prosite/
  • 14. >SEQ1 MGNLFENCTHRYSFEYIYENCTNTTNQCGLIRNVASSIDVFHWLDVYISTTIFVISGILNFYCLFIALYT YYFLDNETRKHYVFVLSRFLSSILVIISLLVLESTLFSESLSPTFAYYAVAFSIYDFSMDTLFFSYIMIS LITYFGVVHYNFYRRHVSLRSLYIILISMWTFSLAIAIPLGLYEAASNSQGPIKCDLSYCGKVVEWITCS LQGCDSFYNANELLVQSIISSVETLVGSLVFLTDPLINIFFDKNISKMVKLQLTLGKWFIALYRFLFQMT NIFENCSTHYSFEKNLQKCVNASNPCQLLQKMNTAHSLMIWMGFYIPSAMCFLAVLVDTYCLLVTISILK SLKKQSRKQYIFGRANIIGEHNDYVVVRLSAAILIALCIIIIQSTYFIDIPFRDTFAFFAVLFIIYDFSILSLLGSFTGVA M MTYFGVMRPLVYRDKFTLKTIYIIAFAIVLFSVCVAIPFGLFQAADEIDGPIKCDSESCELIVKWLLFCI ACLILMGCTGTLLFVTVSLHWHSYKSKKMGNVSSSAFNHGKSRLTWTTTILVILCCVELIPTGLLAAFGK SESISDDCYDFYNANSLIFPAIVSSLETFLGSITFLLDPIINFSFDKRISKVFSSQVSMFSIFFCGKR >SEQ2 MLDDRARMEA AKKEKVEQIL AEFQLQEEDL KKVMRRMQKE MDRGLRLETH EEASVKMLPT YVRSTPEGSE VGDFLSLDLG GTNFRVMLVK VGEGEEGQWS VKTKHQMYSI PEDAMTGTAE MLFDYISECI SDFLDKHQMK HKKLPLGFTF SFPVRHEDID KGILLNWTKG FKASGAEGNN VVGLLRDAIK RRGDFEMDVV AMVNDTVATM ISCYYEDHQC EVGMIVGTGC NACYMEEMQN VELVEGDEGR MCVNTEWGAF GDSGELDEFL LEYDRLVDES SANPGQQLYE KLIGGKYMGE LVRLVLLRLV DENLLFHGEA SEQLRTRGAF ETRFVSQVES DTGDRKQIYN ILSTLGLRPS TTDCDIVRRA CESVSTRAAH MCSAGLAGVI NRMRESRSED VMRITVGVDG SVYKLHPSFK ERFHASVRRL TPSCEITFIE SEEGSGRGAA LVSAVACKKA CMLGQ >SEQ3 MESDSFEDFLKGEDFSNYSYSSDLPPFLLDAAPCEPESLEINKYFVVIIYVLVFLLSLLGNSLVMLVILY SRVGRSGRDNVIGDHVDYVTDVYLLNLALADLLFALTLPIWAASKVTGWIFGTFLCKVVSLLKEVNFYSGILLLA CISVDRY LAIVHATRTLTQKRYLVKFICLSIWGLSLLLALPVLIFRKTIYPPYVSPVCYEDMGNNTANWRMLLRILP QSFGFIVPLLIMLFCYGFTLRTLFKAHMGQKHRAMRVIFAVVLIFLLCWLPYNLVLLADTLMRTWVIQET CERRNDIDRALEATEILGILGRVNLIGEHWDYHSCLNPLIYAFIGQKFRHGLLKILAIHGLISKDSLPKDSRPSFVGS SSGH TSTTL >SEQ4 MEANFQQAVK KLVNDFEYPT ESLREAVKEF DELRQKGLQK NGEVLAMAPA FISTLPTGAE TGDFLALDFG GTNLRVCWIQ LLGDGKYEMK HSKSVLPREC VRNESVKPII DFMSDHVELF IKEHFPSKFG CPEEEYLPMG FTFSYPANQV SITESYLLRW TKGLNIPEAI NKDFAQFLTE GFKARNLPIR IEAVINDTVG TLVTRAYTSK ESDTFMGIIF GTGTNGAYVE QMNQIPKLAG KCTGDHMLIN MEWGATDFSC LHSTRYDLLL DHDTPNAGRQ IFEKRVGGMY LGELFRRALF HLIKVYNFNE GIFPPSITDA WSLETSVLSR MMVERSAENV RNVLSTFKFR FRSDEEALYL WDAAHAIGRR AARMSAVPIA SLYLSTGRAG KKSDVGVDGS LVEHYPHFVD MLREALRELI GDNEKLISIG IAKDGSGIGA ALCALQAVKE KKGLA MEANFQQAVK KLVNDFEYPT ESLREAVKEF DELRQKGLQK NGEVLAMAPA FISTLPTGAE TGDFLALDFG GTNLRVCWIQ LLGDGKYEMK HSKSVLPREC VRNESVKPII DFMSDHVELF IKEHFPSKFG CPEEEYLPMG FTFSYPANQV SITESYLLRW TKGLNIPEAI NKDFAQFLTE GFKARNLPIR IEAVINDTVG TLVTRAYTSK ESDTFMGIIF GTGTNGAYVE QMNQIPKLAG KCTGDHMLIN MEWGATDFSC LHSTRYDLLL DHDTPNAGRQ IFEKRVGGMY LGELFRRALF HLIKVYNFNE GIFPPSITDA WSLETSVLSR MMVERSAENV RNVLSTFKFR FRSDEEALYL WDAAHAIGRR AARMSAVPIA SLYLSTGRAG KKSDVGVDGS LVEHYPHFVD MLREALRELI GDNEKLISIG IAKDGSGIGA ALCALQAVKE KKGLA Oefening 1
  • 16. Extra Questions • How many records have a sequence of length 260? • What are the first 20 residues of 143X_MAIZE? • What is the identifier for the record with the shortest sequence? Is there more than one record with that length? • What is the identifier for the record with the longest sequence? Is there more than one record with that length? • How many contain the subsequence "ARRA"? • How many contain the substring "KCIP-1" in the description?