Welcome to this bioinformatics lab on data manipulation using online and server tools. As the theme, we have chosen to study the interaction between frataxin and pancreatic cancer.
This is the lab template. The context is based on a real biological problem and a given hypothesis. I don't use the term computer science, which is a strong word. When you read this template, you have a different view than an informatician. You want to understand the process used to build the tools:
- The architecture of the system
- The algorithm implementation
- The quality of the resulting data
- And so on
Bioinformatics Data Manipulation: Molecular Online Tools & BioExtract Server
Theme: FXN Gene and Pancreatic Cancer
Lab #1
Etienne Z. Gnimpieba
BRIN WS 2013
Mount Marty College – June 24th 2013
Etienne.firstname.lastname@example.org
Context
0. Specification & Aims
Bioinformatics Molecular Online Tools and Server

Statement of problem / Case study: The FXN gene provides instructions for making a protein called frataxin. This protein is found in cells throughout the body, with the highest levels in the heart, spinal cord, liver, pancreas, and muscles used for voluntary movement (skeletal muscles). Within cells, frataxin is found in energy-producing structures called mitochondria. Although its function is not fully understood, frataxin appears to help assemble clusters of iron and sulfur molecules that are critical for the function of many proteins, including those needed for energy production. Mutations in the FXN gene cause Friedreich ataxia, a genetic condition that affects the nervous system and causes movement problems. Most people with Friedreich ataxia begin to experience the signs and symptoms of the disorder around puberty.

Keywords:
Bio: FXN, Frataxin, pancreatic cancer, CDKN4
Math: HMM
Informatics: programming, bioinformatics tools, getting and exporting data

Reduced expression of frataxin is the cause of Friedreich ataxia (FRDA), a lethal neurodegenerative disease. How about liver cancer?

Aim: The purpose of this lab is to introduce online tools for exploring large-scale data from the human model (metabolic, proteomic, genomic, …). We illustrate their application with the FXN gene and pancreatic cancer, to show how a researcher can identify and cross-reference the biological knowledge available in data banks.

Acquired skills
Online and server tools:
- Query biological DB (FASTA, HTML, txt, figure formats)
- Sequence tools (protein and gene): alignment (showalign, clustalw2), similarity, …
- Manage data results (select, keep, map, export)
- Build and reuse workflows

Biological Hypothesis (figure panels): FXN on chromosome 9; Frataxin molecule structure (PyMOL); Pancreatic cancer; Pancreas anatomy?; Biological DB; Tools

Resolution Process

T1. Metabolomics
Objective: Use metabolic data repositories to understand the frataxin protein mechanism.
T1.1. Finding the Enzyme and Pathway related to Frataxin using KEGG
T1.2. Finding the Reaction involved with Frataxin using Reactome
T1.3. Using BRENDA for enzyme data on Frataxin
T1.4. Using Collected Data for Analysis
T1.5. Redo the process with Pancreatic Cancer Results

T2. Genome exploration
Objective: Use Ensembl to localize FXN on the human genome and identify the genes implicated in pancreatic cancer.
T2.1. Locate a given gene on human genome
T2.2. Get a genomic sequence from NCBI
T2.3. Get the protein data and sequence from EBI
T2.4. Save the exported sequence data in the data folder

T3. Sequences manipulation
Objective: Find similar sequences using BLAST tools and align the given sequences.
T3.1. Find similar sequences using BLAST tool
T3.2. Align generated sequences with ClustalW tool
T3.3. Visualize the result using a phylogenetic tree in Jalview

T4. Protein Data and Structural Biology Knowledge
Objective: To study frataxin at the protein level and its connection with pancreatic cancer (functional and structural data).
T4.1. Structural Knowledge on Frataxin using SBKB
T4.2. Using Uniprot for Frataxin Protein Study
T4.3. Protein-Protein Interaction using STRING
T4.4. Using same method for Pancreatic Cancer and compare

T5. BioExtract server
Objective: Use server tools to optimize the data manipulation process, applied on the BioExtract server.
T5.1. Server Initialization
T5.2. Pancreatic cancer & Frataxin (FXN)
T5.3. Mapping, Alignment
T5.4. Workflow save & reuse
T1. Metabolomics
Objective: Use metabolic data repositories to understand the frataxin protein mechanism.
Theme: Frataxin (FXN) implication in the pancreatic cancer genesis

T1.1. Finding the Enzyme and Pathway related to Frataxin using KEGG
On the KEGG Database website: http://www.genome.jp/kegg/
o Search frataxin, and select the first result under KEGG Gene Database (hsa:2395).
o Copy the E.C. number given in “Definition” (EC:184.108.40.206).
o To find the related pathway, search the E.C. number in the general KEGG Database search (click on the KEGG logo at the top).
o Select the result given in the KEGG Enzyme Database at the bottom. Here you can see how this enzyme is involved in the given metabolism.

T1.2. Finding the Reaction involved with Frataxin using Reactome
On the Reactome website: http://www.reactome.org/ReactomeGWT/entrypoint.html
o Search frataxin and select the 4th result with Frataxin in the title. This shows you the pathway model related to frataxin and how frataxin is involved in it.

T1.3. Using BRENDA to find information on Frataxin
On the BRENDA Database website: http://www.brenda-enzymes.org/
o Search using the E.C. number obtained in T1.1 and select the result given. This website gives a wealth of information on the enzyme, including the reaction, related species, and so on. At the very bottom of the webpage you can select other databases that have information on the same compound or protein.

T1.4. Using Collected Information to Analyze the Data
On the BioModels website: http://www.ebi.ac.uk/biomodels-main/
o Search using the E.C. number obtained in T1.1 and select the first result given.
Here you can download the SBML file for this pathway (top left corner), save it in the student folder, and analyze it on the semanticSBML website: http://semanticsbml.org/semanticSBML/simple/index
o Click on the first box, “Find Similar Models”, click “Browse”, and select the file you just saved from BioModels. On this website you can use multiple tools to analyze the model and compare it with other models.

T1.5. Same Process Searching for Pancreatic Cancer Results (Optional)
o Use the same process, searching instead for pancreatic cancer results.
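
The SBML file saved in T1.4 is plain XML, so it can also be inspected locally before uploading it anywhere. The following is a minimal Python sketch, not part of the original lab: it lists the species and reaction identifiers declared in an SBML document using only the standard library. The function names and the miniature model `SAMPLE_SBML` are illustrative assumptions; the model is invented and does not correspond to any BioModels entry.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree prepends to tag names."""
    return tag.rsplit("}", 1)[-1]


def summarize_sbml(sbml_text):
    """Return the ids of all <species> and <reaction> elements in an SBML document."""
    root = ET.fromstring(sbml_text)
    species, reactions = [], []
    for element in root.iter():
        name = local_name(element.tag)
        if name == "species":          # does not match <speciesReference>
            species.append(element.get("id"))
        elif name == "reaction":
            reactions.append(element.get("id"))
    return {"species": species, "reactions": reactions}


# Invented toy model, used only to exercise the function (not a real pathway).
SAMPLE_SBML = """<sbml xmlns="http://www.sbml.org/sbml/level2/version4" level="2" version="4">
  <model id="toy_model">
    <listOfSpecies>
      <species id="A" compartment="c"/>
      <species id="B" compartment="c"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="R1">
        <listOfReactants><speciesReference species="A"/></listOfReactants>
        <listOfProducts><speciesReference species="B"/></listOfProducts>
      </reaction>
    </listOfReactions>
  </model>
</sbml>"""

summary = summarize_sbml(SAMPLE_SBML)
print(summary["species"], summary["reactions"])
```

For a downloaded file, reading it in binary mode and passing the bytes to `summarize_sbml` lets the XML parser honor the encoding declared in the file itself.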
T2. Genome Exploration
Objective: Use Ensembl online tools to localize FXN on the human genome and identify the genes implicated in pancreatic cancer. Next, find the appropriate data (sequence) in FASTA format.

T2.1. Locate a given gene on human genome
On the Ensembl website: http://uswest.ensembl.org/index.html
o Select our species, “human”.
o Do a keyword search using the term “FXN”.
o Follow the link of the “Gene” drop-down feature.
o Click the link for “Location”.
o Export this gene by clicking “Export data” (left sidebar) as a FASTA sequence in an HTML file.
o Click Next.
o Click the “HTML” link.
o Do the same process by searching for “pancreatic cancer”. When you find the list of genes, select the CDKN2A gene.

T2.2. Get a genomic sequence from NCBI (42 databases)
On the NCBI website: http://www.ncbi.nlm.nih.gov/guide/
o Pull down “All Databases” and select the “Gene” database, then do a keyword search using the term FXN.
o Click the corresponding Homo sapiens FXN gene (first result).
o Scroll down to the “NCBI Reference Sequences” title and go to the subtitle “mRNA and Proteins”.
o Click on the accession number of the first transcript variant (NM_000144.4).
o Get the same sequence in FASTA format by clicking on the “FASTA” link.
o Click Send (top right, in blue), select complete record, file, FASTA, and Create File, then save in the student folder if possible (otherwise it will save in Downloads automatically).

T2.3. Get the protein information and sequence from EBI
The common protein name for FXN is frataxin.
On the EBI website: http://www.ebi.ac.uk/
o Type “FXN” in the search and click on “find”.
o Select the Homo sapiens frataxin to get all the information about the protein (function, domains, structure, gene expression, ...).
o Don’t close the window.
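
The FASTA files exported in T2 can be checked outside the browser before they are reused in later tasks. The following is a minimal Python sketch, not part of the original lab, that parses FASTA text into (header, sequence) pairs and reports the length and GC fraction of each record. The function names `parse_fasta` and `gc_content` are our own, and the two sample records are invented placeholders, not real FXN or CDKN2A sequences.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                       # ignore blank lines
        if line.startswith(">"):
            if header is not None:         # close the previous record
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)            # sequence may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def gc_content(sequence):
    """Fraction of G and C bases in a nucleotide sequence (0.0 if empty)."""
    sequence = sequence.upper()
    if not sequence:
        return 0.0
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


# Invented placeholder records, used only to exercise the functions.
SAMPLE_FASTA = """>example_1 placeholder record
ATGC
GGCC
>example_2 placeholder record
ATAT
"""

for header, seq in parse_fasta(SAMPLE_FASTA):
    print(header, len(seq), round(gc_content(seq), 2))
```

To use it on a saved export, read the file contents as text and pass them to `parse_fasta`; an empty result usually means the file was saved as an HTML page rather than as FASTA text.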
T3. Sequences Manipulation
Objective: Find similar sequences using BLAST tools and align the given sequences.

T3.1. Find similar sequences using BLAST tool
o Continuing from Task T2.3, select the “Protein” tab on the left and select “view sequence in Uniprot”.
o You can get the FASTA format of the protein by clicking on “fasta” in the top right.
o Go back to the previous page (using the browser’s back button) and check the box next to the first sequence under the “Sequences” title.
o Select the “Blast” tool in the drop-down menu, then click on “Go”.
o The best-matched sequences will appear on the first page (green indicates a better match). To see other sequences, click on Next. BLAST parameters can be modified by clicking on “Options” at the top.

T3.2. Align generated sequences with ClustalW tool
o Select about 10 different species, then click on “Align” at the bottom of the screen. The selected sequences will be inserted directly into the ClustalW tool, and the tool will run automatically.
o From the right menu, you can highlight similarities, polar residues, aromatic residues, etc.
o On the same page you can add further sequences to the same alignment if needed. You can also access the phylogenetic tree. More details about the residues and the distances can be obtained by clicking on “Jalview” on the top right, in orange. (You may have to open Jalview manually.)
o In Jalview, click “file”, “add sequences”, “from file”, then select the sequence file you saved earlier.
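
A simple way to read an alignment such as the one produced in T3.2 is the percent identity between two aligned rows. The following is a minimal Python sketch, not part of the original lab. It assumes the two rows come from the same alignment (equal length, gaps written as "-") and uses one common convention: only columns where neither sequence has a gap are compared. ClustalW and Jalview may use other conventions, so the numbers need not match theirs exactly. The function name and the example strings are illustrative.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of identical residues over the columns where neither row has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue                 # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Invented aligned rows, used only to exercise the function.
row_1 = "MW-LGR"
row_2 = "MWTLAR"
print(percent_identity(row_1, row_2))
```

In the example, five columns are compared (the gapped column is skipped) and four of them match. Applying the function to every pair of rows gives a small identity matrix that can be compared against the tree shown by the alignment tool.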
T4. Protein Data and Structure Data
Objective: To study frataxin at the protein level and its connection with pancreatic cancer (functional and structural data).

T4.1. Structural Knowledge on Frataxin using SBKB
On the Structural Biology Knowledgebase (SBKB): http://www.sbkb.org/
o Select “by text” (options on the left) and search “frataxin”.
o For our example, select the link next to “Structures and annotations…”. Here you can obtain information on all the different hits, such as structure, by looking under all the given tabs.

T4.2. Using Uniprot for Frataxin Protein Study
On the Uniprot Database: http://www.uniprot.org/
o Search frataxin, select the first 3 results given, and click “Download” in the top right. You can then “Open” or “Download” any of the results given.

T4.3. Protein-Protein Interaction using STRING
On the STRING Database: http://string-db.org/
o Under “search by name”, search “FXN”.
o Select the first result given and click “Continue”. Here you can look at the protein-protein interaction model and obtain more information on a given protein or interaction by clicking on it in the model, as well as use many other useful tools.

T4.4. Using same method for Pancreatic Cancer and compare
o Go back to the STRING Database home page and, under “multiple names”, search “frataxin” and “pancreatic cancer”. Select the first result.
o Select all three results given and click “Continue”. This shows the 3 proteins we have selected; however, no interactions are shown between them in this database.
o You can widen the results by changing the search to cancer in general.
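
Interaction tables exported from STRING as tab-separated text can be filtered locally once they are saved. The following is a minimal Python sketch, not part of the original lab. It assumes the table has the columns `preferredName_A`, `preferredName_B`, and `score` (a combined score between 0 and 1); check the header of your own export, since column names can differ between export options. The 0.7 cutoff is an assumption chosen for illustration, and the sample rows use invented placeholder names rather than real interaction data.

```python
import csv
import io


def strong_interactions(tsv_text, min_score=0.7):
    """Return (protein_a, protein_b, score) rows with score >= min_score, best first."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    hits = []
    for row in reader:
        score = float(row["score"])
        if score >= min_score:
            hits.append((row["preferredName_A"], row["preferredName_B"], score))
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


# Invented placeholder table, used only to exercise the function.
SAMPLE_TSV = "\n".join([
    "preferredName_A\tpreferredName_B\tscore",
    "PROT_A\tPROT_B\t0.95",
    "PROT_A\tPROT_C\t0.42",
    "PROT_B\tPROT_C\t0.71",
])

for protein_a, protein_b, score in strong_interactions(SAMPLE_TSV):
    print(protein_a, protein_b, score)
```

Lowering `min_score` widens the list in the same spirit as widening the search term in T4.4, at the cost of including less reliable interactions.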
T5. BioExtract Server
Objective: Use Workflow Management Systems (WMS) to optimize data manipulation processes (BioExtract Server).
http://bioextract.org

T5.1. Server Initialization
o Register on BioExtract Server to be able to create and save your own workflows.
o Click on the “workflows” tab, then click “create and import workflows”. Now click “record workflow”, then “close”.
o To obtain the workflow at the end of the lab: from the “workflows” tab, click on “create and import workflows”, then click on “save records”.

T5.2. Pancreatic cancer & Frataxin (FXN) data
o Select the query tab. Then select the protein sequences and check the box next to the NCBI protein database. Select “gene” as the search field and type “FXN”. Click on “Add Search Line”, select “Species”, and type “Human”. Submit the query.
o Results will appear on the “extract page”. You can get the GenBank view of each sequence by clicking on “View record”. We need only the Homo sapiens frataxin: click “select records”, then check the corresponding box. Click on “keep only selected records”. The results can be saved or extracted in FASTA or txt format (export the records in FASTA format).

T5.3. Mapping, Alignment
o Go to the “tools” tab, then click on “Alignment Tools” and “showalign”. Select “Use records on extract page formatted in Fasta”.
o Click on “execute” to run the tool. When execution is complete, results can be retrieved by selecting the desired format and clicking on “view results”.
o (Optional) Repeat the search process with “pancreatic cancer”. Make sure you change the first search field to “all text”.
o (If the previous step was skipped, skip this step as well.) Go to the query tab again and search “FXN”. Search and select a few listings. Export them as done in T5.2.
o Go to the tools tab. Select similarity search tools, then select “blastp”. Select “use records on extract page formatted as Fasta”. Under “choose search set”, select the database “swissprot”.
o When execution is complete, go to the extract page and select 10 different sequences belonging to 10 different species, including human, then click “keep only selected records”. Export the records again.
o Go to the tools tab again, select “iPlant”, then “clustal w2”. Select “use records on extract page formatted as Fasta”. Your 10 protein sequences will be automatically used as input to the clustalw2 tool. Execute the tool. Use the pull-down for “Search Results” and select “clustalw2.fa” before viewing the results.

T5.4. Workflow save & reuse
o Go back to the “workflow” tab and click “create and import workflows”. Write a name and a description for your workflow, then click on Save. All the previous steps will be saved in this workflow.
o Once the workflow is saved, you will find it at the bottom of the workflow list. Click on the name of the workflow to see a schematic view of it. Run the workflow by clicking on “start”.
o Get and verify all the results by clicking on “provenance”. The general report can be saved for later analysis. Results of each tool can be viewed or saved by clicking on “view file”.
o The same workflow can be executed for another query by simply modifying the accession number of the protein. (Click save in the “create and import workflows” section to temporarily save the new query.)
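
The idea behind T5.4, recording a sequence of steps once and replaying it with a different query, can be illustrated in a few lines of Python. This is a conceptual sketch only, not part of the original lab: it does not call BioExtract and does not reflect how the server is implemented. The `Workflow` class, the step functions, and the in-memory `TOY_DB` are hypothetical, and the sequences in it are invented placeholders.

```python
class Workflow:
    """Record an ordered list of steps and replay them on any query."""

    def __init__(self, name, description=""):
        self.name = name
        self.description = description
        self.steps = []

    def record(self, label, func):
        """Append a step; returns self so that calls can be chained."""
        self.steps.append((label, func))
        return self

    def run(self, query):
        """Feed the query through every step; keep each intermediate result."""
        data = query
        provenance = []
        for label, func in self.steps:
            data = func(data)
            provenance.append((label, data))
        return data, provenance


# Invented stand-in for a sequence database (placeholder sequences).
TOY_DB = [
    {"gene": "FXN", "species": "Homo sapiens", "seq": "MAAAA"},
    {"gene": "FXN", "species": "Mus musculus", "seq": "MGGGG"},
    {"gene": "CDKN2A", "species": "Homo sapiens", "seq": "MCCCC"},
]


def search_gene(gene):
    return [record for record in TOY_DB if record["gene"] == gene]


def keep_human(records):
    return [record for record in records if record["species"] == "Homo sapiens"]


def export_fasta(records):
    return "".join(">{gene}|{species}\n{seq}\n".format(**record) for record in records)


workflow = (
    Workflow("fxn_lab", "search, filter, export")
    .record("query", search_gene)
    .record("keep only selected records", keep_human)
    .record("export FASTA", export_fasta)
)

fasta, provenance = workflow.run("FXN")
print(fasta)
```

Running `workflow.run("CDKN2A")` replays the same three steps on a different query, which mirrors the last step of T5.4. The `provenance` list plays the role of the per-tool results that the lab retrieves after a run.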