Lab Online Molecular Tools and BioExtract Server

Published in: Technology

  1. 1. Lab #1. Data manipulation: molecular online and server tools & BioExtract Server. Theme: FXN gene and pancreatic cancer. Etienne Z. Gnimpieba, BRIN WS 2012, Sioux Falls, May 30 2012. Etienne.gnimpieba@usd.edu
  2. 2. Data manipulation: Molecular online tools and BioExtract server
     Plan
     • Review
       T1. Genome exploration
         – Databank: Ensembl
         – Tools: web interface, logic connector
       T2. Sequences manipulation
         – Databank: EBI, GenBank, NCBI
         – Tools: query tools, Blastp, ClustalW2, Jalview, FASTA
       T3. BioExtract server
         – Data queries
         – Tools: Blastp, Workflow, ClustalW2, FASTA
     • Lab’s template
     Etienne Z. Gnimpieba, BRIN WS 2012, Sioux Falls, May 31 2012
  3. 3. T1. Genome exploration
     Theme: Frataxin (FXN) implication in the pancreatic cancer genesis
     Objective: use Ensembl online tools to localize FXN on the human genome and identify the genes implicated in pancreatic cancer. Next, find appropriate data (sequences) in FASTA and Blast format.
     T1.1. Locate a given gene on the human genome
     On the Ensembl web site http://uswest.ensembl.org/index.html
     o Select our species, “human”
     o Do a keyword search using the term “FXN”
     o Follow the link of the gene drop-down feature
     o How many transcript variations of this gene are in our genome (Variations Table)?
     o Note the region of the FXN gene by clicking “location”
     o Export this gene (left side bar) to an HTML file as a FASTA sequence.
     o Do the same process by searching for “pancreatic cancer”. When you find the list of genes, use the last link of the page
     T1.2. Get a genomic sequence from NCBI
     o Go to the NCBI home page http://www.ncbi.nlm.nih.gov/guide/
     o Do a keyword search using the term FXN
     o Look at the gene database. How many results are there? Choose the gene database and click the corresponding Homo sapiens FXN gene
     o Look for the NCBI RefSeq section to find the mRNA sequence. Click on the corresponding accession number of the first transcript variant (next to the number 1)
     o Get the same sequence in FASTA format by clicking on “FASTA”
     o Click “Send” on the top right in blue, then choose complete record, file, FASTA, Create File (finished with this file for now)
     o Repeat the process for pancreatic cancer by searching CDKN2A
     T1.3. Get the protein information and sequence from EBI
     The common protein name for FXN is Frataxin
     o Go to the EBI home page http://www.ebi.ac.uk/
     o Type “fxn” in the search and click on “find”
     o Select the Homo sapiens Frataxin to get all the information about the protein (function, domains, structure, gene expression, ...)
     T1.4. Save the sequence data exported in T1.2 in the data folder
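The export steps in T1.2 and T1.4 can also be scripted. The following is a minimal sketch, not part of the original lab: it assumes the NCBI E-utilities `efetch` endpoint as a stand-in for clicking “Send” and “Create File”, and it uses a toy FASTA record rather than real FXN data. The function names `efetch_fasta_url` and `parse_fasta` are illustrative.

```python
from urllib.parse import urlencode

# NCBI E-utilities endpoint for downloading records (assumed here as a
# scripted alternative to the web export described in T1.2).
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession, db="nuccore"):
    """Build a URL that returns one record as plain FASTA text."""
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return EFETCH + "?" + urlencode(params)


def parse_fasta(text):
    """Return {header: sequence} for every record in a FASTA string."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; the header is the text after '>'.
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}


# Toy record standing in for a file saved in T1.4 (not a real FXN sequence).
example = ">toy_record demo\nATGTGGAC\nTCTCGG\n"
records = parse_fasta(example)
print(records)
```

To use it on lab data, pass the accession number noted in T1.2 to `efetch_fasta_url`, download the text (for example with `urllib.request.urlopen`), and hand the result to `parse_fasta`.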
  4. 4. T2. Sequences manipulation
     Theme: Frataxin (FXN) implication in the pancreatic cancer genesis
     Objective: find similar sequences using BLAST tools and align the given sequences.
     T2.1. Find similar sequences using the BLAST tool
     o Continuing from Task T1.3, select the protein tab and select “view sequence in uniprot” under the sequence category. You can get the FASTA format of the protein by clicking on “FASTA”. Go back, then check the box next to one of the sequences. Select the “Blast” tool in the drop-down menu, then click on “Go”.
     o The best-matching sequences appear on the first page (green marks the best match). To see other sequences, click on “next”. Blast parameters can be modified by clicking on “options” and then “Blast”.
     T2.2. Align generated sequences with the ClustalW tool
     o Select about 10 different species, then click on “Align” at the bottom of the screen. The selected sequences are inserted directly into the ClustalW tool, which runs automatically.
     o From the right menu, you can highlight similarities, polar residues, aromatic residues, etc., if interested.
     o From the same page you can add further sequences to the same alignment if needed. You can also access the phylogenetic tree. More details about the residues and the distances can be obtained by clicking on “Jalview” on the top right in orange. Click “Keep” on the bottom left of the screen, then click the download. Check the box to agree to the terms and conditions, then click “Run”.
     T2.3. Visualize the result as a phylogenetic tree in the Jalview tool
     o In Jalview, click “file”, “add sequences”, “from file”, and go to the downloads folder under the user profile (if you saved the files in a folder you created, go to that folder instead).
     o Open the first sequence file from the folder containing the previous FASTA files (task T1.X)
     o Repeat to add the second saved sequence
     o Make the alignment and show the consensus
     o Calculate the tree using the “calculate”, “calculate tree”, then “average distance % identity” buttons
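The “average distance % identity” tree in T2.3 is built from pairwise identity scores between aligned sequences. As a rough illustration of that input, the sketch below computes percent identity and a distance (100 minus identity) for every pair in a small alignment. It is an assumption-laden toy: the sequences are invented, the gap handling is one simple choice, and Jalview's exact scoring may differ.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percent of compared columns where both rows carry the same residue.

    Columns where both rows are gaps are skipped; a gap opposite a residue
    counts as a mismatch (a simplifying assumption for this sketch).
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = same = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            same += 1
    return 100.0 * same / compared if compared else 0.0


def distance_matrix(alignment):
    """Return {(name1, name2): 100 - percent identity} for every pair."""
    return {
        (p, q): 100.0 - percent_identity(alignment[p], alignment[q])
        for p, q in combinations(sorted(alignment), 2)
    }


# Invented aligned rows, used only to exercise the functions.
toy = {"human": "MWTLGRRA", "mouse": "MWAFGRRA", "yeast": "MI-KRSLA"}
dist = distance_matrix(toy)

# The pair with the smallest distance is the first one an
# average-distance clustering would join.
closest = min(dist, key=dist.get)
print(closest, dist[closest])
```

With the 10 species selected in T2.2, the same matrix would have 45 entries, and the clustering would repeat the join step until a single tree remains.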
  5. 5. T3. BioExtract server
     Theme: Frataxin (FXN) implication in the pancreatic cancer genesis
     Objective: use server tools to optimize data manipulation processes, applied on the BioExtract server.
     T3.1. Server initialization: http://bioextract.org
     o Register on BioExtract Server to be able to create and save your own workflows.
     o Click on the “workflows” tab, then click “create and import workflows”. Now click “record workflow”, then “close”.
     o To obtain the workflow at the end of the lab: from the “workflows” tab click on “create and import workflows”, then click on “save records”.
     T3.2. Pancreatic cancer & Frataxin (FXN) data
     o Select the query tab. Then select the protein sequences and check the box next to the NCBI protein database. Select “gene” as the search field and type “FXN”; add a search line, select “Species” and type “Human”. Submit the query.
     o Results appear on the “extract” page. You can get the GenBank view of each sequence by clicking on “View record”. We need only the Homo sapiens Frataxin: click “select records”, check the corresponding box, then click on “keep only selected records”. The results can be saved or extracted in FASTA or txt format (export the records in FASTA format).
     o Click on the “tools” tab, then click on protein tools, then tmap. Select “Use records on extract page formatted in Fasta”.
     o Click on “execute” to run the tool. When execution is complete, results can be retrieved by selecting the desired format and clicking on “view results”.
     o Repeat the search process with “pancreatic cancer”. Make sure you change the first search field to “all text”.
     T3.3. Mapping, alignment
     o Go to the query tab again and search “FXN”. Search and select a few listings. Export them as done in T3.2.
     o Go to the tools tab. Select similarity search tools, then select blastp. Select “use records on extract page formatted as Fasta”. Under “choose search set”, select the database “swissprot”.
     o When execution is complete, go to the extract page and select 10 different sequences belonging to 10 different species, including human, then click “keep only selected records”. Export the records again.
     o Go to the tools tab again, select alignment tools, then clustalw2. Select “use records on extract page formatted as Fasta”. Your 10 protein sequences are automatically used as input to the clustalw2 tool. Verify that the sequence type is “protein” in the general parameters setting, then execute the tool. When execution completes, verify the alignment through the “.aln” file.
     T3.4. Workflow save & reuse
     o Go back to the “workflow” tab and click “create and import workflows”. Write a name and a description for your workflow, then click on “Save”. All the previous steps are saved in this workflow.
     o Once the workflow is saved, you will find it at the bottom of the workflow list. Click on the name of the workflow to see a schematic view of it. Run the workflow by clicking on “start”.
     o Get and verify all the results by clicking on “provenance”. The general report can be saved for later analysis. Results of each tool can be viewed or saved by clicking on “view file”.
     o The same workflow can be executed for another query by simply modifying the accession number of the protein. (Click “save” in the “create and import workflows” section to temporarily save the new query.)
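T3.3 asks you to verify the alignment through the “.aln” file. One way to do that check outside the server is sketched below, assuming the file is in the usual Clustal text layout: a header line starting with `CLUSTAL`, then blocks of `name sequence` rows, each block followed by a conservation line that begins with whitespace. The toy alignment and the helper names are illustrative and do not come from BioExtract.

```python
def parse_clustal(text):
    """Return {name: aligned_sequence} from Clustal-format text."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith("CLUSTAL"):
        raise ValueError("missing CLUSTAL header line")
    rows = {}
    for line in lines[1:]:
        # Skip blank lines and conservation lines (they start with a space).
        if not line.strip() or line[0].isspace():
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        name, chunk = fields[0], fields[1]
        # Each block contributes one chunk per sequence; join them in order.
        rows[name] = rows.get(name, "") + chunk
    return rows


def alignment_width(rows):
    """Return the common row length, or raise if rows differ in length."""
    lengths = {len(seq) for seq in rows.values()}
    if len(lengths) != 1:
        raise ValueError("rows have different lengths: %s" % sorted(lengths))
    return lengths.pop()


# Invented two-block alignment, used only to exercise the parser.
toy_aln = "\n".join([
    "CLUSTAL 2.1 multiple sequence alignment",
    "",
    "",
    "seqA      MWTLGRRA",
    "seqB      MWAFGRRA",
    "          **  ****",
    "",
    "seqA      VAGL",
    "seqB      VA-L",
    "          ** *",
])
rows = parse_clustal(toy_aln)
width = alignment_width(rows)
print(len(rows), "sequences,", width, "columns")
```

For the lab's result, `len(rows)` should equal the 10 sequences kept on the extract page, and `alignment_width` should succeed without raising.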
  6. 6. Molecular online tools and server
     Context
     Statement of problem / Case study: The FXN gene provides instructions for making a protein called frataxin. This protein is found in cells throughout the body, with the highest levels in the heart, spinal cord, liver, pancreas, and muscles used for voluntary movement (skeletal muscles). Within cells, frataxin is found in energy-producing structures called mitochondria. Although its function is not fully understood, frataxin appears to help assemble clusters of iron and sulfur molecules that are critical for the function of many proteins, including those needed for energy production. Mutations in the FXN gene cause Friedreich ataxia. Friedreich ataxia is a genetic condition that affects the nervous system and causes movement problems. Most people with Friedreich ataxia begin to experience the signs and symptoms of the disorder around puberty.
     Biological hypothesis
     Reduced expression of frataxin is the cause of Friedreich's ataxia (FRDA), a lethal neurodegenerative disease. How about liver cancer?
     0. Specification & aims
     Aim: The purpose of this experiment is to introduce online biological exploration tools for the human genome. We simulated an application (FXN gene and pancreatic cancer). Now we can understand how a researcher can identify cross-cutting biological knowledge available in data banks.
     Keywords:
     Bio: FXN, Frataxin, pancreatic cancer, CDKN4
     Math: HMM
     Informatics: programming, bioinformatics tools, getting and exporting data
     Resolution process
     T1. Genome exploration
     Objective: use Ensembl online tools to localize FXN on the human genome and identify the genes implicated in pancreatic cancer. Afterwards, get appropriate data (sequences) in FASTA and Blast format.
     T1.1. Locate a given gene on the human genome
     T1.2. Get a genomic sequence from NCBI
     T1.3. Get the protein information and sequence from EBI
     T1.4. Save the exported sequence data in the data folder
     T2. Sequences manipulation
     Objective: find similar sequences using BLAST tools and align the given sequences.
     T2.1. Find similar sequences using the BLAST tool
     T2.2. Align generated sequences with the ClustalW tool
     T2.3. Visualize the result as a phylogenetic tree in Jalview
     T3. BioExtract server
     Objective: use server tools to optimize the data manipulation process, applied on the BioExtract server.
     T3.1. Server initialization
     T3.2. Pancreatic cancer & Frataxin (FXN)
     T3.3. Mapping, alignment
     T3.4. Workflow save & reuse
     Acquired skills
     Online and server tools:
     - Query biological DB (FASTA, HTML, txt, figure formats)
     - Sequence tools (protein and gene): mapping (tmap), alignment (clustalw2)
     - Manage data results (select, keep, map, export)
     - Build and reuse workflows
     Conclusion: ?
     [Figures: Frataxin molecule structure (PyMOL); FXN on chromosome 9; Biological DB and Tools diagram; pancreas anatomy; pancreatic cancer. Credit: Korean Bioinformation Center, 2010]
  7. 7. END.
