The FXN gene provides instructions for making frataxin, a protein found in mitochondria and involved in energy production. Mutations in FXN cause Friedreich ataxia. This document summarizes a lab that uses online tools to explore the relationship between reduced frataxin expression and pancreatic cancer. Key steps include locating FXN on the human genome, obtaining sequences from databases, finding similar sequences with BLAST, aligning sequences with ClustalW, and managing workflows on the BioExtract Server. The lab aims to show how researchers identify biological knowledge across databases using online tools.
Lab Online Molecular Tools and BioExtract Server
1. Lab#1.
Data manipulation: molecular online and server tools & BioExtract Server
Theme: FXN gene and pancreatic cancer.
Etienne Z. Gnimpieba
BRIN WS 2012
Sioux Falls, May 30 2012
Etienne.gnimpieba@usd.edu
2. Data manipulation Molecular online tools and Bioextract server
Plan
• Review
T1. Genome exploration
– Databank: Ensembl
– Tools: web interface, logic connector
T2. Sequences manipulation
– Databank: EBI, Genbank, NCBI
– Tools: queries tools, Blastp, ClustalW2, Jalview, FASTA
T3. Bioextract server
– Data queries
– Tools: Blastp, Workflow, ClustalW2, FASTA
• Lab’s template
Etienne Z. Gnimpieba
BRIN WS 2012
Sioux Falls, May 31 2012
3. Data manipulation Molecular online tools and Bioextract server
T1. Genome Exploration
Theme: Frataxin (FXN) implication in the pancreatic cancer genesis
Objective: use Ensembl online tools to locate the FXN gene on the human genome and identify the genes implicated in pancreatic cancer. Next, retrieve the appropriate sequence data in FASTA and BLAST formats.
T1.1. Locate a given gene on human genome
On the Ensembl web site http://uswest.ensembl.org/index.html
o Select our species “human”
o Do a keyword search using the term “FXN”
o Follow the link of the gene drop down feature
o How many transcript variations of this gene are in our genome (Variations Table)?
o Note the region of the FXN gene by clicking location
o Export this gene (left side bar) in an HTML file as a FASTA sequence.
o Do the same process by searching for “pancreatic cancer”. When you find the list of genes, use the last link of the page
T1.2. Get a genomic sequence from NCBI
o Go to NCBI home page http://www.ncbi.nlm.nih.gov/guide/
o Do a keyword search using the term FXN
o Look at the gene database. How many results are there? Choose the gene database and click the corresponding Homo sapiens FXN gene
o Look for the NCBI Ref Seq to find the mRNA sequence. Click on the corresponding accession number of the first transcript variant (next to the number 1)
o Get the same sequence in FASTA format by clicking on FASTA
o Click Send on the top right in blue, complete record, file, FASTA, Create File – finished with file for now
o Repeat the process for pancreatic cancer by searching CDKN2A
T1.3. Get the protein information and sequence from EBI
The common protein name for FXN is Frataxin
o Go to EBI home page http://www.ebi.ac.uk/
o Type “fxn” in the search and click on “find”
o Select the Homo sapiens Frataxin to get all the information about the protein (function, domains, structure, gene expression...)
T1.4. Save the exported data sequences from T1.2 in data folder
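The manual FASTA export in T1.2 can also be scripted. The following is a minimal sketch, not part of the original lab: it assumes NCBI's E-utilities `efetch` endpoint with the `nuccore` database, and the accession passed in is whatever you noted in the browser (the function names `efetch_fasta_url`, `parse_fasta`, and `download_fasta` are hypothetical helpers introduced here for illustration).

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL of the NCBI E-utilities efetch service (assumed endpoint for this sketch).
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession, db="nuccore"):
    """Build an efetch URL that asks for one record as plain-text FASTA."""
    query = urlencode(
        {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    )
    return f"{EFETCH}?{query}"


def parse_fasta(text):
    """Return {header: sequence} for every record in a FASTA string."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}


def download_fasta(accession, path, db="nuccore"):
    """Fetch a record, save it to `path` (e.g. in the data folder), and parse it.

    Requires network access; not called automatically.
    """
    with urlopen(efetch_fasta_url(accession, db)) as resp:
        text = resp.read().decode("utf-8")
    with open(path, "w") as fh:
        fh.write(text)
    return parse_fasta(text)
```

`parse_fasta` also works on the files exported by hand from Ensembl or NCBI, so it can be used to check that each saved file contains the expected number of records.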
4. Data manipulation Molecular online tools and Bioextract server
T2. Sequences manipulation
Theme: Frataxin (FXN) implication in the pancreatic cancer genesis
Objective: find similar sequences using BLAST tools and align the given sequences.
T2.1. Find similar sequences using BLAST tool
o Continuing from Task T1.3, select the protein tab and select “view sequence in uniprot” under the
sequence category. You can get the FASTA format of the protein by clicking on “FASTA”. Go back, then
check the box next to one of the sequences. Select the “Blast” tool in the drop down menu, then click on
“Go”.
o The best-matching sequences will appear on the first page (green indicates the best match). To see
other sequences, click on “next”. BLAST parameters can be modified by clicking on “options” and
then “Blast”.
T2.2. Align generated sequences with ClustalW tool
o Select about 10 different species then click on “Align” at the bottom of the screen. Selected sequences
will be directly inserted in ClustalW tool and the tool will run automatically.
o From the right menu, it is possible to select similarities, polar residues, aromatic residues, etc. if
interested…
o Through the same page you may add further sequences to the same alignment if needed. You can also
access the phylogenetic tree. More details about the residues and the distances can be obtained by
clicking on “Jalview” on the top right in orange. Click “Keep” on the bottom left of the screen, then click
download. Check the box to agree to the terms and conditions, then click “Run”.
T2.3. Visualize results using a phylogenetic tree in the Jalview tool
o In Jalview, click “file”, “add sequences”, “from file”, then go to the downloads folder under your user profile
(if the files were saved in a folder you created, go to that folder instead).
o Open the first sequence file from the folder containing the previous FASTA files (task T1.X)
o Repeat to add the second saved sequence
o Make the alignment and show the consensus
o Calculate the tree using the “calculate”, “calculate tree”, “average distance % identity” buttons
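To make the “% identity” distance behind the tree more concrete, here is a simplified sketch of how a pairwise distance could be computed from already aligned sequences. It is an illustration under stated assumptions, not Jalview's implementation: gapped columns are skipped here, Jalview's own gap handling may differ, and the function names are hypothetical.

```python
from itertools import combinations


def percent_identity(a, b, gap="-"):
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue  # assumption: gapped columns are ignored
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def distance_matrix(aligned):
    """Map each pair of sequence names to a distance of 100 - percent identity."""
    return {
        (n1, n2): 100.0 - percent_identity(aligned[n1], aligned[n2])
        for n1, n2 in combinations(sorted(aligned), 2)
    }
```

An average-distance tree method would then repeatedly join the pair of sequences (or clusters) with the smallest distance in this matrix.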
5. Data manipulation Molecular online tools and Bioextract Server
T3. BioExtract Server
Theme: Frataxin (FXN) implication in the pancreatic cancer genesis
Objective: use server tools to optimize data manipulation processes, applied on the BioExtract Server.
T3.1. Server Initialization http://bioextract.org
o Register on BioExtract Server to be able to create and save your own workflows.
o Click on the “workflows” tab, then click “create and import workflows”. Now click “record workflow”, then “close”.
o To obtain the workflow at the end of the lab: from the “workflows” tab, click on “create and import workflows”, then click on “save
records”.
T3.2. Pancreatic cancer & Frataxin (FXN) data
o Select the query tab. Then select the protein sequences and check the box next to the NCBI protein database. Select “gene” as the search field and type “FXN”, then add a search line,
select “Species” and type “Human”. Submit the query.
o Results will appear on the “extract page”. You can get the GenBank view of each sequence by clicking on “View record”. We need only the Homo sapiens
Frataxin. To keep it, click “select records”, then check the corresponding box. Click on “keep only selected records”. The results can
be saved or extracted in FASTA or txt format (export the records in FASTA format).
o Click on the “tools” tab, then click on protein tools, and tmap. Select “Use records on extract page formatted in Fasta”.
o Click on “execute” to run the tool. When execution is complete, results can be retrieved by selecting the desired format and clicking on “view results”.
o Repeat the search process with “pancreatic cancer”. Make sure you change the first search field to “all text”.
T3.3. Mapping, Alignment
o Go to the query tab again and search for “FXN”. Select a few listings and export them as done in T3.2.
o Go to the tools tab. Select similarity search tools, then select blastp. Select “use records on extract page formatted as Fasta”. Under “choose
search set” select the database “swissprot”.
o When execution is complete, go to the extract page and select 10 different sequences belonging to 10 different species including human, then “keep
only selected records”. Export the records again.
o Go to the tools tab again, select alignment tools, then clustalw2. Select “use records on extract page formatted as Fasta”. Your 10 protein
sequences will be automatically used as input to the clustalw2 tool. Verify that the sequence type is “protein” in the general parameters
setting and then execute the tool. When execution completes, verify the alignment through the “.aln” file.
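Verifying the “.aln” file can be partly automated. The sketch below assumes the usual Clustal text layout (a first line starting with `CLUSTAL`, then blocks of `name sequence` lines, each block followed by a conservation line that starts with whitespace); the helper names are hypothetical and this is not a BioExtract or ClustalW API.

```python
def parse_clustal(text):
    """Return {name: aligned_sequence} from Clustal-formatted alignment text."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith("CLUSTAL"):
        raise ValueError("not a Clustal alignment: missing CLUSTAL header")
    seqs = {}
    for line in lines[1:]:
        # Skip blank lines and conservation lines (they start with whitespace).
        if not line.strip() or line[0].isspace():
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        name, chunk = fields[0], fields[1]
        # Each block contributes one chunk per sequence; concatenate them in order.
        seqs[name] = seqs.get(name, "") + chunk
    return seqs


def check_alignment(seqs):
    """Return the alignment length, or raise if the rows have different lengths."""
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) != 1:
        raise ValueError(f"inconsistent row lengths: {sorted(lengths)}")
    return lengths.pop()
```

For the lab, a quick check is that `parse_clustal` returns the 10 selected sequences and that `check_alignment` does not raise.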
T3.4. Workflow save & reuse
o Go back to the “workflow” tab and click “create and import workflows”. Write a name and a
description for your workflow, then click on Save. All the previous steps will be saved in this
workflow.
o Once the workflow is saved, you will find it at the bottom of the workflow list. Click on the name of
the workflow to see a schematic view of it. Run the workflow by clicking on “start”.
o Get and verify all the results by clicking on “provenance”. The general report can be saved for later
analysis. Results of each tool can be viewed or saved by clicking on “view file”.
o The same workflow can be executed for another query by simply modifying the accession number of
the protein. (Click save in the “create and import workflows” section to temporarily save the new
query.)
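The record-then-rerun idea in T3.4 can be pictured with a small sketch. This is purely illustrative and does not reflect how the BioExtract Server stores or executes workflows: the `Workflow` class and its `record` and `start` methods are hypothetical names, and the steps are toy functions standing in for query, blastp, and clustalw2.

```python
from dataclasses import dataclass, field


@dataclass
class Workflow:
    """A named, ordered list of recorded steps that can be replayed on a new query."""

    name: str
    description: str = ""
    steps: list = field(default_factory=list)  # (label, function) pairs

    def record(self, label, func):
        """Append one step; returns self so calls can be chained."""
        self.steps.append((label, func))
        return self

    def start(self, query):
        """Run every step in order; return the final result and a provenance log."""
        provenance = []
        data = query
        for label, func in self.steps:
            data = func(data)
            provenance.append((label, data))
        return data, provenance


# Toy steps only: real steps would call the database and the analysis tools.
demo = Workflow("fxn-demo", "toy illustration")
demo.record("normalize query", str.upper).record("wrap as record list", lambda q: [q])
result, log = demo.start("fxn")
```

Calling `demo.start("cdkn2a")` replays the same recorded steps on a different query, which mirrors changing the accession number and running the saved workflow again.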
6. Molecular online tools and server
Context
Statement of problem / Case study:
The FXN gene provides instructions for making a protein called frataxin. This protein is found in cells throughout the body, with the highest levels in the heart, spinal cord, liver, pancreas, and muscles used for voluntary movement (skeletal muscles). Within cells, frataxin is found in energy-producing structures called mitochondria. Although its function is not fully understood, frataxin appears to help assemble clusters of iron and sulfur molecules that are critical for the function of many proteins, including those needed for energy production. Mutations in the FXN gene cause Friedreich ataxia. Friedreich ataxia is a genetic condition that affects the nervous system and causes movement problems. Most people with Friedreich ataxia begin to experience the signs and symptoms of the disorder around puberty.
Biological Hypothesis
Reduced expression of frataxin is the cause of Friedreich's ataxia (FRDA), a lethal neurodegenerative disease. How about liver cancer?
0. Specification & aims
Aim:
The purpose of this experiment is to introduce online tools for biological exploration of the human genome. We simulated an application (FXN gene and pancreatic cancer). Now we can understand how a researcher can identify cross-cutting biological knowledge available in data banks.
Keywords:
Bio: FXN, Frataxin, pancreatic cancer, CDKN2A
Math: HMM
Informatics: programming, bioinformatics tools, getting and exporting data
[Figures: Frataxin molecule structure (PyMOL); FXN on chromosome 9; Pancreas anatomy; Pancreatic cancer; diagram: Biological DB, Tools]
Resolution process
T1. Genome exploration
Objective: use Ensembl online tools to locate the FXN gene on the human genome and identify the genes implicated in pancreatic cancer. Then retrieve the appropriate sequence data in FASTA and BLAST formats.
T1.1. Locate a given gene on human genome
T1.2. Get a genomic sequence from NCBI
T1.3. Get the protein information and sequence from EBI
T1.4. Save the exported sequence data in data folder
T2. Sequences manipulation
Objective: find similar sequences using BLAST tools and align the given sequences.
T2.1. Find similar sequences using BLAST tool
T2.2. Align generated sequences with ClustalW tool
T2.3. Visualize results using a phylogenetic tree in Jalview
T3. BioExtract Server
Objective: use server tools to optimize data manipulation processes, applied on the BioExtract Server.
T3.1. Server Initialization
T3.2. Pancreatic cancer & Frataxin (FXN)
T3.3. Mapping, Alignment
T3.4. Workflow save & reuse
Acquired skills
Online and server tools:
- Query biological DB (FASTA, HTML, txt, figure formats)
- Sequence tools (protein and gene)
  - Mapping (tmap)
  - Alignment (clustalw2)
- Manage data results (select, keep, map, export)
- Build and reuse workflows
Conclusion: ?
Korean Bioinformation Center, 2010
Welcome to this bioinformatics lab on data manipulation using online and server tools. As the theme, we have chosen to study the interaction between Frataxin and pancreatic cancer.
During this lab, we have:
• A brief review
• Lab’s template
• Genome exploration practice
…
This is the lab template: the context is a biological context based on a real biological problem and a given hypothesis. I don’t use computer science, strong word. When you read this template, you have a different view than an informatician. You want to understand the process to build the used tools:
• The architecture of the system
• The algorithm implementation
• The quality of the resulting data
• And so on