The document discusses using various online bioinformatics tools and databases to analyze the relationship between the FXN gene, its protein product frataxin, and pancreatic cancer. It outlines tasks using tools like Ensembl, NCBI, EBI, KEGG, Reactome, and BRENDA to locate the FXN gene, obtain sequences, identify metabolic pathways and enzymes, and explore protein structure and interactions. The goal is to understand how frataxin may be implicated in pancreatic cancer development at the genomic, protein, and pathway levels.
Session i lab bioinfo dm and app mmc
1. Bioinformatics Data Manipulation:
Molecular Online Tools & BioExtract Server
Theme: FXN Gene and Pancreatic Cancer.
Lab #1
Etienne Z. Gnimpieba
BRIN WS 2013
Mount Marty College – June 24th 2013
Etienne.gnimpieba@usd.edu
2. Context
0. Specification & Aims
Statement of problem / Case study: The FXN gene provides instructions for making a protein called frataxin. This protein is found in cells throughout the body, with the highest levels in the heart,
spinal cord, liver, pancreas, and muscles. The protein is used for voluntary movement (skeletal muscles). Within cells, frataxin is found in energy-producing structures called mitochondria. Although
its function is not fully understood, frataxin appears to help assemble clusters of iron and sulfur molecules that are critical for the function of many proteins, including those needed for energy
production. Mutations in the FXN gene cause Friedreich ataxia. Friedreich ataxia is a genetic condition that affects the nervous system and causes movement problems. Most people with Friedreich
ataxia begin to experience the signs and symptoms of the disorder around puberty.
Bioinformatics Molecular Online Tools and Server
Keywords:
Bio: FXN, Frataxin, pancreatic cancer, CDKN4
Math: HMM
Informatics: programming, bioinformatics tools, getting
and exporting data
Reduced expression of frataxin is
the cause of Friedreich's ataxia
(FRDA), a lethal neurodegenerative
disease. How about liver cancer?
Aim: This lab introduces online tools for exploring
large-scale biological data for the human model
(metabolic, proteomic, genomic, …). We apply them to
the FXN gene and pancreatic cancer as a worked case,
to show how a researcher can cross-reference the
biological knowledge available in data banks.
Acquired skills
Online and server tools:
- Query biological DB (FASTA, HTML, txt, figure formats)
- Sequence tools (protein and gene):
alignment (showalign, clustalw2), similarity, …
- Manage data result (select, keep, map, export)
- Build and reuse workflow
Biological Hypothesis
[Figure: FXN on chromosome 9; frataxin molecule structure (PyMOL); pancreas anatomy and pancreatic cancer; the link between them is marked "?"]
Biological DB
Tools
Resolution Process
T1. Metabolomics
Objective: Use a metabolic data repository to
understand the frataxin protein mechanism.
T1.1. Finding the Enzyme and Pathway related to
Frataxin using KEGG
T1.2. Finding the Reaction involved with Frataxin
using Reactome
T1.3. Using BRENDA for enzyme data on Frataxin
T1.4. Using Collected data for Analysis
T1.5. Redo the process with Pancreatic Cancer
Results
T2. Genome exploration
Objective: Use Ensembl to locate FXN on the human
genome and identify the genes implicated in pancreatic
cancer.
T2.1. Locate a given gene on human genome
T2.2. Get a genomic sequence from NCBI
T2.3. Get the protein data and sequence from EBI
T2.4. Save the exported sequence data in data folder
T3. Sequences manipulation
Objective: Find similar sequences using BLAST tools
and make an alignment on given sequences.
T3.1. Find similar sequences using BLAST tool
T3.2. Align generated sequences with ClustalW tool
T3.3. Visualize the result using a phylogenetic tree in
Jalview
T4. Protein Data and Structural
Biology Knowledge
Objective: Study frataxin at the protein level
and its connection with pancreatic cancer (functional and
structural data).
T4.1. Structural Knowledge on Frataxin using
SBKB
T4.2. Using Uniprot for Frataxin Protein Study
T4.3. Protein-Protein Interaction using STRING
T4.4. Using same method for Pancreatic Cancer
and compare
T5. BioExtract server
Objective: Use a server tool to optimize the data
manipulation process, applied on the BioExtract server.
T5.1. Server Initialization
T5.2. Pancreatic cancer & Frataxin (FXN)
T5.3. Mapping, Alignment
T5.4. Workflow save & reuse
3. Data Manipulation Molecular Online Tools and BioExtract Server
T1. Metabolomics
Objective : Use metabolic data repository to understand the frataxin protein mechanism
Theme: Frataxin (FXN) implication in the pancreatic cancer genesis
T1.1. Finding the Enzyme and Pathway related to Frataxin using KEGG
On the KEGG Database website: http://www.genome.jp/kegg/
o Search frataxin, and select the first result under KEGG Gene Database (hsa:2395)
o Copy the E.C. number given in “Definition” (EC:1.16.3.1)
o To find the related pathway, search the E.C. number in the general KEGG Database search (click on the KEGG
logo at the top)
o Select the result given in the KEGG Enzyme Database at the bottom. Here you can see how this enzyme is involved in the
metabolism shown.
T1.2. Finding the Reaction involved with Frataxin using Reactome
On the Reactome website: http://www.reactome.org/ReactomeGWT/entrypoint.html
o Search frataxin and select the 4th result with Frataxin in the title. This shows the pathway model related to frataxin
and how frataxin is involved in it.
T1.3. Using BRENDA to find information on Frataxin
On the BRENDA Database website: http://www.brenda-enzymes.org/
o Search using the E.C. number obtained in T1.1 and select the result given. This website gives a wealth of information on
the enzyme, including the reaction, related species, and so on. At the very bottom of the webpage you can select other
databases that have information on the same compound or protein.
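The E.C. number copied from KEGG in T1.1 is reused in the BRENDA and BioModels searches, so it helps to pull it out of the copied text reliably. The following is a minimal sketch, not part of the original lab: it assumes the copied text carries the number after an "EC:" tag, as in the "(EC:1.16.3.1)" quoted above. The function names `extract_ec_numbers` and `kegg_get_url` are hypothetical helpers, and the URL pattern assumes the public KEGG REST "get" operation.

```python
import re

# Assumption: EC numbers appear after an "EC:" tag, one or more separated by
# spaces, e.g. "(EC:1.16.3.1)" as quoted on the slide.
_EC_FIELD = re.compile(r"EC:\s*([0-9.\- ]+)")
# Four dot-separated fields; "-" stands for an unassigned field.
_EC_NUMBER = re.compile(r"(?:\d+|-)(?:\.(?:\d+|-)){3}")


def extract_ec_numbers(text):
    """Return each well-formed EC number found after an 'EC:' tag, in order."""
    found = []
    for field in _EC_FIELD.findall(text):
        for token in field.split():
            token = token.rstrip(".")  # drop a sentence-ending period
            if _EC_NUMBER.fullmatch(token) and token not in found:
                found.append(token)
    return found


def kegg_get_url(entry_id):
    """Build a KEGG REST 'get' URL for an entry such as 'hsa:2395'."""
    return "https://rest.kegg.jp/get/" + entry_id
```

For example, `extract_ec_numbers("frataxin (EC:1.16.3.1)")` yields the single number used in T1.3 and T1.4, and `kegg_get_url("hsa:2395")` gives a link to the same gene entry selected in the first step.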
T1.4. Using Collected Information to Analyze the Data
On the BioModels website: http://www.ebi.ac.uk/biomodels-main/
o Search using the E.C. number obtained in T1.1 and select the first result given. Here you can download the SBML file (in
student folder) for this pathway (top left corner) and analyze it on the semanticSBML website.
http://semanticsbml.org/semanticSBML/simple/index
o Click on the first box “Find Similar Models”, click “Browse”, and select the file you just saved from BioModels. On this
website you can use multiple tools to analyze the model and compare it with other models.
T1.5. Same Process Searching for Pancreatic Cancer Results (Optional)
o Repeat the same process, searching for pancreatic cancer instead.
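Before uploading a downloaded SBML file to an analysis site, it can be useful to check that it parses and to see which species and reactions it declares. The sketch below is an illustration only, using Python's standard XML parser. The helper names are hypothetical, the embedded model is a hand-written toy and not a BioModels entry, and the code assumes the model identifier is stored in an `id` attribute (as in SBML Level 2 and later).

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts before tag names."""
    return tag.rsplit("}", 1)[-1]


def summarize_sbml(sbml_text):
    """Return (model_id, species_ids, reaction_ids) for an SBML document."""
    root = ET.fromstring(sbml_text)
    if local_name(root.tag) != "sbml":
        raise ValueError("root element is not <sbml>")
    model_id = None
    species, reactions = [], []
    for element in root.iter():
        name = local_name(element.tag)
        if name == "model":
            model_id = element.get("id")
        elif name == "species":  # does not match <speciesReference>
            species.append(element.get("id"))
        elif name == "reaction":
            reactions.append(element.get("id"))
    return model_id, species, reactions


# Hand-written toy model for illustration; NOT downloaded from BioModels.
TOY_SBML = """<sbml xmlns="http://www.sbml.org/sbml/level2/version4" level="2" version="4">
  <model id="toy_iron_oxidation">
    <listOfSpecies>
      <species id="Fe2" compartment="c"/>
      <species id="Fe3" compartment="c"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="oxidation"/>
    </listOfReactions>
  </model>
</sbml>
"""
```

Matching on the local tag name, not a fixed namespace, lets the same function read files from different SBML levels and versions.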
4. Data Manipulation Molecular Online Tools and BioExtract Server
T2. Genome Exploration
Objective: Use Ensembl online tools to locate FXN on the human genome and identify the genes implicated in pancreatic
cancer. Next, find the appropriate data (sequence) in FASTA format.
Theme: Frataxin (FXN) implication in the pancreatic cancer genesis
T2.1. Locate a given gene on human genome
On the Ensembl web site http://uswest.ensembl.org/index.html
o Select our species “human”
o Do a keyword search using the term “FXN”
o Follow the link of the “Gene” drop down feature
o Click the link for “Location”
o Export this gene by clicking “Export data” (left side bar), as a FASTA sequence in an HTML file.
o Click Next
o Click the “HTML” link
o Do the same process by searching for “pancreatic cancer”. When you find the list of genes, select the CDKN2A gene
T2.2. Get a genomic sequence from NCBI (42 DataBases)
On the NCBI website: http://www.ncbi.nlm.nih.gov/guide/
o Pull down “All Databases” and select the “Gene” database, then do a keyword search using the term FXN
o Click the corresponding Homo sapiens FXN gene (first result)
o Scroll down and look for the “NCBI Reference Sequences” title and go to the subtitle “mRNA and Proteins”
o Click on the corresponding accession number of the first transcript variant (NM_000144.4)
o Get the same sequence in FASTA format by clicking on the “FASTA” link
o Click “Send” (in blue, top right), select “Complete Record”, “File”, “FASTA”, and “Create File”, then save in the
student folder if possible (it will save in Downloads automatically)
T2.3. Get the protein information and sequence from EBI
The common protein name for FXN is Frataxin
On the EBI website: http://www.ebi.ac.uk/
o Type “FXN” in the search and click on “find”
o Select the Homo sapiens Frataxin to get all the information about the protein (function, domains, structure, gene expression, ...)
o Don’t close the window
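The FASTA files saved in T2 are plain text: a header line starting with ">" followed by sequence lines. As a rough sketch, not part of the original lab, the code below reads such text into (header, sequence) pairs and builds a download link for the accession used in T2.2. The helper names are hypothetical, and the URL assumes the NCBI E-utilities `efetch` endpoint with its `db`, `id`, `rettype` and `retmode` parameters.

```python
from urllib.parse import urlencode

# Assumption: NCBI E-utilities efetch endpoint.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession, db="nuccore"):
    """Build an efetch URL that returns the record as FASTA text."""
    query = urlencode(
        {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    )
    return EFETCH + "?" + query


def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())  # join wrapped sequence lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

Calling `efetch_fasta_url("NM_000144.4")` gives a link equivalent to the manual “Send” to “File” steps, and `parse_fasta(open(path).read())` can then be applied to the saved file, with `path` pointing to wherever the browser stored it.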
5. Data Manipulation Molecular Online Tools and BioExtract Server
T3. Sequences Manipulation
Objective : Find similar sequence using BLAST tools and make alignment on given sequences.
Theme: Frataxin (FXN) implication in the pancreatic cancer genesis
T3.1. Find similar sequences using BLAST tool
o Continuing from Task T2.3, select the “Protein” tab on the left and select “view sequence in Uniprot”
o You can get the FASTA format of the protein by clicking on “fasta” in the top right
o Go back to the previous page (using the browser’s back button) and check the box next to the first sequence under the “Sequences” title.
o Select the “Blast” tool in the drop-down menu, then click on “Go”.
o The best-matched sequences will appear on the first page (green indicates a better match). To see other sequences, click on “next”. Blast parameters can be modified by clicking on “Options” at the top.
T3.2. Align generated sequences with ClustalW tool
o Select about 10 different species, then click on “Align” at the bottom of the screen. The selected sequences will be directly inserted in the ClustalW tool and the tool will run automatically.
o From the right menu, it is possible to highlight similarities, polar residues, aromatic residues, etc., if interested.
o Through the same page you may add further sequences to the same alignment if needed. You can also access the phylogenetic tree. More details about the residues and the distances can be obtained by clicking on “Jalview” on the top right, in orange. (You may have to open Jalview manually.)
o In Jalview, click “file”, “add sequences”, “from file”, then select the sequence file you saved earlier.
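Both BLAST and ClustalW report how similar the sequences are. As a rough illustration of what such a similarity figure means, the sketch below computes percent identity between sequences that are already aligned (same length, with "-" for gaps). The definition used here (identical columns divided by columns where neither sequence has a gap) is one common convention and an assumption on our part; BLAST and Jalview may use a different denominator.

```python
from itertools import combinations


def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # gapped columns are ignored in this convention
        compared += 1
        matches += x == y
    return 100.0 * matches / compared if compared else 0.0


def pairwise_identities(alignment):
    """alignment: {name: aligned sequence}. Returns {(name_a, name_b): identity}."""
    return {
        (a, b): percent_identity(alignment[a], alignment[b])
        for a, b in combinations(alignment, 2)
    }
```

Feeding the rows of a ClustalW alignment into `pairwise_identities` gives a small table that can be compared with the distances shown in Jalview.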
T4. Protein Data and Structure Data
Objective: Study frataxin at the protein level and its connection with pancreatic cancer (functional and structural data)
Theme: Frataxin (FXN) implication in the pancreatic cancer genesis
T4.1. Structural Knowledge on Frataxin using SBKB
T4.2. Using Uniprot for Frataxin Protein Study
T4.3. Protein-Protein Interaction using STRING
On the Uniprot Database: http://www.uniprot.org/
o Search “frataxin”, select the first 3 results given, and click “Download” in the top right. You can then “Open” or “Download” any of the results given.
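The same search and download can be expressed as a single request URL. This is a minimal sketch, assuming the UniProt REST service at https://rest.uniprot.org/uniprotkb and its `search` endpoint with `query`, `format` and `size` parameters; the function name and the default of 3 results (mirroring the step above) are our own choices.

```python
from urllib.parse import urlencode

# Assumption: the UniProt REST service, not the web page used in the lab.
UNIPROT_REST = "https://rest.uniprot.org/uniprotkb"


def uniprot_search_url(gene, taxon_id=9606, fmt="fasta", size=3):
    """URL that searches UniProtKB for a gene in one organism (9606 = human)."""
    query = f"gene:{gene} AND organism_id:{taxon_id}"
    params = {"query": query, "format": fmt, "size": size}
    return f"{UNIPROT_REST}/search?" + urlencode(params)
```

Opening `uniprot_search_url("FXN")` in a browser should return up to three FASTA records, comparable to what the “Download” button produces.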
On the STRING Database: http://string-db.org/
o Under “search by name”, search for “FXN”.
o Select the first result given and click “Continue”. Here you can look at the Protein-Protein Interaction model and obtain more information on a given protein or interaction by clicking on it in the model, as well as use many other useful tools.
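The interaction network shown on the STRING page can also be requested as a table. The sketch below assumes the STRING API layout `https://string-db.org/api/<format>/network` with `identifiers`, `species` and `caller_identity` parameters, and assumes the TSV output has `preferredName_A`, `preferredName_B` and `score` columns; the function names and the `caller_identity` value are hypothetical.

```python
import csv
import io
from urllib.parse import urlencode

STRING_API = "https://string-db.org/api"  # assumed public API root


def string_network_url(identifiers, species=9606, output_format="tsv",
                       caller_identity="fxn_lab_sketch"):
    """URL for the interaction network of the given protein names (9606 = human)."""
    params = {
        "identifiers": "\r".join(identifiers),  # STRING separates names with a carriage return
        "species": species,
        "caller_identity": caller_identity,  # hypothetical label identifying this script
    }
    return f"{STRING_API}/{output_format}/network?" + urlencode(params)


def edges_between(tsv_text, names):
    """Return (A, B, score) for rows whose two proteins are both in `names`."""
    rows = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        (r["preferredName_A"], r["preferredName_B"], float(r["score"]))
        for r in rows
        if r["preferredName_A"] in names and r["preferredName_B"] in names
    ]
```

An empty list from `edges_between` for a set of proteins corresponds to the case where the STRING page shows no interactions between them.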
On the Structural Biology Knowledgebase (SBKB): http://www.sbkb.org/
o Select “by text” (options on left) and search “frataxin”.
o For our example, select the link next to “Structures and annotations…”. Here you can obtain information on all the different hits, such as structure, by looking under all the given tabs.
T4.4. Use the same method for Pancreatic Cancer and compare
o Go back to the STRING Database home page and, under “multiple names”, search for “frataxin” and “pancreatic cancer”. Select the first result.
o Select all three results given and click “Continue”. The page shows the 3 proteins we have selected; however, no interactions between them are shown in this database.
o You can widen the results by changing the search to cancer in general.
o (If the previous step was skipped, skip this step as well.) Again go to the query tab and search “FXN”. Search and select a few listings. Export them as done in T5.2. Go to the tools tab.
o Select similarity search tools, then select “blastp”. Select “use records on extract page formatted as Fasta”. Under “choose search set” select the database “swissprot”.
o When execution is complete, go to the extract page and select 10 different sequences belonging to 10 different species including human, then click “keep only selected records”. Again export the records.
o Go to the tools tab again, select “iPlant”, then “clustal w2”. Select “use records on extract page formatted as Fasta”. Your 10 protein sequences will be automatically incorporated as input to the clustalw2 tool. Execute the tool. Use the pull-down for “Search Results” and select “clustalw2.fa” before viewing the results.
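The selection step above (10 sequences from 10 different species, including human) is done by hand on the extract page. As an illustration only, the sketch below picks one record per species from a list of FASTA headers, assuming Swiss-Prot style headers that carry the organism in an `OS=` field; the function names are hypothetical and this is not a BioExtract feature.

```python
import re

# Assumption: headers look like "sp|ACC|NAME Description OS=Genus species OX=... GN=..."
_ORGANISM = re.compile(r"OS=(.+?)(?= [A-Z]{2}=|$)")


def species_of(header):
    """Extract the organism name from a Swiss-Prot style FASTA header, or None."""
    match = _ORGANISM.search(header)
    return match.group(1) if match else None


def one_per_species(headers, limit=10, required="Homo sapiens"):
    """Keep the first hit for each species, placing the required species first."""
    # sorted() is stable, so the original (best-hit-first) order is kept within each group
    ordered = sorted(headers, key=lambda h: species_of(h) != required)
    chosen, seen = [], set()
    for header in ordered:
        species = species_of(header)
        if species is None or species in seen:
            continue  # skip headers without an organism and repeated species
        seen.add(species)
        chosen.append(header)
        if len(chosen) == limit:
            break
    return chosen
```

Because BLAST hits are usually listed best match first, keeping the first hit per species tends to keep the closest homolog from each organism.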
T5. BioExtract Server
Objective: Use Workflow Management Systems (WMS) to optimize data manipulation processes (BioExtract server).
Theme: Frataxin (FXN) implication in the pancreatic cancer genesis
http://bioextract.org
T5.1. Server Initialization
T5.2. Pancreatic cancer & Frataxin (FXN) data
T5.3. Mapping, Alignment
T5.4. Workflow save & reuse
o Register on BioExtract Server to be able to create and save your own workflows.
o Click on the “workflows” tab, then click “create and import workflows”. Now click “record workflow”, then “close”.
o To obtain the workflow at the end of the lab: From the “workflows” tab click on “create and Import workflows” then click on “save records”.
o Select the query tab. Then select the protein sequences and check the box next to the NCBI protein database. Select “gene” as the search field and type “FXN”. Click on “Add Search Line”, select “Species” and type “Human”. Submit the query.
o Results will appear on the “extract page”. You can get the GenBank view of each sequence by clicking on “View record”. We will need only the Homo sapiens frataxin. For that, click “select records”, then check the corresponding box. Click on “keep only selected records”. The results can be saved or extracted in FASTA or txt format (export the records in FASTA format).
o Click on the “tools” tab, then click on “Alignment Tools” and “showalign”. Select “Use records on extract page formatted in Fasta”.
o Click on “execute” to run the tool. When execution is complete, results can be retrieved by selecting the desired format and clicking on “view results”.
o (Optional) Repeat the search process with “pancreatic cancer”. Make sure you change the first search field to “all text”.
o Go back to the “workflow” tab and click “create and import workflows”. Write a name and a description for your workflow then click on Save. All
the previous steps will be saved in this workflow.
o Once the workflow is saved, you will find it at the bottom of the workflow list. Click on the name of the workflow to get a schematic view of it. Run the workflow by clicking on “start”.
o Get and verify all the results by clicking on “provenance”. The general report can be saved for later analysis. Results of each tool can be viewed or
saved by clicking on “view file”.
o The same workflow can be executed for another query by simply modifying the accession number of the protein. (Click save in the “create and
import workflows” section to temporarily save the new query)
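The record, save, run and reuse cycle described above can be pictured with a small sketch. This is not BioExtract's implementation or API; the `Workflow` class and the three toy steps are hypothetical stand-ins that only illustrate how a recorded chain of steps can be re-run for a different query while keeping a provenance log.

```python
class Workflow:
    """A named list of recorded steps that can be replayed on any query."""

    def __init__(self, name, description=""):
        self.name = name
        self.description = description
        self.steps = []  # (label, function) pairs, in recording order

    def record(self, label, func):
        """Append one step; returns self so that calls can be chained."""
        self.steps.append((label, func))
        return self

    def run(self, query):
        """Apply every step in order; return the final result and a provenance log."""
        provenance, data = [], query
        for label, func in self.steps:
            data = func(data)
            provenance.append((label, data))  # keep each intermediate result
        return data, provenance


# Toy stand-ins for the query, selection and export steps (hypothetical data).
def toy_query(term):
    return [f"{term}_record_{i}" for i in range(3)]


def keep_first(records):
    return records[:1]


def to_fasta_headers(records):
    return "\n".join(">" + r for r in records)


workflow = (
    Workflow("fxn_lab", "query, select, export")
    .record("query", toy_query)
    .record("keep only selected records", keep_first)
    .record("export", to_fasta_headers)
)
```

Calling `workflow.run("FXN")` and then `workflow.run("CDKN2A")` replays the same steps with only the query changed, which is the idea behind re-executing a saved workflow with a different accession number.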
Editor's Notes
Welcome to this bioinformatics lab on data manipulation using online and server tools. As the theme, we have chosen to study the interaction between frataxin and pancreatic cancer.
This is the lab template: the context is a biological context based on a real biological problem and a given hypothesis. I don’t use computer science, strong word. When you read this template, you have a different view than an informatician. You want to understand the process to build the used tools: the architecture of the system, the algorithm implementation, the quality of the resulting data, and so on.