Bioinformatics is an interdisciplinary field mainly involving molecular biology and genetics, computer science, mathematics, and statistics. Data-intensive, large-scale biological problems are addressed from a computational point of view. The most common problems are modeling biological processes at the molecular level and making inferences from collected data. A bioinformatics solution usually involves the following steps: (1) collect statistics from biological data, (2) build a computational model, (3) solve a computational modeling problem, and (4) test and evaluate a computational algorithm. This chapter gives a brief introduction to bioinformatics by first introducing biological terminology and then discussing some classical bioinformatics problems, organized by the type of data source. Sequence analysis is the analysis of DNA and protein sequences for clues regarding function; it includes subproblems such as identification of homologs, multiple sequence alignment, searching for sequence patterns, and evolutionary analyses. Protein structures are three-dimensional data, and the associated problems are structure prediction (secondary and tertiary), analysis of protein structures for clues regarding function, and structural alignment. Gene expression data is usually represented as matrices, and analysis of microarray data mostly involves statistical analysis, classification, and clustering approaches. Biological networks such as gene regulatory networks, metabolic pathways, and protein-protein interaction networks are usually modeled as graphs, and graph-theoretic approaches are used to solve associated problems such as the construction and analysis of large-scale networks.
Over the past two decades, pattern mining techniques have become an integral part of many bioinformatics solutions. Frequent itemset mining is a popular group of pattern mining techniques designed to identify elements that frequently co-occur.
An archetypical example is the identification of products that often end up together in the same shopping basket in supermarket transactions. A number of algorithms have been developed to address variations of this computationally non-trivial problem. Frequent itemset mining techniques can efficiently capture the characteristics of (complex) data and summarize it succinctly. Owing to these and other interesting properties, these techniques have proven their value in biological data analysis. Nevertheless, information about their bioinformatics applications remains scattered. In this primer, we introduce frequent itemset mining and the association rules derived from it for life scientists. We give an overview of various algorithms and illustrate how they can be used in several real-life bioinformatics application domains. We end with a discussion of the future potential and open challenges for frequent itemset mining in the life sciences.
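To make the shopping-basket example concrete, here is a minimal Python sketch of the two standard measures behind frequent itemsets and association rules: support (the fraction of transactions that contain an itemset) and confidence (the support of antecedent and consequent together, divided by the support of the antecedent). The basket contents and the function names `support_count` and `rule_metrics` are hypothetical, chosen only for illustration.

```python
def support_count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t))


def rule_metrics(antecedent, consequent, transactions):
    """Support and confidence of the association rule antecedent -> consequent."""
    both = support_count(set(antecedent) | set(consequent), transactions)
    ante = support_count(antecedent, transactions)
    support = both / len(transactions)
    confidence = both / ante if ante else 0.0  # undefined rule -> report 0.0
    return support, confidence


# Hypothetical supermarket transactions (one set of products per basket).
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

sup, conf = rule_metrics({"diapers"}, {"beer"}, baskets)
print(f"support={sup:.2f} confidence={conf:.2f}")  # support=0.60 confidence=0.75
```

A frequent itemset miner searches for every itemset whose support clears a user-chosen threshold; rules are then generated from those itemsets and filtered by confidence.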
Bioinformatics Gaussian by ChARM’s
1. SUBMITTED TO:
Prof. Neeraj Bhargava
Professor and Head
Department of Computer Science
PRESENTED BY:
Arpit Kumar Sharma
M.Tech IV Sem
Department of Computer Science
2. CONTENT
• MATLAB SOFTWARE
• INTRODUCTION
• BIOINFORMATICS
• CHARM’S
• IMPLEMENTATION
• IMPLEMENTATION IMAGE
• METHODOLOGY
• RESULT & ANALYSIS
• RESULT OF DATASET
3. MATLAB SOFTWARE
• MATLAB is a high-performance language for
technical computing. It integrates computation,
visualization, and programming in an easy-to-use
environment where problems and solutions are
expressed in familiar mathematical notation.
FEATURES:
• Math and computation
• Algorithm development
4. Cont.
• Data acquisition
• Modeling, simulation, and prototyping
• Data analysis, exploration, and visualization
• Ability to scale
• Scientific and engineering graphics
• Application development, including
graphical user interface building.
5. INTRODUCTION
• Bioinformatics is an interdisciplinary field that
develops methods and software tools for
understanding biological data.
• As an interdisciplinary field of science,
bioinformatics combines biology, computer
science, mathematics and statistics to analyze and
interpret biological data.
6. Cont.
•Bioinformatics has been used for in
silico analyses of biological queries using
mathematical and statistical techniques.
7. Cont.
•Bioinformatics is both an umbrella term for
the body of biological studies that
use computer programming as part of their
methodology, as well as a reference to
specific analysis "pipelines" that are
repeatedly used, particularly in the field
of genomics.
8.
• ChARM, an unsupervised method for discovering
combinatorial chromatin modification patterns, can
identify histone modifications that occur globally.
• ChARM provides a scalable framework.
•CHARM: An Efficient Algorithm for Closed
Association Rule Mining
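CHARM (Zaki and Hsiao) mines closed itemsets: frequent itemsets that have no proper superset with the same support, which together summarize all frequent itemsets and their supports without loss. The Python sketch below is not CHARM itself. It uses a simpler Eclat-style depth-first search over vertical tidsets (the set of transaction ids containing each item) and then keeps only the itemsets that equal their closure, that is, the items shared by every transaction containing them. CHARM reaches the same output more efficiently by pruning the search with tidset comparisons. The toy database and all function names are hypothetical.

```python
def vertical_tidsets(transactions):
    """Map each item to the set of transaction ids (tids) that contain it."""
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    return tidsets


def frequent_itemsets(transactions, min_support):
    """Eclat-style depth-first enumeration; returns {frozenset itemset: tidset}."""
    singles = sorted(vertical_tidsets(transactions).items())
    result = {}

    def extend(prefix, prefix_tids, candidates):
        for i, (item, tids) in enumerate(candidates):
            new_tids = prefix_tids & tids  # tids containing prefix plus item
            if len(new_tids) >= min_support:
                itemset = prefix | {item}
                result[itemset] = new_tids
                extend(itemset, new_tids, candidates[i + 1:])

    extend(frozenset(), set(range(len(transactions))), singles)
    return result


def closed_itemsets(transactions, min_support):
    """Keep frequent itemsets equal to their closure; returns {itemset: support}."""
    if min_support < 1:
        raise ValueError("min_support must be at least 1")
    closed = {}
    for itemset, tids in frequent_itemsets(transactions, min_support).items():
        # Closure: items common to every transaction that contains the itemset.
        closure = frozenset.intersection(*(frozenset(transactions[t]) for t in tids))
        if closure == itemset:
            closed[itemset] = len(tids)
    return closed


# Hypothetical toy database with items A, C, D, T, W.
db = [set("ACTW"), set("CDW"), set("ACTW"), set("ACDW"), set("ACDTW"), set("CDT")]
for itemset, sup in sorted(closed_itemsets(db, 3).items(),
                           key=lambda kv: (-kv[1], sorted(kv[0]))):
    print("".join(sorted(itemset)), sup)
```

On this toy database with a minimum support of 3 transactions, the sketch reports seven closed itemsets: C (6), CW (5), ACW (4), CD (4), CT (4), ACTW (3), and CDW (3).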
9.
•Feature extraction: A total of 953 features are
extracted on a whole-image basis using CellProfiler.
•Dimension reduction: Features are projected into
principal-component space, and a subset of the
principal component analysis (PCA) vectors is
retained such that 98% of the variance present in the
original data distribution is conserved.
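The variance-retention rule above reduces to a short calculation: sort the eigenvalues of the feature covariance matrix (the variance along each principal component) in decreasing order and keep the smallest number k whose cumulative share of the total variance reaches 98%. This sketch assumes the eigenvalues were already computed elsewhere, for example with `numpy.linalg.eigh` on the covariance matrix; the function name and the example eigenvalues are hypothetical. scikit-learn's `PCA(n_components=0.98, svd_solver="full")` applies essentially the same rule directly.

```python
def components_for_variance(eigenvalues, threshold=0.98):
    """Smallest k such that the k largest eigenvalues explain >= threshold of the total."""
    vals = sorted(eigenvalues, reverse=True)
    total = sum(vals)
    if total <= 0:
        raise ValueError("total variance must be positive")
    cumulative = 0.0
    for k, v in enumerate(vals, start=1):
        cumulative += v
        if cumulative / total >= threshold:
            return k
    return len(vals)  # guard against floating-point shortfall at the end


# Hypothetical eigenvalues of a 5-feature covariance matrix.
eigs = [50.0, 30.0, 15.0, 4.0, 1.0]
k = components_for_variance(eigs)
print(k)  # 4: cumulative shares are 0.50, 0.80, 0.95, 0.99
```

The projected data are then the original (centered) feature vectors multiplied by the k eigenvectors that belong to the retained eigenvalues.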
10. Cont.
•Classification: Linear Discriminant Analysis
(LDA) is used to classify the selected PCA-
transformed feature vectors.
•Validation: The classifier’s performance is
assessed with 10-fold cross-validation.
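The classification and validation steps above can be illustrated with a dependency-free Python sketch: a linear discriminant analysis (LDA) classifier with one pooled within-class covariance matrix, scored by 10-fold cross-validation. The synthetic two-cluster data and all function names are hypothetical, and the presentation does not say which LDA implementation was used; in practice a library routine such as scikit-learn's `LinearDiscriminantAnalysis` combined with `cross_val_score(..., cv=10)` would normally replace this hand-written version.

```python
import math
import random


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [rv - f * cv for rv, cv in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]


def fit_lda(xs, ys):
    """Fit LDA: per-class means and priors plus one pooled within-class covariance."""
    classes = sorted(set(ys))
    d = len(xs[0])
    means, priors = {}, {}
    for c in classes:
        pts = [x for x, y in zip(xs, ys) if y == c]
        means[c] = [sum(p[j] for p in pts) / len(pts) for j in range(d)]
        priors[c] = len(pts) / len(xs)
    cov = [[0.0] * d for _ in range(d)]
    for x, y in zip(xs, ys):
        diff = [x[j] - means[y][j] for j in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j]
    dof = len(xs) - len(classes)
    cov = [[v / dof for v in row] for row in cov]
    # Linear discriminant for class c: x . w_c + b_c, with w_c = inv(cov) mu_c.
    model = {}
    for c in classes:
        w = solve(cov, means[c])
        b = -0.5 * sum(wi * mi for wi, mi in zip(w, means[c])) + math.log(priors[c])
        model[c] = (w, b)
    return model


def predict(model, x):
    """Return the class whose linear discriminant score is largest."""
    return max(model, key=lambda c: sum(wi * xi for wi, xi in zip(model[c][0], x)) + model[c][1])


def cross_validate(xs, ys, k=10, seed=0):
    """Accuracy of LDA under k-fold cross-validation with a seeded shuffle."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit_lda([xs[i] for i in train], [ys[i] for i in train])
        correct += sum(predict(model, xs[i]) == ys[i] for i in fold)
    return correct / len(xs)


# Hypothetical data: two well-separated Gaussian clusters in 2-D.
rng = random.Random(42)
xs = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
xs += [[rng.gauss(5, 1), rng.gauss(5, 1)] for _ in range(50)]
ys = [0] * 50 + [1] * 50
accuracy = cross_validate(xs, ys, k=10)
print(f"10-fold CV accuracy: {accuracy:.2f}")
```

Pooling one covariance matrix across classes is what makes the decision boundaries linear. That matrix can become singular when features outnumber training samples, which is one common reason to reduce dimensionality with PCA before applying LDA.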
12. METHODOLOGY
The National Center for Biotechnology
Information advances science and health by
providing access to biomedical and genomic
information.
13. Cont.
After logging in, users can access the following NCBI features:
•Submit
•Download
•Learn
•Develop
•Analyze
•Research
14. Cont.
•Submit
NCBI collects data submissions for the world's
largest public repository of biological and scientific
information. Users can submit their data and track the
status of each submission.
•Download
The majority of NCBI data are available for
download, either directly from the NCBI FTP site
or by using software tools to download custom
datasets. The download feature supports three basic
usage scenarios.
15. Cont.
•Learn
NCBI creates a variety of educational products,
including courses, workshops, webinars, training
materials, and documentation. NCBI educational events
are free and open to everyone. All NCBI educational
materials are available for anyone to reuse and
distribute.
•Develop
NCBI provides a variety of resources that allow
developers to access and manipulate NCBI data in
their applications.
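For the programmatic access mentioned under Download and Develop, NCBI's Entrez Programming Utilities (E-utilities) expose HTTP endpoints such as `esearch` and `efetch`. The sketch below only builds request URLs with the Python standard library; the helper names are hypothetical, and the example query (searching the GEO DataSets database, `db=gds`, for the accession used in this project) is an illustration rather than a step the presentation describes. NCBI's usage guidelines ask clients to identify themselves (for example with `tool` and `email` parameters) and to limit their request rate, so check the E-utilities documentation before running queries in bulk.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL of NCBI's Entrez Programming Utilities (E-utilities).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def eutils_url(utility, **params):
    """Build a request URL for an E-utility such as 'esearch' or 'efetch'."""
    return f"{EUTILS_BASE}{utility}.fcgi?{urlencode(params)}"


def fetch_text(url, timeout=30):
    """Issue the HTTP GET and return the response body as text (needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Search the GEO DataSets database for the accession used in this project.
search_url = eutils_url("esearch", db="gds", term="GSE72586", retmode="json")
print(search_url)
# Uncomment to send the query to NCBI:
# print(fetch_text(search_url))
```

Keeping URL construction separate from the network call makes the query easy to inspect and test offline.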
16. Cont.
•Research
Research in the NCBI Computational Biology Branch
(CBB) focuses on theoretical, analytical, and applied
computational approaches to a broad range of
fundamental problems in molecular biology and
medicine.
•Analyze
NCBI provides a wide variety of data analysis tools that
allow users to manipulate, align, visualize and evaluate
biological data.
17. ANALYSIS
•Use GEO2R (a web tool) to compare two or more
groups of samples in order to identify genes that are
differentially expressed across experimental
conditions. Results are presented as a table of genes
ordered by significance. (The GEO accession of my
dataset is GSE72586.)
•We also derived the Value Distribution, Options,
Profile Graph, and R-Script outputs.
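GEO2R performs its comparison in R using the limma package. As a rough, dependency-free illustration of what a table of genes "ordered by significance" means, the Python sketch below computes a log fold change and a Welch t statistic for each gene and sorts genes by the absolute t value. This is a simplified stand-in, not GEO2R's method: it uses no moderated variance, no p-values, and no multiple-testing correction. The gene names and values are hypothetical, and the data are assumed to be log2-transformed.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t statistic for two independent samples with unequal variances."""
    diff = statistics.mean(a) - statistics.mean(b)
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    if se == 0:
        # Degenerate case: no within-group spread at all.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def rank_genes(expr, group_a, group_b):
    """Return (gene, logFC, t) rows sorted by |t|, largest first.

    expr maps a gene identifier to its per-sample expression values
    (assumed log2-scale); group_a and group_b are lists of sample indices.
    """
    rows = []
    for gene, values in expr.items():
        a = [values[i] for i in group_a]
        b = [values[i] for i in group_b]
        log_fc = statistics.mean(a) - statistics.mean(b)
        rows.append((gene, log_fc, welch_t(a, b)))
    return sorted(rows, key=lambda row: abs(row[2]), reverse=True)


# Hypothetical toy matrix: two genes, samples 0-2 versus samples 3-5.
expr = {
    "GENE_A": [8.0, 8.2, 7.9, 5.0, 5.1, 4.9],
    "GENE_B": [6.0, 6.5, 5.5, 6.1, 5.9, 6.0],
}
for gene, log_fc, t in rank_genes(expr, [0, 1, 2], [3, 4, 5]):
    print(f"{gene}\tlogFC={log_fc:.2f}\tt={t:.2f}")
```

A production analysis would keep the R script that GEO2R generates, since limma's moderated statistics are more stable than a plain t statistic when each group has only a few samples.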