SlideShare a Scribd company logo
International Journal of Engineering Inventions
ISSN: 2278-7461, www.ijeijournal.com
Volume 1, Issue 2 (September 2012) PP: 15-16


            Feature Extraction from Gene Expression Data Files
          Sayak Ganguli, Manoj Kumar Gupta, Sasti Gopal Das and Abhijit Datta
                              DBT Centre for Bioinformatics, Presidency University, Kolkata




Abstract––The Gene Expression Omnibus (GEO) maintained at the National Center for Biotechnology Information
(NCBI) is an archive where microarray data is available freely to researchers. The database provides us with numerous
data deposit options including web forms, spreadsheets, XML and Simple Omnibus Format in Text (SOFT). Along with
the purpose of a repository a large collection of tools are available to help users effectively explore, and analyze gene
expression patterns stored in GEO. The need for better analyses of gene expression data was explored using object
oriented programming languages such as C and Practical extraction and reporting language (PERL). The test set that
was used for this study were human transcription factors. The results obtained indicate that feature extraction was
successful.

Keywords––Bioinformatics, Gene Expression Omnibus, NCBI, Object oriented programming.

                                           I.         INTRODUCTION
           The GEO database (3) is one of the largest repositories of gene expression data on the web (1, 2) which has been
generated using a large ensemble of high throughput measuring techniques. Such techniques include comparative genomic
hybridization for analyzing gain or loss of genes and tiling arrays for the analyses and detection of transcribed regions or
single-nucleotide polymorphism. Apart from these approaches data from ChIP-chip technology is also stored which provides
information on the protein binding regions on nucleotide molecules. Data from serial analysis of gene expression (SAGE),
massively parallel signature sequencing (MPSS), serial analysis of ribosomal sequence tags (SARST), and some peptide
profiling techniques such as tandem mass spectrometry (MS/MS) are also accepted. Several tools such as GENE CHASER
(5) exists which focus on the biological condition of expression of a particular data set, however, as microarray expression
data comprise around 95% of the available data types in GEO it becomes necessary for providing a tool which shall enable
the user to specifically analyze a particular data type from the expression data set (6).
The following are some existing tools which are used by the researchers for data analyses at GEO.
   1.      Profile neighbors retrieves gene connections that show a similar or reversed profile shape within a DataSet,
   2.      Sequence neighbors retrieves profiles which are related by nucleotide sequence similarity across all DataSets.
   3.      Homolog neighbors retrieve profiles of genes belonging to the same Homolo-Gene group.
   4.      Links menu enables the users to easily navigate from the GEO databases to associated records in other Entrez data
           domains including GenBank, PubMed, Gene, UniGene, OMIM, HomoloGene, Taxonomy, SAGEMap, and
           MapViewer.

           However, analyses of specific gene sets from the files that are present in the GEO present a significantly labour
intensive task. Here we present a feature extraction program written in C and PERL both belonging to the object oriented
programming domain, a necessity when dealing with bioinformatics datasets as individual values are to be treated as objects
and both these fundamental languages are easy to implement due to their platform independent property(4). The program
allows the user to extract information regarding specific genes in a set of files. The program is currently tested on
transcription factor expression data sets of human origin but would be extended to other datasets as well.

                                  II.           MATERIAL AND METHOD
          This program is written in C and integrates PERL for feature extraction from the GEO data files.
Steps for Sorting Human Transcription Factors:
      1. Generating raw data
      2. Generating transcription factors
      3. Generating Gene Data File
      4. Generating final transcription factors list




                                                                                                                        15
Feature Extraction from Gene Expression Data Files




                                          III.           CONCLUSIONS
           With the increase in experimental techniques utilizing high throughput technologies, the need for automated data
curation is paramount for successful time scale studies and for future implementation of robust analyses platforms. The
feature extraction algorithm developed was tested on the most abundant class of genes that are present in general expression
analyses – transcription factors. The results indicate that the algorithm can be extended in the development of an integrated
webserver for the analyses and characterization of more such gene specific queries. Having sequence information together
with expression information can help in the functional annotation and characterization of unknown genes or in finding novel
roles for characterized genes. These data are also valuable to genome-wide studies, allowing biologists to review global gene
expression in various cell types and states, to compare with orthologs in other species, and to search for repeated patterns of
coregulated groups of transcripts that assist formulation of hypotheses on functional networks and pathways.

                                   IV.            ACKNOWLEDGEMENTS
The authors acknowledge the support of the Department of Biotechnology, Government of India BTBI scheme.

                                                     REFERENCES
 1.       Akerkar, R. A.; Lingras, P. An Intelligent Web: Theory and Practice, 1st edn. 2008, Johns and Bartlett, Boston.
 2.       Albert, R.; Jeong, H.; Barab´asi, A.-L. Diameter of the world-wide Web. Nature, 1999, 401, pp. 130–131.
 3.       Barrett T, Edgar Ron. Mining Microarray Data at NCBI’s Gene Expression Omnibus (GEO) Methods Mol Biol.
          2006; 338: 175–190.
 4.       Chakrabarti, S. Data mining for hypertext: A tutorial survey. SIGKDD explorations, 2000, 1(2), pp. 1–11.
 5.       Chen R., Mallelwar R., Thosar A., Venkatasubrahmanyam S., Butte A.J. (2008) GeneChaser: identifying all
          biological and clinical conditions in which genes of interest are differentially expressed, BMC Bioinformatics
          9:548
 6.       Tarca AL, Romero R, Draghici S: Analysis of microarray experiments of gene expression profiling. Am J Obstet
          Gynecol, 2006 Aug;195(2):373-88




                                                                                                                           16

More Related Content

What's hot

Bioinformatics n bio-bio-1_uoda_workshop_4_july_2013_v1.0
Bioinformatics n bio-bio-1_uoda_workshop_4_july_2013_v1.0Bioinformatics n bio-bio-1_uoda_workshop_4_july_2013_v1.0
Bioinformatics n bio-bio-1_uoda_workshop_4_july_2013_v1.0
Fokhruz Zaman
 
Data mining ppt
Data mining pptData mining ppt
Data mining ppt
sai krishna
 
Bioinformatic tools in Pheromone technology
Bioinformatic tools in Pheromone technologyBioinformatic tools in Pheromone technology
Bioinformatic tools in Pheromone technology
THILAKAR MANI
 
Bioinformatics principles and applications
Bioinformatics principles and applicationsBioinformatics principles and applications
introduction of Bioinformatics
introduction of Bioinformaticsintroduction of Bioinformatics
introduction of Bioinformatics
VinaKhan1
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
Somdutt Sharma
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
chirag thakkar
 
Bioinformatics resources and search tools - report on summer training proj...
Bioinformatics   resources and search tools -  report on summer training proj...Bioinformatics   resources and search tools -  report on summer training proj...
Bioinformatics resources and search tools - report on summer training proj...
Sapan Anand
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
AkanshaChauhan15
 
Bioinformatics for beginners (exam point of view)
Bioinformatics for beginners (exam point of view)Bioinformatics for beginners (exam point of view)
Bioinformatics for beginners (exam point of view)
Sijo A
 
B.sc biochem i bobi u-1 introduction to bioinformatics
B.sc biochem i bobi u-1 introduction to bioinformaticsB.sc biochem i bobi u-1 introduction to bioinformatics
B.sc biochem i bobi u-1 introduction to bioinformatics
Rai University
 
Tools of bioinforformatics by kk
Tools of bioinforformatics by kkTools of bioinforformatics by kk
Tools of bioinforformatics by kk
KAUSHAL SAHU
 
Uses of Artificial Intelligence in Bioinformatics
Uses of Artificial Intelligence in BioinformaticsUses of Artificial Intelligence in Bioinformatics
Uses of Artificial Intelligence in Bioinformatics
Pragya Pai
 
Bioinformatics Software
Bioinformatics SoftwareBioinformatics Software
Bioinformatics Software
university of education,Lahore
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
JTADrexel
 
Bioinformatics - Discovering the Bio Logic Of Nature
Bioinformatics - Discovering the Bio Logic Of NatureBioinformatics - Discovering the Bio Logic Of Nature
Bioinformatics - Discovering the Bio Logic Of Nature
Robert Cormia
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
biinoida
 
I NTRODUCTION.doc
I NTRODUCTION.docI NTRODUCTION.doc
I NTRODUCTION.doc
butest
 
Bioinformatics ppt
Bioinformatics pptBioinformatics ppt
Bioinformatics ppt
Sai Tharun Kumar Guttikonda
 
Intro bioinformatics
Intro bioinformaticsIntro bioinformatics
Intro bioinformatics
Chris Dwan
 

What's hot (20)

Bioinformatics n bio-bio-1_uoda_workshop_4_july_2013_v1.0
Bioinformatics n bio-bio-1_uoda_workshop_4_july_2013_v1.0Bioinformatics n bio-bio-1_uoda_workshop_4_july_2013_v1.0
Bioinformatics n bio-bio-1_uoda_workshop_4_july_2013_v1.0
 
Data mining ppt
Data mining pptData mining ppt
Data mining ppt
 
Bioinformatic tools in Pheromone technology
Bioinformatic tools in Pheromone technologyBioinformatic tools in Pheromone technology
Bioinformatic tools in Pheromone technology
 
Bioinformatics principles and applications
Bioinformatics principles and applicationsBioinformatics principles and applications
Bioinformatics principles and applications
 
introduction of Bioinformatics
introduction of Bioinformaticsintroduction of Bioinformatics
introduction of Bioinformatics
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
 
Bioinformatics resources and search tools - report on summer training proj...
Bioinformatics   resources and search tools -  report on summer training proj...Bioinformatics   resources and search tools -  report on summer training proj...
Bioinformatics resources and search tools - report on summer training proj...
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
 
Bioinformatics for beginners (exam point of view)
Bioinformatics for beginners (exam point of view)Bioinformatics for beginners (exam point of view)
Bioinformatics for beginners (exam point of view)
 
B.sc biochem i bobi u-1 introduction to bioinformatics
B.sc biochem i bobi u-1 introduction to bioinformaticsB.sc biochem i bobi u-1 introduction to bioinformatics
B.sc biochem i bobi u-1 introduction to bioinformatics
 
Tools of bioinforformatics by kk
Tools of bioinforformatics by kkTools of bioinforformatics by kk
Tools of bioinforformatics by kk
 
Uses of Artificial Intelligence in Bioinformatics
Uses of Artificial Intelligence in BioinformaticsUses of Artificial Intelligence in Bioinformatics
Uses of Artificial Intelligence in Bioinformatics
 
Bioinformatics Software
Bioinformatics SoftwareBioinformatics Software
Bioinformatics Software
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
 
Bioinformatics - Discovering the Bio Logic Of Nature
Bioinformatics - Discovering the Bio Logic Of NatureBioinformatics - Discovering the Bio Logic Of Nature
Bioinformatics - Discovering the Bio Logic Of Nature
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
 
I NTRODUCTION.doc
I NTRODUCTION.docI NTRODUCTION.doc
I NTRODUCTION.doc
 
Bioinformatics ppt
Bioinformatics pptBioinformatics ppt
Bioinformatics ppt
 
Intro bioinformatics
Intro bioinformaticsIntro bioinformatics
Intro bioinformatics
 

Viewers also liked

Cuales son las biomoleculas
Cuales son las biomoleculasCuales son las biomoleculas
Cuales son las biomoleculas
Carlos Taps
 
Staying ahead of seo trends in 2016
Staying ahead of seo trends in 2016Staying ahead of seo trends in 2016
Staying ahead of seo trends in 2016
James W
 
JF Smit Advanced Marine Fire Fighting certificate
JF Smit Advanced Marine Fire Fighting certificateJF Smit Advanced Marine Fire Fighting certificate
JF Smit Advanced Marine Fire Fighting certificateJakobus Smit
 
Tcn 2014 03_31_final
Tcn 2014 03_31_finalTcn 2014 03_31_final
Tcn 2014 03_31_final
COCommunityCollegeSystem
 
c4c1-RecordOfAchievement
c4c1-RecordOfAchievementc4c1-RecordOfAchievement
c4c1-RecordOfAchievement
Anant Srivastav
 
ArisViceCV
ArisViceCVArisViceCV
ArisViceCV
Aris Roussinos
 
EcoDaisy Ad
EcoDaisy AdEcoDaisy Ad
EcoDaisy Ad
Teresa Waddell
 
와이파이를 감염시키는 바이러스, 카멜레온(Chameleon)
와이파이를 감염시키는 바이러스, 카멜레온(Chameleon)와이파이를 감염시키는 바이러스, 카멜레온(Chameleon)
와이파이를 감염시키는 바이러스, 카멜레온(Chameleon)atelier t*h
 
ディズニークチュールジュエリーの歴史
ディズニークチュールジュエリーの歴史ディズニークチュールジュエリーの歴史
ディズニークチュールジュエリーの歴史pxpx6630
 
1 กำหนดการแนะนำโปรแกรม
1 กำหนดการแนะนำโปรแกรม1 กำหนดการแนะนำโปรแกรม
1 กำหนดการแนะนำโปรแกรมPG PukGuard'z
 
Onde tem água, tem tudo!
Onde tem água, tem tudo!Onde tem água, tem tudo!
Onde tem água, tem tudo!
FCVSA
 
trigomometria
trigomometriatrigomometria
trigomometria
Juan Timoteo Cori
 
Stereotypes
StereotypesStereotypes
Stereotypes
emmaclare88
 
Race questions
Race questionsRace questions
Race questions
mcghee27
 
Doc1
Doc1Doc1
Celine reference letter 2013-2014
Celine reference letter 2013-2014Celine reference letter 2013-2014
Celine reference letter 2013-2014
Celine Suhr Nielsen
 
Métodos de estudos da bíblia
Métodos de estudos da bíbliaMétodos de estudos da bíblia
Métodos de estudos da bíblia
Antonio Filho
 

Viewers also liked (17)

Cuales son las biomoleculas
Cuales son las biomoleculasCuales son las biomoleculas
Cuales son las biomoleculas
 
Staying ahead of seo trends in 2016
Staying ahead of seo trends in 2016Staying ahead of seo trends in 2016
Staying ahead of seo trends in 2016
 
JF Smit Advanced Marine Fire Fighting certificate
JF Smit Advanced Marine Fire Fighting certificateJF Smit Advanced Marine Fire Fighting certificate
JF Smit Advanced Marine Fire Fighting certificate
 
Tcn 2014 03_31_final
Tcn 2014 03_31_finalTcn 2014 03_31_final
Tcn 2014 03_31_final
 
c4c1-RecordOfAchievement
c4c1-RecordOfAchievementc4c1-RecordOfAchievement
c4c1-RecordOfAchievement
 
ArisViceCV
ArisViceCVArisViceCV
ArisViceCV
 
EcoDaisy Ad
EcoDaisy AdEcoDaisy Ad
EcoDaisy Ad
 
와이파이를 감염시키는 바이러스, 카멜레온(Chameleon)
와이파이를 감염시키는 바이러스, 카멜레온(Chameleon)와이파이를 감염시키는 바이러스, 카멜레온(Chameleon)
와이파이를 감염시키는 바이러스, 카멜레온(Chameleon)
 
ディズニークチュールジュエリーの歴史
ディズニークチュールジュエリーの歴史ディズニークチュールジュエリーの歴史
ディズニークチュールジュエリーの歴史
 
1 กำหนดการแนะนำโปรแกรม
1 กำหนดการแนะนำโปรแกรม1 กำหนดการแนะนำโปรแกรม
1 กำหนดการแนะนำโปรแกรม
 
Onde tem água, tem tudo!
Onde tem água, tem tudo!Onde tem água, tem tudo!
Onde tem água, tem tudo!
 
trigomometria
trigomometriatrigomometria
trigomometria
 
Stereotypes
StereotypesStereotypes
Stereotypes
 
Race questions
Race questionsRace questions
Race questions
 
Doc1
Doc1Doc1
Doc1
 
Celine reference letter 2013-2014
Celine reference letter 2013-2014Celine reference letter 2013-2014
Celine reference letter 2013-2014
 
Métodos de estudos da bíblia
Métodos de estudos da bíbliaMétodos de estudos da bíblia
Métodos de estudos da bíblia
 

Similar to call for papers, research paper publishing, where to publish research paper, journal publishing, how to publish research paper, Call For research paper, international journal, publishing a paper, IJEI, call for papers 2012,journal of science and technolog

Bioinformatics data mining
Bioinformatics data miningBioinformatics data mining
Bioinformatics data mining
Sangeeta Das
 
LIMS for maize mapping project
LIMS for maize mapping projectLIMS for maize mapping project
LIMS for maize mapping project
Carlos Vela Shimano
 
LIMS FOR MAIZE MAPPING PROJECT
LIMS FOR MAIZE MAPPING PROJECTLIMS FOR MAIZE MAPPING PROJECT
LIMS FOR MAIZE MAPPING PROJECT
G2 APPS SA DE CV
 
A Review of Various Methods Used in the Analysis of Functional Gene Expressio...
A Review of Various Methods Used in the Analysis of Functional Gene Expressio...A Review of Various Methods Used in the Analysis of Functional Gene Expressio...
A Review of Various Methods Used in the Analysis of Functional Gene Expressio...
ijitcs
 
An Agent-Based System For Re-Annotation Of Genomes
An Agent-Based System For Re-Annotation Of GenomesAn Agent-Based System For Re-Annotation Of Genomes
An Agent-Based System For Re-Annotation Of Genomes
Kate Campbell
 
COMPUTATIONAL METHODS FOR FUNCTIONAL ANALYSIS OF GENE EXPRESSION
COMPUTATIONAL METHODS FOR FUNCTIONAL ANALYSIS OF GENE EXPRESSIONCOMPUTATIONAL METHODS FOR FUNCTIONAL ANALYSIS OF GENE EXPRESSION
COMPUTATIONAL METHODS FOR FUNCTIONAL ANALYSIS OF GENE EXPRESSION
csandit
 
D1803012022
D1803012022D1803012022
D1803012022
IOSR Journals
 
Sample Work For Engineering Literature Review and Gap Identification
Sample Work For Engineering Literature Review and Gap IdentificationSample Work For Engineering Literature Review and Gap Identification
Sample Work For Engineering Literature Review and Gap Identification
PhD Assistance
 
Nucl. Acids Res.-2014-Howe-nar-gku1244
Nucl. Acids Res.-2014-Howe-nar-gku1244Nucl. Acids Res.-2014-Howe-nar-gku1244
Nucl. Acids Res.-2014-Howe-nar-gku1244
Yasel Cruz
 
SooryaKiran Bioinformatics
SooryaKiran BioinformaticsSooryaKiran Bioinformatics
SooryaKiran Bioinformatics
contactsoorya
 
T-bioinfo overview
T-bioinfo overviewT-bioinfo overview
T-bioinfo overview
Jaclyn Williams
 
T-BioInfo Methods and Approaches
T-BioInfo Methods and ApproachesT-BioInfo Methods and Approaches
T-BioInfo Methods and Approaches
Elia Brodsky
 
Encode Project
Encode ProjectEncode Project
Encode Project
Anarghya Hegde
 
Bioinformatics and functional genomics
Bioinformatics and functional genomicsBioinformatics and functional genomics
Bioinformatics and functional genomics
Aisha Kalsoom
 
Bioinformatics seminar
Bioinformatics seminarBioinformatics seminar
Bioinformatics seminar
shashi bijapure
 
Greene Bosc2008
Greene Bosc2008Greene Bosc2008
Greene Bosc2008
bosc_2008
 
provenance of microarray experiments
provenance of microarray experimentsprovenance of microarray experiments
provenance of microarray experiments
Helena Deus
 
MICROARRAY GENE EXPRESSION ANALYSIS USING TYPE 2 FUZZY LOGIC(MGA-FL)
MICROARRAY GENE EXPRESSION ANALYSIS USING TYPE 2 FUZZY LOGIC(MGA-FL)MICROARRAY GENE EXPRESSION ANALYSIS USING TYPE 2 FUZZY LOGIC(MGA-FL)
MICROARRAY GENE EXPRESSION ANALYSIS USING TYPE 2 FUZZY LOGIC(MGA-FL)
IJCSEA Journal
 
Particle Swarm Optimization for Gene cluster Identification
Particle Swarm Optimization for Gene cluster IdentificationParticle Swarm Optimization for Gene cluster Identification
Particle Swarm Optimization for Gene cluster Identification
Editor IJCATR
 
Next-Generation Search Engines for Information Retrieval
Next-Generation Search Engines for Information RetrievalNext-Generation Search Engines for Information Retrieval
Next-Generation Search Engines for Information Retrieval
Waqas Tariq
 

Similar to call for papers, research paper publishing, where to publish research paper, journal publishing, how to publish research paper, Call For research paper, international journal, publishing a paper, IJEI, call for papers 2012,journal of science and technolog (20)

Bioinformatics data mining
Bioinformatics data miningBioinformatics data mining
Bioinformatics data mining
 
LIMS for maize mapping project
LIMS for maize mapping projectLIMS for maize mapping project
LIMS for maize mapping project
 
LIMS FOR MAIZE MAPPING PROJECT
LIMS FOR MAIZE MAPPING PROJECTLIMS FOR MAIZE MAPPING PROJECT
LIMS FOR MAIZE MAPPING PROJECT
 
A Review of Various Methods Used in the Analysis of Functional Gene Expressio...
A Review of Various Methods Used in the Analysis of Functional Gene Expressio...A Review of Various Methods Used in the Analysis of Functional Gene Expressio...
A Review of Various Methods Used in the Analysis of Functional Gene Expressio...
 
An Agent-Based System For Re-Annotation Of Genomes
An Agent-Based System For Re-Annotation Of GenomesAn Agent-Based System For Re-Annotation Of Genomes
An Agent-Based System For Re-Annotation Of Genomes
 
COMPUTATIONAL METHODS FOR FUNCTIONAL ANALYSIS OF GENE EXPRESSION
COMPUTATIONAL METHODS FOR FUNCTIONAL ANALYSIS OF GENE EXPRESSIONCOMPUTATIONAL METHODS FOR FUNCTIONAL ANALYSIS OF GENE EXPRESSION
COMPUTATIONAL METHODS FOR FUNCTIONAL ANALYSIS OF GENE EXPRESSION
 
D1803012022
D1803012022D1803012022
D1803012022
 
Sample Work For Engineering Literature Review and Gap Identification
Sample Work For Engineering Literature Review and Gap IdentificationSample Work For Engineering Literature Review and Gap Identification
Sample Work For Engineering Literature Review and Gap Identification
 
Nucl. Acids Res.-2014-Howe-nar-gku1244
Nucl. Acids Res.-2014-Howe-nar-gku1244Nucl. Acids Res.-2014-Howe-nar-gku1244
Nucl. Acids Res.-2014-Howe-nar-gku1244
 
SooryaKiran Bioinformatics
SooryaKiran BioinformaticsSooryaKiran Bioinformatics
SooryaKiran Bioinformatics
 
T-bioinfo overview
T-bioinfo overviewT-bioinfo overview
T-bioinfo overview
 
T-BioInfo Methods and Approaches
T-BioInfo Methods and ApproachesT-BioInfo Methods and Approaches
T-BioInfo Methods and Approaches
 
Encode Project
Encode ProjectEncode Project
Encode Project
 
Bioinformatics and functional genomics
Bioinformatics and functional genomicsBioinformatics and functional genomics
Bioinformatics and functional genomics
 
Bioinformatics seminar
Bioinformatics seminarBioinformatics seminar
Bioinformatics seminar
 
Greene Bosc2008
Greene Bosc2008Greene Bosc2008
Greene Bosc2008
 
provenance of microarray experiments
provenance of microarray experimentsprovenance of microarray experiments
provenance of microarray experiments
 
MICROARRAY GENE EXPRESSION ANALYSIS USING TYPE 2 FUZZY LOGIC(MGA-FL)
MICROARRAY GENE EXPRESSION ANALYSIS USING TYPE 2 FUZZY LOGIC(MGA-FL)MICROARRAY GENE EXPRESSION ANALYSIS USING TYPE 2 FUZZY LOGIC(MGA-FL)
MICROARRAY GENE EXPRESSION ANALYSIS USING TYPE 2 FUZZY LOGIC(MGA-FL)
 
Particle Swarm Optimization for Gene cluster Identification
Particle Swarm Optimization for Gene cluster IdentificationParticle Swarm Optimization for Gene cluster Identification
Particle Swarm Optimization for Gene cluster Identification
 
Next-Generation Search Engines for Information Retrieval
Next-Generation Search Engines for Information RetrievalNext-Generation Search Engines for Information Retrieval
Next-Generation Search Engines for Information Retrieval
 

More from International Journal of Engineering Inventions www.ijeijournal.com

H04124548
H04124548H04124548
G04123844
G04123844G04123844
F04123137
F04123137F04123137
E04122330
E04122330E04122330
C04121115
C04121115C04121115
B04120610
B04120610B04120610
A04120105
A04120105A04120105
F04113640
F04113640F04113640
E04112135
E04112135E04112135
D04111520
D04111520D04111520
C04111114
C04111114C04111114
B04110710
B04110710B04110710
A04110106
A04110106A04110106
I04105358
I04105358I04105358
H04104952
H04104952H04104952
G04103948
G04103948G04103948
F04103138
F04103138F04103138
E04102330
E04102330E04102330
D04101822
D04101822D04101822
C04101217
C04101217C04101217

More from International Journal of Engineering Inventions www.ijeijournal.com (20)

H04124548
H04124548H04124548
H04124548
 
G04123844
G04123844G04123844
G04123844
 
F04123137
F04123137F04123137
F04123137
 
E04122330
E04122330E04122330
E04122330
 
C04121115
C04121115C04121115
C04121115
 
B04120610
B04120610B04120610
B04120610
 
A04120105
A04120105A04120105
A04120105
 
F04113640
F04113640F04113640
F04113640
 
E04112135
E04112135E04112135
E04112135
 
D04111520
D04111520D04111520
D04111520
 
C04111114
C04111114C04111114
C04111114
 
B04110710
B04110710B04110710
B04110710
 
A04110106
A04110106A04110106
A04110106
 
I04105358
I04105358I04105358
I04105358
 
H04104952
H04104952H04104952
H04104952
 
G04103948
G04103948G04103948
G04103948
 
F04103138
F04103138F04103138
F04103138
 
E04102330
E04102330E04102330
E04102330
 
D04101822
D04101822D04101822
D04101822
 
C04101217
C04101217C04101217
C04101217
 

call for papers, research paper publishing, where to publish research paper, journal publishing, how to publish research paper, Call For research paper, international journal, publishing a paper, IJEI, call for papers 2012,journal of science and technolog

  • 1. International Journal of Engineering Inventions ISSN: 2278-7461, www.ijeijournal.com Volume 1, Issue 2 (September 2012) PP: 15-16 Feature Extraction from Gene Expression Data Files Sayak Ganguli, Manoj Kumar Gupta, Sasti Gopal Das and Abhijit Datta DBT Centre for Bioinformatics, Presidency University, Kolkata Abstract––The Gene Expression Omnibus (GEO) maintained at the National Center for Biotechnology Information (NCBI) is an archive where microarray data is available freely to researchers. The database provides us with numerous data deposit options including web forms, spreadsheets, XML and Simple Omnibus Format in Text (SOFT). Along with the purpose of a repository a large collection of tools are available to help users effectively explore, and analyze gene expression patterns stored in GEO. The need for better analyses of gene expression data was explored using object oriented programming languages such as C and Practical extraction and reporting language (PERL). The test set that was used for this study were human transcription factors. The results obtained indicate that feature extraction was successful. Keywords––Bioinformatics, Gene Expression Omnibus, NCBI, Object oriented programming. I. INTRODUCTION The GEO database (3) is one of the largest repositories of gene expression data on the web (1, 2) which has been generated using a large ensemble of high throughput measuring techniques. Such techniques include comparative genomic hybridization for analyzing gain or loss of genes and tiling arrays for the analyses and detection of transcribed regions or single-nucleotide polymorphism. Apart from these approaches data from ChIP-chip technology is also stored which provides information on the protein binding regions on nucleotide molecules. Data from serial analysis of gene expression (SAGE), massively parallel signature sequencing (MPSS), serial analysis of ribosomal sequence tags (SARST), and some peptide profiling techniques such as tandem mass spectrometry (MS/MS) are also accepted. Several tools such as GENE CHASER (5) exists which focus on the biological condition of expression of a particular data set, however, as microarray expression data comprise around 95% of the available data types in GEO it becomes necessary for providing a tool which shall enable the user to specifically analyze a particular data type from the expression data set (6). The following are some existing tools which are used by the researchers for data analyses at GEO. 1. Profile neighbors retrieves gene connections that show a similar or reversed profile shape within a DataSet, 2. Sequence neighbors retrieves profiles which are related by nucleotide sequence similarity across all DataSets. 3. Homolog neighbors retrieve profiles of genes belonging to the same Homolo-Gene group. 4. Links menu enables the users to easily navigate from the GEO databases to associated records in other Entrez data domains including GenBank, PubMed, Gene, UniGene, OMIM, HomoloGene, Taxonomy, SAGEMap, and MapViewer. However, analyses of specific gene sets from the files that are present in the GEO present a significantly labour intensive task. Here we present a feature extraction program written in C and PERL both belonging to the object oriented programming domain, a necessity when dealing with bioinformatics datasets as individual values are to be treated as objects and both these fundamental languages are easy to implement due to their platform independent property(4). The program allows the user to extract information regarding specific genes in a set of files. The program is currently tested on transcription factor expression data sets of human origin but would be extended to other datasets as well. II. MATERIAL AND METHOD This program is written in C and integrates PERL for feature extraction from the GEO data files. Steps for Sorting Human Transcription Factors: 1. Generating raw data 2. Generating transcription factors 3. Generating Gene Data File 4. Generating final transcription factors list 15
  • 2. Feature Extraction from Gene Expression Data Files III. CONCLUSIONS With the increase in experimental techniques utilizing high throughput technologies, the need for automated data curation is paramount for successful time scale studies and for future implementation of robust analyses platforms. The feature extraction algorithm developed was tested on the most abundant class of genes that are present in general expression analyses – transcription factors. The results indicate that the algorithm can be extended in the development of an integrated webserver for the analyses and characterization of more such gene specific queries. Having sequence information together with expression information can help in the functional annotation and characterization of unknown genes or in finding novel roles for characterized genes. These data are also valuable to genome-wide studies, allowing biologists to review global gene expression in various cell types and states, to compare with orthologs in other species, and to search for repeated patterns of coregulated groups of transcripts that assist formulation of hypotheses on functional networks and pathways. IV. ACKNOWLEDGEMENTS The authors acknowledge the support of the Department of Biotechnology, Government of India BTBI scheme. REFERENCES 1. Akerkar, R. A.; Lingras, P. An Intelligent Web: Theory and Practice, 1st edn. 2008, Johns and Bartlett, Boston. 2. Albert, R.; Jeong, H.; Barab´asi, A.-L. Diameter of the world-wide Web. Nature, 1999, 401, pp. 130–131. 3. Barrett T, Edgar Ron. Mining Microarray Data at NCBI’s Gene Expression Omnibus (GEO) Methods Mol Biol. 2006; 338: 175–190. 4. Chakrabarti, S. Data mining for hypertext: A tutorial survey. SIGKDD explorations, 2000, 1(2), pp. 1–11. 5. Chen R., Mallelwar R., Thosar A., Venkatasubrahmanyam S., Butte A.J. (2008) GeneChaser: identifying all biological and clinical conditions in which genes of interest are differentially expressed, BMC Bioinformatics 9:548 6. Tarca AL, Romero R, Draghici S: Analysis of microarray experiments of gene expression profiling. Am J Obstet Gynecol, 2006 Aug;195(2):373-88 16