Assessing Differentially Expressed Genes in Storage Root of Cassava Landraces from Brazil and Colombia Using Microarray Data
Luiz Joaquim Castelo Branco Carvalho1, James V Anderson2, Diana Bernal3, Joe Tohme3, Chikelu
Mba4, Eduardo Alano Vieira5 and Elaine Cunha Moreno1
Laboratory of Biophysics and Biochemistry - LBB, Embrapa Cenargen - DF1; USDA/ARS, Plant Science Research
Unit, Fargo, ND2; CIAT, Cali-Colombia3; IAEA - Vienna, Austria4; EMBRAPA Cerrados5
Introduction
With a functional genomics-based approach, high-throughput microarray technology appears appropriate for gaining information at a global level and identifying differentially expressed genes among landraces of divergent genetic background. To date, most efforts using microarrays to study cassava have focused on expression profiles in a restricted genetic background, oriented to diseases, abiotic stress and post-harvest physiological deterioration. It has also been claimed that a cDNA chip developed for the Euphorbiaceae family, based on two species (leafy spurge and cassava), may be a useful tool to study gene expression and diversity in cassava on a global basis. Our own work, using HPLC carotenoid profiles as a marker of diversity, has identified color-mutated phenotypes in the center of origin and domestication of cassava in Brazil. The present work combines the diversity of pigmented cassava landraces of Brazilian genetic background with Colombian varieties from CIAT to evaluate global gene expression, help explain genetic diversity in pigmented cassava, and discover new regulatory genes and biological pathways in the cassava storage root.
Materials and Methods
Domestication hypothesis: Our morphological model for cassava domestication considers the changes in growth habit, storage root formation and flowering through which the cassava ancestor (M. esculenta ssp. flabellifolia) became the cultivated species (M. esculenta ssp. esculenta), as in Figure 6.
Data analysis: No single software package contains all the existing analytical algorithms for gene expression analysis with sets of microarrays. Our analytical procedure considered four levels of data analysis: image and data quality evaluation (first level), identification of the set of statistically differentially expressed genes (second level), functional and ontological classification of gene sub-sets (third level), and regulatory networks of gene sub-sets and biological pathways (fourth level).
[Figure: analysis workflow. Labels: 25392 cDNA Array; Loop Dye Swap; GenePix Analysis; 23752 QC_Tr; GeneMath Analysis (Cassava Data Base, Leafy spurge Data Base, Arabidopsis Data Base); 161 DEG; MIPS Analysis (Arabidopsis Annotation Data Base); 75 ARG; Pathway Studio Analysis (ResNet Plant Data Base); GSEA; SNEA; 34 RG; 10 BRG; Pathways Retrieval; Regulatory Network.]
Plant material: Storage roots from the cassava landraces CAS36.7, Itauba, Jaboti, Mirasol, Surubim and IAC12-829 from Brazil and MTAI16/1, CM2177-2/1, Reina1 and Veronica1 from Colombia (CIAT) were used in the present study.
[Figure: Landrace Diversity. Phenotype classes: Sugary, Normal, Intense Yellow, Pink, and CIAT Group (MTAI16/1, Veronica1, CM2177-2/1, Reina1).]
Tissue sampling preparation: Cylinders of storage root 30-40 cm long and 4-6 cm in diameter were manually dissected into individual tissue layers, immediately frozen in liquid nitrogen and stored at -80°C until use. For RNA extraction we used layer 3 (young secondary growth tissue) because it is close to the cambium meristem and is the most physiologically active tissue in storage root formation.
RNA extraction – tissue system and extraction procedures: RNA extraction, purification and quantification followed the conventional phenol:chloroform procedure as previously described (de Souza et al, 2004).
Microarrays – experimental design and hybridization procedures: The experimental design considered a Loop-Dye Swap with biological replication (3 reps), sample replication (3 reps), technical replication (2 reps) and dye replication (2 reps). cDNA labeling and chip hybridization used a kit from Invitrogen (Kits: Platinum® PCR SuperMix) and followed the recommended procedure. Thirty micrograms of total RNA were used to prepare cDNA probes labeled with Cy3 and Cy5.
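As an illustration of the loop dye-swap layout, the sketch below pairs each sample with its neighbor in a loop and hybridizes every pair twice with the dyes exchanged. The function name `loop_dye_swap` and the choice of the four CIAT varieties as the loop are assumptions made for the example; the poster does not state which samples were paired on each array.

```python
def loop_dye_swap(samples):
    """Return (cy3_sample, cy5_sample) pairs for a loop design with dye swap.

    Each sample is co-hybridized with the next one in the loop, and every
    pair is repeated with the two dyes exchanged.
    """
    n = len(samples)
    arrays = []
    for i in range(n):
        a, b = samples[i], samples[(i + 1) % n]
        arrays.append((a, b))  # forward labeling: a in Cy3, b in Cy5
        arrays.append((b, a))  # dye swap: b in Cy3, a in Cy5
    return arrays


# Illustrative loop only (assumed, not the layout used in the study).
ciat = ["MTAI16/1", "CM2177-2/1", "Reina1", "Veronica1"]
design = loop_dye_swap(ciat)
for cy3, cy5 in design:
    print(f"Cy3: {cy3:12s} Cy5: {cy5}")
```

In this layout every sample is labeled equally often with each dye, which is what allows dye bias to be separated from sample effects.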
Results and Discussions
Metabolic pathway regulated genes: Figure 4 shows the regulation of genes coding for enzymes in the synthesis and degradation of carotenoids in cassava.
[Figure 4: expression of carotenoid-related genes across samples IAC-B-1, IAC-R-1, ITA-B-1, ITA-R-1, Sur-B-1 and Sur-R-1. Legend: expressed protein; lycopene epsilon cyclase; phytoene desaturase; neoxanthin cleavage enzyme nc1; zeta-carotene desaturase; p-hydroxyphenylpyruvate dioxygenase.]
Quality control of hybridization: A high-quality hybridization signal was assessed on array design, image quality (background, intensity and reproducibility), spot quality (center location, background, intensity, noise, specificity, morphology and reproducibility) and spike controls. The cutoff limits for image quality discarded the top 1% of outliers using High-PMT and a saturation tolerance of 0.05%. Under these conditions the extent of expressed genes was obtained. Image quality analysis results for the 11 probes were representative: more than 93% of elements gave a high-quality hybridization signal, leaving more than 23000 elements (out of 25392) in the array with sufficient quality to continue the analysis.
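A minimal sketch of the outlier cutoff described above, assuming it amounts to dropping the highest 1% of spot intensities. The function `drop_top_outliers` and the toy intensities are hypothetical. The count 23752 is read from the "23752 QC_Tr" label in the workflow diagram and is assumed here to be the number of elements that passed quality control.

```python
def drop_top_outliers(intensities, top_percent=1):
    """Discard the highest `top_percent` percent of spot intensities.

    Returns (kept, cutoff). Spots tied with the cutoff value are also
    discarded, so slightly more than the requested share can be removed.
    """
    ranked = sorted(intensities)
    n_drop = len(ranked) * top_percent // 100
    if n_drop == 0:
        return list(intensities), None
    cutoff = ranked[-n_drop]  # smallest intensity among the discarded spots
    kept = [x for x in intensities if x < cutoff]
    return kept, cutoff


# Toy data: 200 spots with intensities 1..200 (hypothetical values).
toy = list(range(1, 201))
kept, cutoff = drop_top_outliers(toy)
print(len(kept), cutoff)

# Share of array elements retained, using the counts given in the poster.
pass_rate = 23752 / 25392
print(f"{pass_rate:.1%}")  # about 93.5%, consistent with "more than 93%"
```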
Differentially expressed genes (DEG): Hybridization signal intensities were statistically analyzed to determine the extent of gene expression. A total of 161 genes (Table 1) were differentially expressed at a p-value of 0.005. The pattern of the DEG data set was examined with two statistical strategies: first by Principal Component Analysis (PCA), and then by recursive partitioning to test the grouping patterns observed in the PCA. The PCA results (Figure 1) indicate three groups of genes in the DEG set, which were confirmed by the partitioning results (Figure 2). This grouping pattern is closely associated with the groups of landrace phenotypes.
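To make the PCA step concrete, the sketch below computes the first principal component of a small genes-by-samples matrix and projects each gene onto it. It is an assumption-laden illustration: the function name, the toy expression values and the use of power iteration (a dependency-free stand-in for a library PCA) are not taken from the poster, and the recursive partitioning step is not shown.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (unit_vector, scores) for the first principal component.

    rows: one list per gene, one column per sample. The component is found
    by power iteration on the sample covariance matrix. Assumes the data
    are not constant, so the covariance matrix is non-zero.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    # Asymmetric start vector, so it is unlikely to be orthogonal to the answer.
    v = [float(j + 1) for j in range(p)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(p)) for c in centered]
    return v, scores


# Hypothetical expression values: two genes up and two genes down.
toy = [
    [2.0, 2.1, 1.9],
    [1.8, 2.2, 2.0],
    [-2.0, -1.9, -2.1],
    [-2.2, -1.8, -2.0],
]
pc1, scores = first_principal_component(toy)
for gene_index, score in enumerate(scores):
    print(gene_index, round(score, 2))
```

Genes with similar expression patterns receive similar scores, which is the kind of grouping that Figure 1 displays for the 161 DEG.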
Table 1 - Summarized numbers of genes statistically differentially expressed among landraces.
Genome Source    Total    Annotated    Unknown    p-value    Coverage
Leafy spurge     10       7            3          0.005      0.02
Cassava          151      75           76         0.005      1.84
TOTAL            161      82           79         0.005      1.86
Statistical analysis: differentially expressed genes (p-value < 0.005).
[Figure 1 - Principal component analysis for DEG. Figure 2 - Gene group partitioned in DEG. Group labels shown: LD_G I, LD_G II, LD_G III.]
Exploratory pathways network and candidate regulatory genes: The Sub-Network Enrichment Analysis (SNEA) algorithm was used to establish the level of significance (p-value) of regulatory genes in the DEG sets based on three kinds of molecular interaction mechanism (expression target, binding protein, and protein modification). Statistically significant (p-value < 0.05) regulatory gene networks were visualized as an exploratory pathway network. Figure 5 indicates the node (operating) gene, the edge genes that are regulated (activated or silenced), and their expression level, whether up-regulated (blue) or down-regulated (pink). Among the gene interdependences visualized in the pathway, regulatory genes such as transcription factors and other gene products modulating functionality (protein binding and modification) were observed. The node gene in the network operates the pathway and its genes, while the regulatory genes of a particular pathway are observed at the edges. Table 2 summarizes the node genes in the networks unique to each class of landraces when compared with the cassava ancestor and the elite variety IAC12-829.
Table 2 – Node genes unique to each class of landraces when comparisons were made to normal cassava.
Pink        Sugary       Intense Yellow    CIAT
FLC(ET)     ***          MPK4(ET)          SEU(ET)
ABI1(ET)    ABI5(ET)     AT1G50240(ET)     LUG(ET)
JAR1(ET)    SLY1(PI)     BRI1(PI)          EDS1(ET)
***         PRL1(PI)     ***               ***
***         CLPP4(PI)    ***               ***
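One simple way to attach a p-value to a regulator's sub-network is an over-representation test: given how many of the regulator's known targets fall among the DEG, how likely is an overlap at least that large by chance? The sketch below uses the hypergeometric upper tail for this. It is a stand-in chosen for illustration, not the SNEA algorithm itself, whose statistic is not described in this poster and may differ; the function name and all counts are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(population, targets, drawn, overlap):
    """P(X >= overlap) for a hypergeometric draw.

    population: genes on the array that passed quality control
    targets:    genes in the regulator's sub-network
    drawn:      size of the DEG set
    overlap:    sub-network genes found in the DEG set
    """
    upper = min(targets, drawn)
    favorable = sum(
        comb(targets, i) * comb(population - targets, drawn - i)
        for i in range(overlap, upper + 1)
    )
    return favorable / comb(population, drawn)


# Small hypothetical example that can be checked by hand:
# 10 genes, 4 targets, 3 DEG, 2 of which are targets.
p_small = hypergeom_upper_tail(10, 4, 3, 2)
print(p_small)  # (6*6 + 4*1) / 120 = 1/3

# A regulator would be reported when its p-value is below 0.05.
print(p_small < 0.05)
```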
Ontology and functional classification of DEG: The MIPS analysis identified gene ontology and functional groups. This information was used to select only the sub-set of regulatory genes in order to dissect the pathway network and make inferences about regulatory genes, as performed below. Results are summarized in Figure 3.
[Figure 3 - Profile of gene sets for DEG. MIPS functional categories shown: 01 Metabolism; 02 Energy; 10 Cell cycle and DNA processing; 11 Transcription; 12 Protein synthesis; 14 Protein fate (folding, modification, destination); 16 Protein with binding function or cofactor requirement (118); 18 Regulation of metabolism and protein function (7); 20 Cellular transport, transport facilities and transport routes (40); 30 Cellular communication/signal transduction mechanism; 32 Cell rescue, defense and virulence; 34 Interaction with the environment; 36 Systemic interaction with the environment; 40 Cell fate; 41 Development (systemic); 42 Biogenesis of cellular components; 43 Cell type differentiation; 47 Organ differentiation; 70 Subcellular localization; 73 Cell type localization; 77 Organ localization (76); 99 Unclassified proteins.]
[Figure 5 – Diagram showing exploratory pathways network for regulatory genes related to the landrace diversity. Blue and pink symbols are up- and down-regulated genes, respectively. Panels: Glucose responsive pathway (unique to Sugary); MADS-box transcription factors (unique to Pink); FUSED (FU) gene, belonging to the Ser/Thr protein kinases (unique to Yellow); SEU, transcriptional co-regulator of AGAMOUS (unique to CIAT).]
Identified regulatory gene sub-sets: The Gene Set Enrichment Analysis (GSEA) algorithm established statistically significant (p-value) functional groups of regulatory genes in sub-sets for biological process, cellular component and molecular function.
Final Remarks and Future Perspective
Results indicated that the major differentially expressed genes are largely related to stress response, such as an up-regulated gene for ABA synthesis, a transcription factor homolog related to hypoxia, transport proteins for glucose/ABA and nitrogen, and three unknown genes. Transcript profiles for those genes across landraces with contrasting carotenoid HPLC profiles consistently correlated with the end products of carotenoid synthesis. Quantitative real-time PCR is planned to confirm the uniqueness of each pathway associated with a particular color phenotype.
Financial support: Ginés Mera Memorial Fellowship Fund - C-019-08 and IAEA contract # BRA-13188/R0