1. Sequence analysis was performed on DNA from Glycine max (soybean) cultured cells using T3 and T7 primers, and the resulting sequences were edited and aligned.
2. The edited sequences showed high similarity, with some discrepancies, and contained restriction enzyme cut sites for EcoRI and XhoI as well as a polyA tail.
3. Multiple sequence alignments identified differences between the T3 and T7 sequences, with the greatest similarity from positions 51-900.
Glycine max chlorophyll a-b binding protein CP29.2, chloroplastic-like (LOC100785415), mRNA
Sequence for BLAST input, entire T3 cDNA insert sequence –
TTCTTAAGAGGAGGATGAGAAGGTGTCAATGATGGTTGTGTGGAGTGGGTCACTCAAGTGGGTGGCCCAGTTGTTGAGCGGAC
CCTTGCCGGTGGCGGCGGCTTGGACTGCAAAGCCCAAGAAGCCCACCATGGCAAGACGGGCGTGCTTGATCTCCGCCAATTG
AAGGGTGGCTTTCTTCTCTGGGTCTGAGGCCAGGCCCAATGGGTCGAAGTAACTGCCACCTGGGTACAACCTCTTCTCTGGGTC
GAGCTCTGCATTCCTCTGGAACTCTATGTAGCCAATTACGAGTACCTCGATCCAGATCAGTGTGGTGATTGAGAATGGAAGTGG
TTGCCCAAGGTATGATGACCCTTCTACTAGCTCCACCTTTCCGGCGTCTTGCCATGTAACACCAGTGAGCCACTCAACAGTGAG
AGCTCCGAGAGTGGCGAGCATGGCCCACCTTCCATGGATGAGTTCGCACTCACGGAACCTCTGGAGCCCAAACACCTCGCTGT
AGGGCTGAAACGGCGTGGACTTCACGTCCGCGAGCTCGGTCCTGGTTCCAATGATGTCCCCAGCCACGTTCTTCGCAAGGTTCT
GGTCCAGCGAGTCCAGCTCGAACTGCAGGTACTCCGCGGGCTTCCCTAGCCCAAACGG
Top 20 BLAST hits (1):
1. Glycine max cDNA, clone: GMFL01-17-I17 (99% ID)
2. Glycine max chlorophyll a-b binding protein CP29.2, chloroplastic-like (LOC100785415), mRNA (99% ID)
3. PREDICTED: Glycine max chlorophyll a-b binding protein CP29.2, chloroplastic-like (LOC100785180), mRNA (99% ID)
4. Glycine max cDNA, clone: GMFL01-03-O02 (96% ID)
5. Vigna radiata chlorophyll a-b binding protein CP29.1, chloroplastic-like (LOC106774260), mRNA (90% ID)
6. Phaseolus vulgaris hypothetical protein (PHAVU_010G002100g) mRNA, complete cds (90% ID)
7. Morus notabilis hypothetical protein partial mRNA (Too small)
8. Glycine max strain Williams 82 clone GM_WBb0012I20, complete sequence (Too large)
9. PREDICTED: Oryza brachyantha chlorophyll a-b binding protein CP29.1, chloroplastic-like (LOC102700159), mRNA (83% ID)
10. Phyllostachys edulis chloroplast chlorophyll a/b binding protein cab-PhE7 mRNA, complete cds; nuclear gene for chloroplast product (82% ID)
11. Pyrus x bretschneideri clone 915 a-b binding protein mRNA, complete cds (83% ID)
12. Pyrus x bretschneideri chlorophyll a-b binding protein CP29.1, chloroplastic-like (LOC103936594), mRNA (Too small)
13. Phyllostachys edulis cDNA clone: bphyem211d01, full insert sequence (82% ID)
14. PREDICTED: Malus x domestica chlorophyll a-b binding protein CP29.1, chloroplastic-like (LOC103450616), mRNA (83% ID)
15. PREDICTED: Oryza sativa Japonica Group chlorophyll a-b binding protein CP29.1, chloroplastic (LOC4343583), mRNA (82% ID)
16. Oryza sativa (indica cultivar-group) cDNA clone:OSIGCSN014K09, full insert sequence (82% ID)
17. Oryza sativa Japonica Group cDNA clone:001-204-B02, full insert sequence (82% ID)
18. Oryza sativa Japonica Group cDNA clone:006-309-F11, full insert sequence (82% ID)
19. Oryza sativa Japonica Group cDNA clone:001-013-C12, full insert sequence (82% ID)
20. Oryza sativa Japonica Group cDNA clone:J013098H13, full insert sequence (82% ID)
Top 5 Protein BLAST hits (1):
1. chlorophyll a-b binding protein CP29.2, chloroplastic-like [Glycine max] (Length = 290) (100% ID)
2. hypothetical protein GLYMA_03G060300 [Glycine max] (Length = 290) (99% ID)
3. PREDICTED: chlorophyll a-b binding protein CP29.2, chloroplastic-like [Glycine max] (Length = 290) (97% ID)
4. PREDICTED: chlorophyll a-b binding protein CP29.3, chloroplastic-like [Glycine max] (Length = 278) (73% ID)
5. PREDICTED: chlorophyll a-b binding protein CP29.3, chloroplastic-like [Glycine max] (Length = 282) (72% ID)
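The "% ID" values in the hit lists above are percent identities: the share of alignment columns in which the query and the hit carry the same residue. The following is a minimal sketch of that calculation, assuming the two sequences have already been aligned and padded with "-" for gaps. The function name and the example strings are hypothetical and are not taken from the BLAST output itself.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns where both sequences share a residue.

    Both inputs must already be aligned (same length, gaps as `gap`).
    Gap columns count toward the alignment length but never as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a != gap
    )
    return 100.0 * matches / len(aligned_a)


# Hypothetical toy alignment: 5 identical columns out of 6.
toy_identity = percent_identity("ACGT-A", "ACGTTA")
```

A hit marked 99% ID therefore differs from the query in roughly one column per hundred aligned positions.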
The Light Harvesting Complex (LHC) contains two pigments that interact with visible light, chlorophyll
a and chlorophyll b (Chl-a and Chl-b, respectively). The protein encoded by the gene of interest is a
chlorophyll a/b binding protein, a peripheral light-harvesting protein that acts as an antenna for
photosystems I and II.
ORF Analysis (2):
F S H T R R T L A Q T I M A T A T A A A
2 ttttctcacaccaggcgcacgttagcacaaactatcatggccacggcaacagcagctgcc 61
T S S F M G T R L L E A H S G A G R V H
62 acctcgtcctttatggggacgcgtctcctggaggctcactccggggcggggcgagtgcac 121
A R F G F G K K K A P A Q K K A S R G S
122 gcccgattcggcttcggcaagaaaaaggctcccgcccaaaagaaagcctccaggggatcg 181
G R D T V R P L W Y P G A K A P E Y L D
182 ggccgagacaccgtcagacccctttggtatccgggcgccaaagcgcccgaatacctcgat 241
G S L V G D Y G F D P F G L G K P A E Y
242 gggagtcttgtcggagactacgggttcgatccgtttgggctagggaagcccgcggagtac 301
L Q F E L D S L D Q N L A K N V A G D I
302 ctgcagttcgagctggactcgctggaccagaaccttgcgaagaacgtggctggggacatc 361
I G T R T E L A D V K S T P F Q P Y S E
362 attggaaccaggaccgagcttgcggacgtgaagtccacgccgtttcagccctacagcgag 421
V F G L Q R F R E C E L I H G R W A M L
422 gtgtttgggctccagaggttccgtgagtgcgaactcatccatggaaggtgggccatgctc 481
A T L G A L T V E W L T G V T W Q D A G
482 gccactctcggagctctcactgttgagtggctcactggtgttacatggcaagacgccgga 541
K V E L V E G S S Y L G Q P L P F S I T
542 aaggtggagctagtagaagggtcatcataccttgggcaaccacttccattctcaatcacc 601
T L I W I E A L V I G Y I E F Q R N A E
602 acactgatctggatcgaggcactcgtaattggctacatagagttccagaggaatgcagag 661
L D P E K R L Y P G G S Y F D P L G L A
662 ctcgacccagagaagaggttgtacccaggtggcagttacttcgacccattgggcctggcc 721
S D P E K K A T L Q L A E I K H A R L A
722 tcagacccagagaagaaagccacccttcaattggcggagatcaagcacgcccgtcttgcc 781
M V G F L G F A V Q A A V T G K G P L N
782 atggtgggcttcttgggctttgcagtccaagccgccgtcaccggcaagggcccgctcaac 841
N W A T H L S D P L H T T I I D T F S S
842 aactgggccacccacttgagtgacccactccacacaaccatcattgacaccttctcatcc 901
S S * E E E S F L C L V T I T S I L * K
902 tcctcttaagaagaagagtctttcttgtgcctcgtcactattactagcatattgtaaaag 961
S F L L R L L V V I R T F S L S W I S S
962 tcttttcttcttcggcttttggttgtaattagaacattttcacttagttggattagtagt 1021
T C E K L V T * N W R G * A E M L L L D
1022 acttgtgaaaaacttgtaacgtgaaattggagaggctaagcagaaatgcttctgcttgat 1081
V K C S P V N V Y Y A R N R E G M T R V
1082 gttaagtgttctcctgtaaatgtttattatgcacggaatcgagagggaatgacgagggta 1141
K K Y * * N * D H Y E R * W Q G I G L R
1142 aaaaaatactgataaaattgagatcactacgaaaggtaatggcagggaattggattgagg 1201
P K K K K K K K K
1202 ccaaaaaaaaaaaaaaaaaaaaaaaaaaa 1230
The start codon (methionine) is highlighted in yellow. Stop codons are shown as red asterisks (8 total); the first
occurs near the start of the line beginning at position 902. According to SIXFRAME from the Biology Workbench,
this frame had the longest ORF and the fewest stop codons.
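The reading-frame translation above can be reproduced with a short script. This is a minimal sketch, assuming the standard genetic code. It is not the SIXFRAME implementation, and the function names are illustrative. It translates one forward frame and extracts the ORF running from the first methionine to the next stop codon.

```python
from itertools import product

# Standard genetic code, built in T, C, A, G order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame=0):
    """Translate one forward reading frame; '*' marks a stop codon."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    # Codons containing ambiguous bases are reported as 'X'.
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def first_orf(dna, frame=0):
    """Return the protein from the first M up to (not including) the next stop."""
    protein = translate(dna, frame)
    start = protein.find("M")
    if start == -1:
        return ""
    stop = protein.find("*", start)
    return protein[start:] if stop == -1 else protein[start:stop]


# First 60 bases of the cDNA insert as listed in the ORF analysis.
first_line = "ttttctcacaccaggcgcacgttagcacaaactatcatggccacggcaacagcagctgcc"
first_line_protein = translate(first_line)
```

Running `first_orf` on the full insert in the same frame should return the protein from the first methionine to the stop codon near position 908. A six-frame search would repeat this for frames 0 to 2 on both the insert and its reverse complement.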
Protein Sequence (1):
MATATAAATSSFMGTRLLEAHSGAGRVHARFGFGKKKAPAQKKASRGSGRDTVRPLWYPGAKAPEYLDGSLVGDYGFDPFGLGKP
AEYLQFELDSLDQNLAKNVAGDIIGTRTELADVKSTPFQPYSEVFGLQRFRECELIHGRWAMLATLGALTVEWLTGVTWQDAGKVELV
EGSSYLGQPLPFSITTLIWIEALVIGYIEFQRNAELDPEKRLYPGGSYFDPLGLASDPEKKATLQLAEIKHARLAMVGFLGFAVQAAVTGK
GPLNNWATHLSDPLHTTIIDTFSSSS
The first ATG (methionine) is the start codon in the ORF analysis. I believe I have 100% of the cDNA.
The cDNA insert has a LAMP motif (2), which codes for proteins responsible for interacting with light.
Most photosynthetic proteins are encoded by DNA with a LAMP motif, as are proteins in ocular
tissues.
A hydropathy plot determines the hydrophobicity or hydrophilicity of a protein based on the amino acid
sequence.
The window size of the hydropathy plot was set to 9 (4). Regions above the line are hydrophobic and are most
likely transmembrane segments or hydrophobic pockets within the protein structure. Most of the protein appears
hydrophilic, however, and is most likely exposed to an aqueous environment.
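The sliding-window calculation behind a hydropathy plot can be sketched as follows. The report does not name the hydropathy scale, so this example assumes the Kyte-Doolittle scale and a zero baseline for "the line". The function name is illustrative. Each output value is the mean hydropathy of 9 consecutive residues.

```python
# Kyte-Doolittle hydropathy values (assumed scale; positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=9):
    """Return the mean hydropathy of every full window along the protein."""
    scores = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# First 30 residues of the protein sequence reported in this analysis.
profile = hydropathy_profile("MATATAAATSSFMGTRLLEAHSGAGRVHAR", window=9)

# Window start positions whose average lies above the assumed zero baseline.
hydrophobic_windows = [i for i, score in enumerate(profile) if score > 0]
```

A larger window smooths the profile and favors long hydrophobic stretches such as membrane-spanning helices. A smaller window gives a noisier profile that responds to short local features.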
References
(1) Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb Miller (2000), "A greedy algorithm for aligning DNA
sequences", J Comput Biol 2000; 7(1-2):203-14.
(2) "SDSC Biology Workbench." SDSC Biology Workbench. Web. 14 Mar. 2016. <http://seqtool.sdsc.edu/>.
(3) "Software Developer of Next Gen Sequencing DNA Genetic Analysis and LIMS." Software Developer of Next Gen
Sequencing DNA Genetic Analysis and LIMS. Perkin Elmer, n.d. Web. 25 Feb. 2016.
<http://www.geospiza.com/>.
(4) "Genomics and Bioinformatics @ Davidson College." Genomics and Bioinformatics @ Davidson College. Web. 15
Mar. 2016. <http://gcat.davidson.edu/>.