Sequence Analysis of an Isolated Culture, Pertaining to Glycine max
By: Alexander Ward
CMB 426 - Dr. Tsou
3/15/2016
Original/Unedited Sequences (3) –
T3 Primer –
GAACTCCGCGGGTGCGGCCGCTCTGAACTAGTGGATCCCCCGGGCTGCAGGAATTCGGCACGAGCGGCAACAGCAGCTGCCA
CCTCGTCCTTTATGGGGACGCGTCTCCTGGAGGCTCACTCCGGAGCGGGGCGAGTGCACGCCCGATTCGGCTTCGGCAAGAAA
AAGGCTCCCGCCCCAAAGAAAGCCTCCAGGGGATCGGGCCGAGACACCGACAGACCCCTTTGGTATCCGGGCGCCAAAGCGC
CCGAATACCTCGATGGGAGTCTTGTCGGAGACTACGGGTTCGATCCGTTTGGGCTAGGGAAGCCCGCGGAGTACCTGCAGTTC
GAGCTGGACTCGCTGGACCAGAACCTTGCGAAGAACGTGGCTGGGGACATCATTGGAACCAGGACCGAGCTCGCGGACGTGA
AGTCCACGCCGTTTCAGCCCTACAGCGAGGTGTTTGGGCTCCAGAGGTTCCGTGAGTGCGAACTCATCCATGGAAGGTGGGCC
ATGCTCGCCACTCTCGGAGCTCTCACTGTTGAGTGGCTCACTGGTGTTACATGGCAAGACGCCGGAAAGGTGGAGCTAGTAGA
AGGGTCATCATACCTTGGGCAACCACTTCCATTCTCAATCACCACACTGATCTGGATCGAGGTACTCGTAATTGGCTACATAGA
GTTCCAGAGGAATGCAGAGCTCGACCCAGAGAAGAGGTTGTACCCAGGTGGCAGTTACTTCGACCCATTGGGCCTGGCCTCAG
ACCCAGAGAAGAAAGCCACCCTTCAATTGGCGGAGATCAAGCACGCCCGTCTTGCCATGGTGGGCTTCTTGGGCTTTGCAGTC
CAAGCCGCCGCCACCGGCAAGGGTCCGCTCAACAACTGGGCCACCCACTTGAGTGACCCACTCCACACAACCATCATTGACAC
CTTCTCATCCTCCTCTTAAGAAGAAGAGTCTTTCTTGTGCCTCGTCACTATTACTAGCATATTGTAAAAGTCTTTTCTTCTTCGGCT
TTTGGTTGTAATTAAAACATTTTCACTTANTTGGAATNAGTAAATACTTGTGAAAAAACTTNGTAAACGTGGAAANTNGGANAG
GCTAAACCAAAAATGGCTTTCTGCTTNAATGTTAAAAAAAAAAAAAAAAAAAAAAAATTCGAGGGGGGGNCCCGGGTAACCC
NAATTNCCCCCNTAAAAGNGGNGTTCNAAATTAAAAANTTCACTGGGCCNGTTCGTTTTTNAAAACGTTCNTNAACGGGGGN
AAAAACCCCTGGGGGGTTTACCCCAANTTTAAATTCCCCCTTTNGGAAANAAAATTCCCCCTTTTTNCCCNANGNTGGGGGNT
NAAANAACNAAAAAANNGCCCCCCCCACCAAATTNNCCCNTTTTCCAAAAAANTTTNCCCAACCNTNAAAAGGGNNAAAN
GGNAAAA
T7 Primer – Slight Taq slip following poly-T
CGGGCCCCCTGCGTTTTTTTTTTTTTTTTTTTTACATCAAGCAGAAGCATTTCTGCTTAGCCTCTCCAATTTCACGTTACAAGTTTTT
CACAAGTACTACTAATCCAACTAAGTGAAAATGTTCTAATTACAACCAAAAGCCGAAGAAGAAAAGACTTTTACAATATGCTAG
TAATAGTGACGAGGCACAAGAAAGACTCTTCTTCTTAAGAGGAGGATGAGAAGGTGTCAATGATGGTTGTGTGGAGTGGGTCA
CTCAAGTGGGTGGCCCAGTTGTTGAGCGGACCCTTGCCGGTGGCGGCGGCTTGGACTGCAAAGCCCAAGAAGCCCACCATGG
CAAGACGGGCGTGCTTGATCTCCGCCAATTGAAGGGTGGCTTTCTTCTCTGGGTCTGAGGCCAGGCCCAATGGGTCGAAGTAA
CTGCCACCTGGGTACAACCTCTTCTCTGGGTCGAGCTCTGCATTCCTCTGGAACTCTATGTAGCCAATTACGAGTACCTCGATCC
AGATCAGTGTGGTGATTGAGAATGGAAGTGGTTGCCCAAGGTATGATGACCCTTCTACTAGCTCCACCTTTCCGGCGTCTTGCC
ATGTAACACCAGTGAGCCACTCAACAGTGAGAGCTCCGAGAGTGGCGAGCATGGCCCACCTTCCATGGATGAGTTCGCACTCA
CGGAACCTCTGGAGCCCAAACACCTCGCTGTAGGGCTGAAACGGCGTGGACTTCACGTCCGCGAGCTCGGTCCTGGTTCCAAT
GATGTCCCCAGCCACGTTCTTCGCAAGGTTCTGGTCCAGCGAGTCCAGCTCGAACTGCAGGTACTCCGCGGGCTTCCCTAGCCC
AAACGGGATCGAACCCGTAGTCTCCGACAAGACTCCCCATCGAGGTATTTCGGGGCGCTTTTGGCGCCCGGGATACCAAAGGG
GGTCTGTCGGGTGGTCTCCGGCCCCGAATCCCCTGGGAGGGCTTTTCTTTTGGGGGGCGGGAAGCCTTTTTTTCTTGCCGGAAG
CCCGAAAATCGGGGGCGGTGGNACTNCGCCCCCCGGNTTCNGGGANGTGAAGCCCTTCCCNAGGAANAAACNNNNGTTCC
CCCCCNATAAAAANGGANCNAAANGGTGGGCNANCCCTTGCCTTTTTTGCCCCNCCTTCCGTNGNCCCAAAAATTCCNTGGC
CANNCCCCCGGGGGGGGAAATCCCCCCTTAANNTTTNTTAAAAAAGCGGGGCCCCCCCCNCCCNCCCNGGGNGGGNAAAC
CTTCCCNACCCNTTTTTTGNTTTCCCCCCNTTTTANNGGGGAN
Edited T3 Primer –
GAATTCGGCACGAGCGGCAACAGCAGCTGCCACCTCGTCCTTTATGGGGACGCGTCTCCTGGAGGCTCACTCCGGAGCGGGG
CGAGTGCACGCCCGATTCGGCTTCGGCAAGAAAAAGGCTCCCGCCCCAAAGAAAGCCTCCAGGGGATCGGGCCGAGACACC
GACAGACCCCTTTGGTATCCGGGCGCCAAAGCGCCCGAATACCTCGATGGGAGTCTTGTCGGAGACTACGGGTTCGATCCGTT
TGGGCTAGGGAAGCCCGCGGAGTACCTGCAGTTCGAGCTGGACTCGCTGGACCAGAACCTTGCGAAGAACGTGGCTGGGGAC
ATCATTGGAACCAGGACCGAGCTCGCGGACGTGAAGTCCACGCCGTTTCAGCCCTACAGCGAGGTGTTTGGGCTCCAGAGGTT
CCGTGAGTGCGAACTCATCCATGGAAGGTGGGCCATGCTCGCCACTCTCGGAGCTCTCACTGTTGAGTGGCTCACTGGTGTTAC
ATGGCAAGACGCCGGAAAGGTGGAGCTAGTAGAAGGGTCATCATACCTTGGGCAACCACTTCCATTCTCAATCACCACACTGA
TCTGGATCGAGGTACTCGTAATTGGCTACATAGAGTTCCAGAGGAATGCAGAGCTCGACCCAGAGAAGAGGTTGTACCCAGGT
GGCAGTTACTTCGACCCATTGGGCCTGGCCTCAGACCCAGAGAAGAAAGCCACCCTTCAATTGGCGGAGATCAAGCACGCCCG
TCTTGCCATGGTGGGCTTCTTGGGCTTTGCAGTCCAAGCCGCCGCCACCGGCAAGGGTCCGCTCAACAACTGGGCCACCCACTT
GAGTGACCCACTCCACACAACCATCATTGACACCTTCTCATCCTCCTCTTAAGAAGAAGAGTCTTTCTTGTGCCTCGTCACTATTA
CTAGCATATTGTAAAAGTCTTTTCTTCTTCGGCTTTTGGTTGTAATTAAAACATTTTCACTTA
Edited T7 Primer –
ACATCAAGCAGAAGCATTTCTGCTTAGCCTCTCCAATTTCACGTTACAAGTTTTTCACAAGTACTACTAATCCAACTAAGTGAAA
ATGTTCTAATTACAACCAAAAGCCGAAGAAGAAAAGACTTTTACAATATGCTAGTAATAGTGACGAGGCACAAGAAAGACTCTT
CTTCTTAAGAGGAGGATGAGAAGGTGTCAATGATGGTTGTGTGGAGTGGGTCACTCAAGTGGGTGGCCCAGTTGTTGAGCGGA
CCCTTGCCGGTGGCGGCGGCTTGGACTGCAAAGCCCAAGAAGCCCACCATGGCAAGACGGGCGTGCTTGATCTCCGCCAATT
GAAGGGTGGCTTTCTTCTCTGGGTCTGAGGCCAGGCCCAATGGGTCGAAGTAACTGCCACCTGGGTACAACCTCTTCTCTGGGT
CGAGCTCTGCATTCCTCTGGAACTCTATGTAGCCAATTACGAGTACCTCGATCCAGATCAGTGTGGTGATTGAGAATGGAAGTG
GTTGCCCAAGGTATGATGACCCTTCTACTAGCTCCACCTTTCCGGCGTCTTGCCATGTAACACCAGTGAGCCACTCAACAGTGA
GAGCTCCGAGAGTGGCGAGCATGGCCCACCTTCCATGGATGAGTTCGCACTCACGGAACCTCTGGAGCCCAAACACCTCGCTG
TAGGGCTGAAACGGCGTGGACTTCACGTCCGCGAGCTCGGTCCTGGTTCCAATGATGTCCCCAGCCACGTTCTTCGCAAGGTTC
TGGTCCAGCGAGTCCAGCTCGAACTGCAGGTACTCCGCGGGCTTCCCTAGCCCAAACGGGATCGAACCCGTAGTCTCCGACAA
GACTCCCCATCGAGGTATTTCGGGGCGCTTTTGGCGCCCGGGATACCAAAGGGGGTCTGTCGGGTGGTCTCCGGCCCCGAATC
CCCTGGGAGGGCTTTTCTTTTGGGGGGCGGGAAGCCTTTTTTTCTTGCCGGAAGCCCGAAAATCGGGGGCGGTGG
Highlight key: green = EcoRI cut site, red = XhoI cut site, teal = poly(A) tail, purple = sequence discrepancies. Multiple cut sites were not observed.
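The relationship between the raw and edited reads can be sketched in code. Comparing the two versions above, the edited T3 read runs from the EcoRI site (GAATTC) up to the first ambiguous base (N), and the edited T7 read also stops at its first N. The trimming rule below is inferred from that comparison and is an assumption, not the author's stated procedure. The helper names `find_sites` and `trim_read` are hypothetical.

```python
ECORI = "GAATTC"  # EcoRI recognition site
XHOI = "CTCGAG"   # XhoI recognition site


def find_sites(seq, site):
    """Return 1-based start positions of every occurrence of `site` in `seq`."""
    seq = seq.upper()
    positions = []
    i = seq.find(site)
    while i != -1:
        positions.append(i + 1)
        i = seq.find(site, i + 1)
    return positions


def trim_read(raw, start_motif=None):
    """Keep the read from `start_motif` (if given and found) up to the first N.

    Assumed rule: everything after the first ambiguous base call is treated
    as unreliable and discarded.
    """
    seq = "".join(raw.split()).upper()  # drop spaces and line breaks
    start = seq.find(start_motif) if start_motif else 0
    if start == -1:
        start = 0  # motif absent: keep the read from its first base
    end = seq.find("N", start)
    if end == -1:
        end = len(seq)  # no ambiguous base: keep the read to its end
    return seq[start:end]


if __name__ == "__main__":
    raw = "CTGCAGGAATTCGGCACGAGCGGCAACANTTGGAATNAG"
    print(find_sites(raw, ECORI))
    print(trim_read(raw, ECORI))
```

Reporting every site position, rather than only the first, makes it easy to confirm that a read contains no more than one cut site for each enzyme.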
Align 2 or more sequences using CLUSTALW2
– PILEMSF Format
1 50
T3_edited GAATTCGGCA CGAGCGGCAA CAGCAGCTGC CACCTCGTCC TTTATGGGGA
T7_edited__Reve .......... .......... .......... .......... ..........
51 100
T3_edited CGCGTCTCCT GGAGGCTCAC TCCGGAGCGG GGCGAGTGCA CGCCCGATTC
T7_edited__Reve .......... .......... .......... ..CCACCGCC CCCGATTTTC
101 150
T3_edited GG..CTTCGG CAAGAAAAAG G...CTCCCG CCCC...AAA GAAAG...CC
T7_edited__Reve GGGCTTCCGG CAAGAAAAAA AGGCTTCCCG CCCCCCAAAA GAAAAGCCCT
151 200
T3_edited TCCAGGGGAT C..GGGCCG. AGACA..CCG ACAGACCCC. TTTGGTATCC
T7_edited__Reve CCCAGGGGAT TCGGGGCCGG AGACCACCCG ACAGACCCCC TTTGGTATCC
201 250
T3_edited .GGGCGCCAA A.GCGCCC.. GAATACCTCG ATGGG.AGTC TTGTCGGAGA
T7_edited__Reve CGGGCGCCAA AAGCGCCCCG AAATACCTCG ATGGGGAGTC TTGTCGGAGA
251 300
T3_edited CTACGGGTTC GATCC.GTTT GGGCTAGGGA AGCCCGCGGA GTACCTGCAG
T7_edited__Reve CTACGGGTTC GATCCCGTTT GGGCTAGGGA AGCCCGCGGA GTACCTGCAG
301 350
T3_edited TTCGAGCTGG ACTCGCTGGA CCAGAACCTT GCGAAGAACG TGGCTGGGGA
T7_edited__Reve TTCGAGCTGG ACTCGCTGGA CCAGAACCTT GCGAAGAACG TGGCTGGGGA
351 400
T3_edited CATCATTGGA ACCAGGACCG AGCTCGCGGA CGTGAAGTCC ACGCCGTTTC
T7_edited__Reve CATCATTGGA ACCAGGACCG AGCTCGCGGA CGTGAAGTCC ACGCCGTTTC
401 450
T3_edited AGCCCTACAG CGAGGTGTTT GGGCTCCAGA GGTTCCGTGA GTGCGAACTC
T7_edited__Reve AGCCCTACAG CGAGGTGTTT GGGCTCCAGA GGTTCCGTGA GTGCGAACTC
451 500
T3_edited ATCCATGGAA GGTGGGCCAT GCTCGCCACT CTCGGAGCTC TCACTGTTGA
T7_edited__Reve ATCCATGGAA GGTGGGCCAT GCTCGCCACT CTCGGAGCTC TCACTGTTGA
501 550
T3_edited GTGGCTCACT GGTGTTACAT GGCAAGACGC CGGAAAGGTG GAGCTAGTAG
T7_edited__Reve GTGGCTCACT GGTGTTACAT GGCAAGACGC CGGAAAGGTG GAGCTAGTAG
551 600
T3_edited AAGGGTCATC ATACCTTGGG CAACCACTTC CATTCTCAAT CACCACACTG
T7_edited__Reve AAGGGTCATC ATACCTTGGG CAACCACTTC CATTCTCAAT CACCACACTG
601 650
T3_edited ATCTGGATCG AGGTACTCGT AATTGGCTAC ATAGAGTTCC AGAGGAATGC
T7_edited__Reve ATCTGGATCG AGGTACTCGT AATTGGCTAC ATAGAGTTCC AGAGGAATGC
651 700
T3_edited AGAGCTCGAC CCAGAGAAGA GGTTGTACCC AGGTGGCAGT TACTTCGACC
T7_edited__Reve AGAGCTCGAC CCAGAGAAGA GGTTGTACCC AGGTGGCAGT TACTTCGACC
701 750
T3_edited CATTGGGCCT GGCCTCAGAC CCAGAGAAGA AAGCCACCCT TCAATTGGCG
T7_edited__Reve CATTGGGCCT GGCCTCAGAC CCAGAGAAGA AAGCCACCCT TCAATTGGCG
751 800
T3_edited GAGATCAAGC ACGCCCGTCT TGCCATGGTG GGCTTCTTGG GCTTTGCAGT
T7_edited__Reve GAGATCAAGC ACGCCCGTCT TGCCATGGTG GGCTTCTTGG GCTTTGCAGT
801 850
T3_edited CCAAGCCGCC GCCACCGGCA AGGGTCCGCT CAACAACTGG GCCACCCACT
T7_edited__Reve CCAAGCCGCC GCCACCGGCA AGGGTCCGCT CAACAACTGG GCCACCCACT
851 900
T3_edited TGAGTGACCC ACTCCACACA ACCATCATTG ACACCTTCTC ATCCTCCTCT
T7_edited__Reve TGAGTGACCC ACTCCACACA ACCATCATTG ACACCTTCTC ATCCTCCTCT
901 950
T3_edited TAAGAAGAAG AGTCTTTCTT GTGCCTCGTC ACTATTACTA GCATATTGTA
T7_edited__Reve TAAGAAGAAG AGTCTTTCTT GTGCCTCGTC ACTATTACTA GCATATTGTA
951 1000
T3_edited AAAGTCTTTT CTTCTTCGGC TTTTGGTTGT AATTAAAACA TTTTCACTTA
T7_edited__Reve AAAGTCTTTT CTTCTTCGGC TTTTGGTTGT AATTAGAACA TTTTCACTTA
1001 1050
T3_edited .......... .......... .......... .......... ..........
T7_edited__Reve GTTGGATTAG TAGTACTTGT GAAAAACTTG TAACGTGAAA TTGGAGAGGC
1051 1076
T3_edited .......... .......... ......
T7_edited__Reve TAAGCAGAAA TGCTTCTGCT TGATGT
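The only non-overlapping positions are the 82 columns at the beginning (T3 only) and the 76 columns at the end (T7 only), 158 in total. Of the 1076 alignment columns, this leaves an overlap of 918 aligned positions, including gap columns.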
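The overlap can be checked directly from the alignment rows. The sketch below assumes each sequence's row has been joined into a single string, with `.` marking both internal gaps and terminal padding as in the MSF output above. The function names are hypothetical. A reverse-complement helper is included because the T7 read is aligned in reversed orientation.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (N stays N)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def covered_span(row):
    """Return (first, last) 1-based alignment columns that hold a base."""
    row = row.replace(" ", "")
    cols = [i for i, ch in enumerate(row, start=1) if ch != "."]
    return cols[0], cols[-1]


def overlap_columns(row_a, row_b):
    """Count alignment columns lying inside both sequences' covered spans."""
    a_first, a_last = covered_span(row_a)
    b_first, b_last = covered_span(row_b)
    start, end = max(a_first, b_first), min(a_last, b_last)
    return max(0, end - start + 1)


if __name__ == "__main__":
    # Figures read off the alignment: 1076 columns, 82 leading, 76 trailing.
    total, leading, trailing = 1076, 82, 76
    print(total - (leading + trailing))
    print(reverse_complement("ACATCAAGC"))
```

Because gap columns inside the shared span are counted, the result is a number of aligned positions, not a number of identical bases.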
Glycine max chlorophyll a-b binding protein CP29.2, chloroplastic-like (LOC100785415), mRNA
Sequence for BLAST input, entire T3 cDNA insert sequence –
TTCTTAAGAGGAGGATGAGAAGGTGTCAATGATGGTTGTGTGGAGTGGGTCACTCAAGTGGGTGGCCCAGTTGTTGAGCGGAC
CCTTGCCGGTGGCGGCGGCTTGGACTGCAAAGCCCAAGAAGCCCACCATGGCAAGACGGGCGTGCTTGATCTCCGCCAATTG
AAGGGTGGCTTTCTTCTCTGGGTCTGAGGCCAGGCCCAATGGGTCGAAGTAACTGCCACCTGGGTACAACCTCTTCTCTGGGTC
GAGCTCTGCATTCCTCTGGAACTCTATGTAGCCAATTACGAGTACCTCGATCCAGATCAGTGTGGTGATTGAGAATGGAAGTGG
TTGCCCAAGGTATGATGACCCTTCTACTAGCTCCACCTTTCCGGCGTCTTGCCATGTAACACCAGTGAGCCACTCAACAGTGAG
AGCTCCGAGAGTGGCGAGCATGGCCCACCTTCCATGGATGAGTTCGCACTCACGGAACCTCTGGAGCCCAAACACCTCGCTGT
AGGGCTGAAACGGCGTGGACTTCACGTCCGCGAGCTCGGTCCTGGTTCCAATGATGTCCCCAGCCACGTTCTTCGCAAGGTTCT
GGTCCAGCGAGTCCAGCTCGAACTGCAGGTACTCCGCGGGCTTCCCTAGCCCAAACGG
Top 20 BLAST hits (1) –
1. Glycine max cDNA, clone: GMFL01-17-I17 (99% ID)
2. Glycine max chlorophyll a-b binding protein CP29.2, chloroplastic-like (LOC100785415), mRNA (99% ID)
3. PREDICTED: Glycine max chlorophyll a-b binding protein CP29.2, chloroplastic-like (LOC100785180), mRNA
(99% ID)
4. Glycine max cDNA, clone: GMFL01-03-O02 (96% ID)
5. Vigna radiata chlorophyll a-b binding protein CP29.1, chloroplastic-like (LOC106774260), mRNA (90% ID)
6. Phaseolus vulgaris hypothetical protein (PHAVU_010G002100g) mRNA, complete cds (90% ID)
7. Morus notabilis hypothetical protein partial mRNA (Too small)
8. Glycine max strain Williams 82 clone GM_WBb0012I20, complete sequence (Too large)
9. PREDICTED: Oryza brachyantha chlorophyll a-b binding protein CP29.1, chloroplastic-like (LOC102700159),
mRNA (83% ID)
10. Phyllostachys edulis chloroplast chlorophyll a/b binding protein cab-PhE7 mRNA, complete cds; nuclear gene
for chloroplast product (82% ID)
11. Pyrus x bretschneideri clone 915 a-b binding protein mRNA, complete cds (83% ID)
12. Pyrus x bretschneideri chlorophyll a-b binding protein CP29.1, chloroplastic-like (LOC103936594), mRNA (Too
small)
13. Phyllostachys edulis cDNA clone: bphyem211d01, full insert sequence (82% ID)
14. PREDICTED: Malus x domestica chlorophyll a-b binding protein CP29.1, chloroplastic-like (LOC103450616),
mRNA (83% ID)
15. PREDICTED: Oryza sativa Japonica Group chlorophyll a-b binding protein CP29.1, chloroplastic (LOC4343583),
mRNA (82% ID)
16. Oryza sativa (indica cultivar-group) cDNA clone:OSIGCSN014K09, full insert sequence (82% ID)
17. Oryza sativa Japonica Group cDNA clone:001-204-B02, full insert sequence (82% ID)
18. Oryza sativa Japonica Group cDNA clone:006-309-F11, full insert sequence (82% ID)
19. Oryza sativa Japonica Group cDNA clone:001-013-C12, full insert sequence (82% ID)
20. Oryza sativa Japonica Group cDNA clone:J013098H13, full insert sequence (82% ID)
Top 5 Protein BLAST hits (1) –
1. chlorophyll a-b binding protein CP29.2, chloroplastic-like [Glycine max] (Length = 290) (100% ID)
2. hypothetical protein GLYMA_03G060300 [Glycine max] (Length = 290) (99% ID)
3. PREDICTED: chlorophyll a-b binding protein CP29.2, chloroplastic-like [Glycine max] (Length = 290) (97% ID)
4. PREDICTED: chlorophyll a-b binding protein CP29.3, chloroplastic-like [Glycine max] (Length = 278) (73% ID)
5. PREDICTED: chlorophyll a-b binding protein CP29.3, chloroplastic-like [Glycine max] (Length = 282) (72% ID)
The Light Harvesting Complex (LHC) contains two pigments that interact with visible light: chlorophyll a and chlorophyll b (Chl-a and Chl-b, respectively). The gene of interest codes for a chlorophyll a/b binding protein, a peripheral light-harvesting protein that acts as an antenna for photosystems 1 and 2.
ORF Analysis (2) –
F S H T R R T L A Q T I M A T A T A A A
2 ttttctcacaccaggcgcacgttagcacaaactatcatggccacggcaacagcagctgcc 61
T S S F M G T R L L E A H S G A G R V H
62 acctcgtcctttatggggacgcgtctcctggaggctcactccggggcggggcgagtgcac 121
A R F G F G K K K A P A Q K K A S R G S
122 gcccgattcggcttcggcaagaaaaaggctcccgcccaaaagaaagcctccaggggatcg 181
G R D T V R P L W Y P G A K A P E Y L D
182 ggccgagacaccgtcagacccctttggtatccgggcgccaaagcgcccgaatacctcgat 241
G S L V G D Y G F D P F G L G K P A E Y
242 gggagtcttgtcggagactacgggttcgatccgtttgggctagggaagcccgcggagtac 301
L Q F E L D S L D Q N L A K N V A G D I
302 ctgcagttcgagctggactcgctggaccagaaccttgcgaagaacgtggctggggacatc 361
I G T R T E L A D V K S T P F Q P Y S E
362 attggaaccaggaccgagcttgcggacgtgaagtccacgccgtttcagccctacagcgag 421
V F G L Q R F R E C E L I H G R W A M L
422 gtgtttgggctccagaggttccgtgagtgcgaactcatccatggaaggtgggccatgctc 481
A T L G A L T V E W L T G V T W Q D A G
482 gccactctcggagctctcactgttgagtggctcactggtgttacatggcaagacgccgga 541
K V E L V E G S S Y L G Q P L P F S I T
542 aaggtggagctagtagaagggtcatcataccttgggcaaccacttccattctcaatcacc 601
T L I W I E A L V I G Y I E F Q R N A E
602 acactgatctggatcgaggcactcgtaattggctacatagagttccagaggaatgcagag 661
L D P E K R L Y P G G S Y F D P L G L A
662 ctcgacccagagaagaggttgtacccaggtggcagttacttcgacccattgggcctggcc 721
S D P E K K A T L Q L A E I K H A R L A
722 tcagacccagagaagaaagccacccttcaattggcggagatcaagcacgcccgtcttgcc 781
M V G F L G F A V Q A A V T G K G P L N
782 atggtgggcttcttgggctttgcagtccaagccgccgtcaccggcaagggcccgctcaac 841
N W A T H L S D P L H T T I I D T F S S
842 aactgggccacccacttgagtgacccactccacacaaccatcattgacaccttctcatcc 901
S S * E E E S F L C L V T I T S I L * K
902 tcctcttaagaagaagagtctttcttgtgcctcgtcactattactagcatattgtaaaag 961
S F L L R L L V V I R T F S L S W I S S
962 tcttttcttcttcggcttttggttgtaattagaacattttcacttagttggattagtagt 1021
T C E K L V T * N W R G * A E M L L L D
1022 acttgtgaaaaacttgtaacgtgaaattggagaggctaagcagaaatgcttctgcttgat 1081
V K C S P V N V Y Y A R N R E G M T R V
1082 gttaagtgttctcctgtaaatgtttattatgcacggaatcgagagggaatgacgagggta 1141
K K Y * * N * D H Y E R * W Q G I G L R
1142 aaaaaatactgataaaattgagatcactacgaaaggtaatggcagggaattggattgagg 1201
P K K K K K K K K
1202 ccaaaaaaaaaaaaaaaaaaaaaaaaaaa 1230
Start codon – the methionine (M) highlighted in yellow. Stop codons are shown as red asterisks (8 total); the first occurs near the start of the row beginning at nucleotide 902 (TAA at positions 908-910). Using SIXFRAME from the Biology Workbench, this frame had the longest ORF and the fewest stop codons.
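The ORF call can be reproduced with a short translation routine. This is a minimal sketch using the standard genetic code, not the SIXFRAME implementation. It takes the first ATG as the start, as the report does, and reads codons until the first stop. The name `first_orf` is hypothetical. In the listing above the ATG sits at nucleotides 38-40 and the first stop at 908-910, which spans 290 codons and matches the length of 290 reported for the top protein BLAST hits.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def first_orf(dna):
    """Translate from the first ATG to the first in-frame stop codon.

    Returns (1-based start position, protein string). If no stop codon is
    found, translation runs to the end of the sequence.
    """
    seq = dna.upper()
    start = seq.find("ATG")
    if start == -1:
        return None, ""
    protein = []
    for i in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # X for codons containing N
        if aa == "*":
            break
        protein.append(aa)
    return start + 1, "".join(protein)


if __name__ == "__main__":
    # First row of the ORF listing, with a stop codon appended for the demo.
    row = "ttttctcacaccaggcgcacgttagcacaaactatcatggccacggcaacagcagctgcc" + "taa"
    print(first_orf(row))
```

Positions returned by the function count from the first base of the string passed in, so they are offset by one from the listing above, which starts numbering at 2.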
Protein Sequence (1) –
MATATAAATSSFMGTRLLEAHSGAGRVHARFGFGKKKAPAQKKASRGSGRDTVRPLWYPGAKAPEYLDGSLVGDYGFDPFGLGKP
AEYLQFELDSLDQNLAKNVAGDIIGTRTELADVKSTPFQPYSEVFGLQRFRECELIHGRWAMLATLGALTVEWLTGVTWQDAGKVELV
EGSSYLGQPLPFSITTLIWIEALVIGYIEFQRNAELDPEKRLYPGGSYFDPLGLASDPEKKATLQLAEIKHARLAMVGFLGFAVQAAVTGK
GPLNNWATHLSDPLHTTIIDTFSSSS
• The first ATG (methionine) is the start codon in the ORF analysis. I believe I have 100% of the cDNA.
• The cDNA insert has a LAMP motif (2), which codes for proteins responsible for interacting with light. Most photosynthetic proteins are coded by DNA with a LAMP motif, as are proteins in ocular tissues.
• A hydropathy plot shows the hydrophobicity or hydrophilicity along a protein, based on its amino acid sequence.
The window size of the hydropathy plot was set to 9 (4). Regions above the line are hydrophobic and most likely correspond to transmembrane segments or hydrophobic pockets within the protein structure. Most of the protein appears hydrophilic, however, and is most likely exposed to an aqueous environment.
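A sliding-window hydropathy profile of this kind can be sketched as follows. The report does not say which hydropathy scale the plotting tool used. The Kyte-Doolittle scale below is an assumption, chosen because it is the most common default, and `hydropathy_profile` is a hypothetical name. Each output value is the mean score of 9 consecutive residues, and positive values correspond to the regions above the line.

```python
# Kyte-Doolittle hydropathy values (assumed scale; not named in the report).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=9):
    """Return the mean hydropathy of each full window along the protein."""
    scores = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


if __name__ == "__main__":
    # First 21 residues of the protein sequence reported above.
    profile = hydropathy_profile("MATATAAATSSFMGTRLLEAH", window=9)
    for position, value in enumerate(profile, start=1):
        label = "hydrophobic" if value > 0 else "hydrophilic"
        print(position, round(value, 2), label)
```

A window of 9 smooths out single-residue noise while still resolving short hydrophobic stretches; wider windows are typically used when looking specifically for transmembrane helices.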
References
(1) Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb Miller. "A greedy algorithm for aligning DNA sequences." J Comput Biol 2000; 7(1-2):203-14.
(2) "SDSC Biology Workbench." Web. 14 Mar. 2016. <http://seqtool.sdsc.edu/>.
(3) "Software Developer of Next Gen Sequencing DNA Genetic Analysis and LIMS." Perkin Elmer, n.d. Web. 25 Feb. 2016. <http://www.geospiza.com/>.
(4) "Genomics and Bioinformatics @ Davidson College." Web. 15 Mar. 2016. <http://gcat.davidson.edu/>.

More Related Content

Similar to Sequence Alignment of Glycine Max Cultures

20091110 Technical Seminar ChIP-seq Data Analysis
20091110 Technical Seminar  ChIP-seq Data Analysis20091110 Technical Seminar  ChIP-seq Data Analysis
20091110 Technical Seminar ChIP-seq Data AnalysisUniversitat Pompeu Fabra
 
Bioinformetics - genetic variation between haemoglobin protein of humans and ...
Bioinformetics - genetic variation between haemoglobin protein of humans and ...Bioinformetics - genetic variation between haemoglobin protein of humans and ...
Bioinformetics - genetic variation between haemoglobin protein of humans and ...Aniket Bagul
 
Biotech Era Ahead: Transcriptomics
Biotech Era Ahead: TranscriptomicsBiotech Era Ahead: Transcriptomics
Biotech Era Ahead: TranscriptomicsTaha A. Taha
 
primer analysis.pdf
primer analysis.pdfprimer analysis.pdf
primer analysis.pdfAaimaAfzaal
 
some Dna sequences for bioinfomatics tools
some Dna sequences for bioinfomatics toolssome Dna sequences for bioinfomatics tools
some Dna sequences for bioinfomatics toolsFrazAhmadMazari
 
16 s rdna based microbial identification f
16 s rdna based microbial identification f16 s rdna based microbial identification f
16 s rdna based microbial identification fAman Kumar
 
Mutation Illistration
Mutation IllistrationMutation Illistration
Mutation IllistrationGroup32Hour
 
Mongo db and_academia
Mongo db and_academiaMongo db and_academia
Mongo db and_academiaSkills Matter
 
MongoDB and research
MongoDB and researchMongoDB and research
MongoDB and researchJan Aerts
 
Understanding the Genetic Landscape of Puccinia graminis f.sp. tritici From a...
Understanding the Genetic Landscape of Puccinia graminis f.sp. tritici From a...Understanding the Genetic Landscape of Puccinia graminis f.sp. tritici From a...
Understanding the Genetic Landscape of Puccinia graminis f.sp. tritici From a...Borlaug Global Rust Initiative
 

Similar to Sequence Alignment of Glycine Max Cultures (16)

Allegato 2
Allegato 2Allegato 2
Allegato 2
 
20091110 Technical Seminar ChIP-seq Data Analysis
20091110 Technical Seminar  ChIP-seq Data Analysis20091110 Technical Seminar  ChIP-seq Data Analysis
20091110 Technical Seminar ChIP-seq Data Analysis
 
BiancoFlipbook
BiancoFlipbookBiancoFlipbook
BiancoFlipbook
 
Bioinformetics - genetic variation between haemoglobin protein of humans and ...
Bioinformetics - genetic variation between haemoglobin protein of humans and ...Bioinformetics - genetic variation between haemoglobin protein of humans and ...
Bioinformetics - genetic variation between haemoglobin protein of humans and ...
 
Biotech Era Ahead: Transcriptomics
Biotech Era Ahead: TranscriptomicsBiotech Era Ahead: Transcriptomics
Biotech Era Ahead: Transcriptomics
 
primer analysis.pdf
primer analysis.pdfprimer analysis.pdf
primer analysis.pdf
 
Clustering Genes: W-curve + TSP
Clustering Genes: W-curve + TSPClustering Genes: W-curve + TSP
Clustering Genes: W-curve + TSP
 
some Dna sequences for bioinfomatics tools
some Dna sequences for bioinfomatics toolssome Dna sequences for bioinfomatics tools
some Dna sequences for bioinfomatics tools
 
Poster Pubblicazione
Poster PubblicazionePoster Pubblicazione
Poster Pubblicazione
 
16 s rdna based microbial identification f
16 s rdna based microbial identification f16 s rdna based microbial identification f
16 s rdna based microbial identification f
 
Mutation Illistration
Mutation IllistrationMutation Illistration
Mutation Illistration
 
Carbohydrate
CarbohydrateCarbohydrate
Carbohydrate
 
Mongo db and_academia
Mongo db and_academiaMongo db and_academia
Mongo db and_academia
 
MongoDB and research
MongoDB and researchMongoDB and research
MongoDB and research
 
Understanding the Genetic Landscape of Puccinia graminis f.sp. tritici From a...
Understanding the Genetic Landscape of Puccinia graminis f.sp. tritici From a...Understanding the Genetic Landscape of Puccinia graminis f.sp. tritici From a...
Understanding the Genetic Landscape of Puccinia graminis f.sp. tritici From a...
 
In silico analysis for unknown data
In silico analysis for unknown dataIn silico analysis for unknown data
In silico analysis for unknown data
 

Sequence Alignment of Glycine Max Cultures

  • 1. 1 Sequence Analysis of an Isolated Culture, Pertaining to Glycine Max By: Alexander Ward CMB 426 - Dr. Tsou 3/15/2016
  • 2. 2 Original/Unedited Sequences3 – T3 Primer – GAACTCCGCGGGTGCGGCCGCTCTGAACTAGTGGATCCCCCGGGCTGCAGGAATTCGGCACGAGCGGCAACAGCAGCTGCCA CCTCGTCCTTTATGGGGACGCGTCTCCTGGAGGCTCACTCCGGAGCGGGGCGAGTGCACGCCCGATTCGGCTTCGGCAAGAAA AAGGCTCCCGCCCCAAAGAAAGCCTCCAGGGGATCGGGCCGAGACACCGACAGACCCCTTTGGTATCCGGGCGCCAAAGCGC CCGAATACCTCGATGGGAGTCTTGTCGGAGACTACGGGTTCGATCCGTTTGGGCTAGGGAAGCCCGCGGAGTACCTGCAGTTC GAGCTGGACTCGCTGGACCAGAACCTTGCGAAGAACGTGGCTGGGGACATCATTGGAACCAGGACCGAGCTCGCGGACGTGA AGTCCACGCCGTTTCAGCCCTACAGCGAGGTGTTTGGGCTCCAGAGGTTCCGTGAGTGCGAACTCATCCATGGAAGGTGGGCC ATGCTCGCCACTCTCGGAGCTCTCACTGTTGAGTGGCTCACTGGTGTTACATGGCAAGACGCCGGAAAGGTGGAGCTAGTAGA AGGGTCATCATACCTTGGGCAACCACTTCCATTCTCAATCACCACACTGATCTGGATCGAGGTACTCGTAATTGGCTACATAGA GTTCCAGAGGAATGCAGAGCTCGACCCAGAGAAGAGGTTGTACCCAGGTGGCAGTTACTTCGACCCATTGGGCCTGGCCTCAG ACCCAGAGAAGAAAGCCACCCTTCAATTGGCGGAGATCAAGCACGCCCGTCTTGCCATGGTGGGCTTCTTGGGCTTTGC AGTC CAAGCCGCCGCCACCGGCAAGGGTCCGCTCAACAACTGGGCCACCCACTTGAGTGACCCACTCCACACAACCATCATTGACAC CTTCTCATCCTCCTCTTAAGAAGAAGAGTCTTTCTTGTGCCTCGTCACTATTACTAGCATATTGTAAAAGTCTTTTCTTCTTCGGCT TTTGGTTGTAATTAAAACATTTTCACTTANTTGGAATNAGTAAATACTTGTGAAAAAACTTNGTAAACGTGGAAANTNGGANAG GCTAAACCAAAAATGGCTTTCTGCTTNAATGTTAAAAAAAAAAAAAAAAAAAAAAAATTCGAGGGGGGGNCCCGGGTAACCC NAATTNCCCCCNTAAAAGNGGNGTTCNAAATTAAAAANTTCACTGGGCCNGTTCGTTTTTNAAAACGTTCNTNAACGGGGGN AAAAACCCCTGGGGGGTTTACCCCAANTTTAAATTCCCCCTTTNGGAAANAAAATTCCCCCTTTTTNCCCNANGNTGGGGGNT NAAANAACNAAAAAANNGCCCCCCCCACCAAATTNNCCCNTTTTCCAAAAAANTTTNCCCAACCNTNAAAAGGGNNAAAN GGNAAAA T7 Primer – Slight Taq slip following poly-T CGGGCCCCCTGCGTTTTTTTTTTTTTTTTTTTTACATCAAGCAGAAGCATTTCTGCTTAGCCTCTCCAATTTCACGTTACAAGTTTTT CACAAGTACTACTAATCCAACTAAGTGAAAATGTTCTAATTACAACCAAAAGCCGAAGAAGAAAAGACTTTTACAATATGCTAG TAATAGTGACGAGGCACAAGAAAGACTCTTCTTCTTAAGAGGAGGATGAGAAGGTGTCAATGATGGTTGTGTGGAGTGGGTCA CTCAAGTGGGTGGCCCAGTTGTTGAGCGGACCCTTGCCGGTGGCGGCGGCTTGGACTGCAAAGCCCAAGAAGCCCACCATGG CAAGACGGGCGTGCTTGATCTCCGCCAATTGAAGGGTGGCTTTCTTCTCTGGGTCTGAGGCCAGGCCCAATGGGTCGAAGTAA 
CTGCCACCTGGGTACAACCTCTTCTCTGGGTCGAGCTCTGCATTCCTCTGGAACTCTATGTAGCCAATTACGAGTACCTCGATCC AGATCAGTGTGGTGATTGAGAATGGAAGTGGTTGCCCAAGGTATGATGAC CCTTCTACTAGCTCCACCTTTCCGGCGTCTTGCC ATGTAACACCAGTGAGCCACTCAACAGTGAGAGCTCCGAGAGTGGCGAGCATGGCCCACCTTCCATGGATGAGTTCGCACTCA CGGAACCTCTGGAGCCCAAACACCTCGCTGTAGGGCTGAAACGGCGTGGACTTCACGTCCGCGAGCTCGGTCCTGGTTCCAAT GATGTCCCCAGCCACGTTCTTCGCAAGGTTCTGGTCCAGCGAGTCCAGCTCGAACTGCAGGTACTCCGCGGGCTTCCCTAGCCC AAACGGGATCGAACCCGTAGTCTCCGACAAGACTCCCCATCGAGGTATTTCGGGGCGCTTTTGGCGCCCGGGATACCAAAGGG GGTCTGTCGGGTGGTCTCCGGCCCCGAATCCCCTGGGAGGGCTTTTCTTTTGGGGGGCGGGAAGCCTTTTTTTCTTGCCGGAAG CCCGAAAATCGGGGGCGGTGGNACTNCGCCCCCCGGNTTCNGGGANGTGAAGCCCTTCCCNAGGAANAAACNNNNGTTCC CCCCCNATAAAAANGGANCNAAANGGTGGGCNANCCCTTGCCTTTTTTGCCCCNCCTTCCGTNGNCCCAAAAATTCCNTGGC CANNCCCCCGGGGGGGGAAATCCCCCCTTAANNTTTNTTAAAAAAGCGGGGCCCCCCCCNCCCNCCCNGGGNGGGNAAAC CTTCCCNACCCNTTTTTTGNTTTCCCCCCNTTTTANNGGGGAN
  • 3. 3 Edited T3 Primer – GAATTCGGCACGAGCGGCAACAGCAGCTGCCACCTCGTCCTTTATGGGGACGCGTCTCCTGGAGGCTCACTCCGGAGCGGGG CGAGTGCACGCCCGATTCGGCTTCGGCAAGAAAAAGGCTCCCGCCCCAAAGAAAGCCTCCAGGGGATCGGGCCGAGACACC GACAGACCCCTTTGGTATCCGGGCGCCAAAGCGCCCGAATACCTCGATGGGAGTCTTGTCGGAGACTACGGGTTCGATCCGTT TGGGCTAGGGAAGCCCGCGGAGTACCTGCAGTTCGAGCTGGACTCGCTGGACCAGAACCTTGCGAAGAACGTGGCTGGGGAC ATCATTGGAACCAGGACCGAGCTCGCGGACGTGAAGTCCACGCCGTTTCAGCCCTACAGCGAGGTGTTTGGGCTCCAGAGGTT CCGTGAGTGCGAACTCATCCATGGAAGGTGGGCCATGCTCGCCACTCTCGGAGCTCTCACTGTTGAGTGGCTCACTGGTGTTAC ATGGCAAGACGCCGGAAAGGTGGAGCTAGTAGAAGGGTCATCATACCTTGGGCAACCACTTCCATTCTCAATCACCACACTGA TCTGGATCGAGGTACTCGTAATTGGCTACATAGAGTTCCAGAGGAATGCAGAGCTCGACCCAGAGAAGAGGTTGTACCCAGGT GGCAGTTACTTCGACCCATTGGGCCTGGCCTCAGACCCAGAGAAGAAAGCCACCCTTCAATTGGCGGAGATCAAGCACGCCCG TCTTGCCATGGTGGGCTTCTTGGGCTTTGCAGTCCAAGCCGCCGCCACCGGCAAGGGTCCGCTCAACAACTGGGCCACCCACTT GAGTGACCCACTCCACACAACCATCATTGACACCTTCTCATCCTCCTCTTAAGAAGAAGAGTCTTTCTTGTGCCTCGTCACTATTA CTAGCATATTGTAAAAGTCTTTTCTTCTTCGGCTTTTGGTTGTAATTAAAACATTTTCACTTA Edited T7 Primer – ACATCAAGCAGAAGCATTTCTGCTTAGCCTCTCCAATTTCACGTTACAAGTTTTTCACAAGTACTACTAATCCAACTAAGTGAAA ATGTTCTAATTACAACCAAAAGCCGAAGAAGAAAAGACTTTTACAATATGCTAGTAATAGTGACGAGGCACAAGAAAGACTCTT CTTCTTAAGAGGAGGATGAGAAGGTGTCAATGATGGTTGTGTGGAGTGGGTCACTCAAGTGGGTGGCCCAGTTGTTGAGCGGA CCCTTGCCGGTGGCGGCGGCTTGGACTGCAAAGCCCAAGAAGCCCACCATGGCAAGACGGGCGTGCTTGATCTCCGCCAATT GAAGGGTGGCTTTCTTCTCTGGGTCTGAGGCCAGGCCCAATGGGTCGAAGTAACTGCCACCTGGGTACAACCTCTTCTCTGGGT CGAGCTCTGCATTCCTCTGGAACTCTATGTAGCCAATTACGAGTACCTCGATCCAGATCAGTGTGGTGATTGAGAATGGAAGTG GTTGCCCAAGGTATGATGACCCTTCTACTAGCTCCACCTTTCCGGCGTCTTGCCATGTAACACCAGTGAGCCACTCAACAGTGA GAGCTCCGAGAGTGGCGAGCATGGCCCACCTTCCATGGATGAGTTCGCACTCACGGAACCTCTGGAGCCCAAACACCTCGCTG TAGGGCTGAAACGGCGTGGACTTCACGTCCGCGAGCTCGGTCCTGGTTCCAATGATGTCCCCAGCCACGTTCTTCGCAAGGTTC TGGTCCAGCGAGTCCAGCTCGAACTGCAGGTACTCCGCGGGCTTCCCTAGCCCAAACGGGATCGAACCCGTAGTCTCCGACAA GACTCCCCATCGAGGTATTTCGGGGCGCTTTTGGCGCCCGGGATACCAAAGGGGGTCTGTCGGGTGGTCTCCGGCCCCGAATC 
CCCTGGGAGGGCTTTTCTTTTGGGGGGCGGGAAGCCTTTTTTTCTTGCCGGAAGCCCGAAAATCGGGGGCGGTGG Green = EcoRI Cut Site, Red = XhoI Cut Site, Teal = PolyA Tail, Purple = Sequence discrepancies; Multiple cut sites were not observed.
  • 4. 4 Align 2 or more sequences using CLUSTALW2 – PILEMSF Format 1 50 T3_edited GAATTCGGCA CGAGCGGCAA CAGCAGCTGC CACCTCGTCC TTTATGGGGA T7_edited__Reve .......... .......... .......... .......... .......... 51 100 T3_edited CGCGTCTCCT GGAGGCTCAC TCCGGAGCGG GGCGAGTGCA CGCCCGATTC T7_edited__Reve .......... .......... .......... ..CCACCGCC CCCGATTTTC 101 150 T3_edited GG..CTTCGG CAAGAAAAAG G...CTCCCG CCCC...AAA GAAAG...CC T7_edited__Reve GGGCTTCCGG CAAGAAAAAA AGGCTTCCCG CCCCCCAAAA GAAAAGCCCT 151 200 T3_edited TCCAGGGGAT C..GGGCCG. AGACA..CCG ACAGACCCC. TTTGGTATCC T7_edited__Reve CCCAGGGGAT TCGGGGCCGG AGACCACCCG ACAGACCCCC TTTGGTATCC 201 250 T3_edited .GGGCGCCAA A.GCGCCC.. GAATACCTCG ATGGG.AGTC TTGTCGGAGA T7_edited__Reve CGGGCGCCAA AAGCGCCCCG AAATACCTCG ATGGGGAGTC TTGTCGGAGA 251 300 T3_edited CTACGGGTTC GATCC.GTTT GGGCTAGGGA AGCCCGCGGA GTACCTGCAG T7_edited__Reve CTACGGGTTC GATCCCGTTT GGGCTAGGGA AGCCCGCGGA GTACCTGCAG 301 350 T3_edited TTCGAGCTGG ACTCGCTGGA CCAGAACCTT GCGAAGAACG TGGCTGGGGA T7_edited__Reve TTCGAGCTGG ACTCGCTGGA CCAGAACCTT GCGAAGAACG TGGCTGGGGA 351 400 T3_edited CATCATTGGA ACCAGGACCG AGCTCGCGGA CGTGAAGTCC ACGCCGTTTC T7_edited__Reve CATCATTGGA ACCAGGACCG AGCTCGCGGA CGTGAAGTCC ACGCCGTTTC 401 450 T3_edited AGCCCTACAG CGAGGTGTTT GGGCTCCAGA GGTTCCGTGA GTGCGAACTC T7_edited__Reve AGCCCTACAG CGAGGTGTTT GGGCTCCAGA GGTTCCGTGA GTGCGAACTC 451 500 T3_edited ATCCATGGAA GGTGGGCCAT GCTCGCCACT CTCGGAGCTC TCACTGTTGA T7_edited__Reve ATCCATGGAA GGTGGGCCAT GCTCGCCACT CTCGGAGCTC TCACTGTTGA 501 550 T3_edited GTGGCTCACT GGTGTTACAT GGCAAGACGC CGGAAAGGTG GAGCTAGTAG T7_edited__Reve GTGGCTCACT GGTGTTACAT GGCAAGACGC CGGAAAGGTG GAGCTAGTAG 551 600 T3_edited AAGGGTCATC ATACCTTGGG CAACCACTTC CATTCTCAAT CACCACACTG T7_edited__Reve AAGGGTCATC ATACCTTGGG CAACCACTTC CATTCTCAAT CACCACACTG 601 650 T3_edited ATCTGGATCG AGGTACTCGT AATTGGCTAC ATAGAGTTCC AGAGGAATGC T7_edited__Reve ATCTGGATCG AGGTACTCGT AATTGGCTAC ATAGAGTTCC AGAGGAATGC 651 700 T3_edited AGAGCTCGAC CCAGAGAAGA GGTTGTACCC AGGTGGCAGT 
TACTTCGACC T7_edited__Reve AGAGCTCGAC CCAGAGAAGA GGTTGTACCC AGGTGGCAGT TACTTCGACC
  • 5. 5 701 750 T3_edited CATTGGGCCT GGCCTCAGAC CCAGAGAAGA AAGCCACCCT TCAATTGGCG T7_edited__Reve CATTGGGCCT GGCCTCAGAC CCAGAGAAGA AAGCCACCCT TCAATTGGCG 751 800 T3_edited GAGATCAAGC ACGCCCGTCT TGCCATGGTG GGCTTCTTGG GCTTTGCAGT T7_edited__Reve GAGATCAAGC ACGCCCGTCT TGCCATGGTG GGCTTCTTGG GCTTTGCAGT 801 850 T3_edited CCAAGCCGCC GCCACCGGCA AGGGTCCGCT CAACAACTGG GCCACCCACT T7_edited__Reve CCAAGCCGCC GCCACCGGCA AGGGTCCGCT CAACAACTGG GCCACCCACT 851 900 T3_edited TGAGTGACCC ACTCCACACA ACCATCATTG ACACCTTCTC ATCCTCCTCT T7_edited__Reve TGAGTGACCC ACTCCACACA ACCATCATTG ACACCTTCTC ATCCTCCTCT 901 950 T3_edited TAAGAAGAAG AGTCTTTCTT GTGCCTCGTC ACTATTACTA GCATATTGTA T7_edited__Reve TAAGAAGAAG AGTCTTTCTT GTGCCTCGTC ACTATTACTA GCATATTGTA 951 1000 T3_edited AAAGTCTTTT CTTCTTCGGC TTTTGGTTGT AATTAAAACA TTTTCACTTA T7_edited__Reve AAAGTCTTTT CTTCTTCGGC TTTTGGTTGT AATTAGAACA TTTTCACTTA 1001 1050 T3_edited .......... .......... .......... .......... .......... T7_edited__Reve GTTGGATTAG TAGTACTTGT GAAAAACTTG TAACGTGAAA TTGGAGAGGC 1051 1076 T3_edited .......... .......... ...... T7_edited__Reve TAAGCAGAAA TGCTTCTGCT TGATGT 76 at the end and 82 at the beginning are the only base pairs not overlapping, 152 total, indicating an overlap of 924 bp.
  • 6. 6 Glycine max chlorophyll a-b binding protein CP29.2, chloroplastic-like (LOC100785415), mRNA Sequence for BLAST input, entire T3 cDNA insert sequence – TTCTTAAGAGGAGGATGAGAAGGTGTCAATGATGGTTGTGTGGAGTGGGTCACTCAAGTGGGTGGCCCAGTTGTTGAGCGGAC CCTTGCCGGTGGCGGCGGCTTGGACTGCAAAGCCCAAGAAGCCCACCATGGCAAGACGGGCGTGCTTGATCTCCGCCAATTG AAGGGTGGCTTTCTTCTCTGGGTCTGAGGCCAGGCCCAATGGGTCGAAGTAACTGCCACCTGGGTACAACCTCTTCTCTGGGTC GAGCTCTGCATTCCTCTGGAACTCTATGTAGCCAATTACGAGTACCTCGATCCAGATCAGTGTGGTGATTGAGAATGGAAGTGG TTGCCCAAGGTATGATGACCCTTCTACTAGCTCCACCTTTCCGGCGTCTTGCCATGTAACACCAGTGAGCCACTCAACAGTGAG AGCTCCGAGAGTGGCGAGCATGGCCCACCTTCCATGGATGAGTTCGCACTCACGGAACCTCTGGAGCCCAAACACCTCGCTGT AGGGCTGAAACGGCGTGGACTTCACGTCCGCGAGCTCGGTCCTGGTTCCAATGATGTCCCCAGCCACGTTCTTCGCAAGGTTCT GGTCCAGCGAGTCCAGCTCGAACTGCAGGTACTCCGCGGGCTTCCCTAGCCCAAACGG Top 20 BLAST hits1 – 1. Glycine max cDNA, clone: GMFL01-17-I17 (99% ID) 2. Glycine max chlorophyll a-b binding protein CP29.2, chloroplastic-like (LOC100785415), mRNA (99% ID) 3. PREDICTED: Glycine max chlorophyll a-b binding protein CP29.2, chloroplastic-like (LOC100785180), mRNA (99% ID) 4. Glycine max cDNA, clone: GMFL01-03-O02 (96% ID) 5. Vigna radiata chlorophyll a-b binding protein CP29.1, chloroplastic-like (LOC106774260), mRNA (90% ID) 6. Phaseolus vulgaris hypothetical protein (PHAVU_010G002100g) mRNA, complete cds (90% ID) 7. Morus notabilis hypothetical protein partial mRNA (Too small) 8. Glycine max strain Williams 82 clone GM_WBb0012I20, complete sequence (Too large) 9. PREDICTED: Oryza brachyantha chlorophyll a-b binding protein CP29.1, chloroplastic-like (LOC102700159), mRNA (83% ID) 10. Phyllostachys edulis chloroplast chlorophyll a/b binding protein cab-PhE7 mRNA, complete cds; nuclear gene for chloroplast product (82%) 11. Pyrus x bretschneideri clone 915 a-b binding protein mRNA, complete cds (83% ID) 12. Pyrus x bretschneideri chlorophyll a-b binding protein CP29.1, chloroplastic-like (LOC103936594), mRNA (Too small) 13. 
Phyllostachys edulis cDNA clone: bphyem211d01, full insert sequence (82% ID) 14. PREDICTED: Malus x domestica chlorophyll a-b binding protein CP29.1, chloroplastic-like (LOC103450616), mRNA (83% ID) 15. PREDICTED: Oryza sativa Japonica Group chlorophyll a-b binding protein CP29.1, chloroplastic (LOC4343583), mRNA (82% ID) 16. Oryza sativa (indica cultivar-group) cDNA clone:OSIGCSN014K09, full insert sequence (82% ID) 17. Oryza sativa Japonica Group cDNA clone:001-204-B02, full insert sequence (82% ID) 18. Oryza sativa Japonica Group cDNA clone:006-309-F11, full insert sequence (82% ID) 19. Oryza sativa Japonica Group cDNA clone:001-013-C12, full insert sequence (82% ID) 20. Oryza sativa Japonica Group cDNA clone:J013098H13, full insert sequence (82% ID) Top 5 Protein BLAST hits1 – 1. chlorophyll a-b binding protein CP29.2, chloroplastic-like [Glycine max] (Length = 290) (100% ID) 2. hypothetical protein GLYMA_03G060300 [Glycine max] (Length = 290) (99% ID)
  • 7.
    3. PREDICTED: chlorophyll a-b binding protein CP29.2, chloroplastic-like [Glycine max] (Length = 290) (97% ID)
    4. PREDICTED: chlorophyll a-b binding protein CP29.3, chloroplastic-like [Glycine max] (Length = 278) (73% ID)
    5. PREDICTED: chlorophyll a-b binding protein CP29.3, chloroplastic-like [Glycine max] (Length = 282) (72% ID)

    The Light Harvesting Complex (LHC) holds two molecules that interact with visible light, chlorophyll a and chlorophyll b (Chl-a and Chl-b, respectively). The protein encoded by the gene of interest is a chlorophyll a/b binding protein, a peripheral light-harvesting protein that acts as an antenna for photosystems I and II.

    ORF Analysis (2) –
         F  S  H  T  R  R  T  L  A  Q  T  I  M  A  T  A  T  A  A  A
       2 ttttctcacaccaggcgcacgttagcacaaactatcatggccacggcaacagcagctgcc 61
         T  S  S  F  M  G  T  R  L  L  E  A  H  S  G  A  G  R  V  H
      62 acctcgtcctttatggggacgcgtctcctggaggctcactccggggcggggcgagtgcac 121
         A  R  F  G  F  G  K  K  K  A  P  A  Q  K  K  A  S  R  G  S
     122 gcccgattcggcttcggcaagaaaaaggctcccgcccaaaagaaagcctccaggggatcg 181
         G  R  D  T  V  R  P  L  W  Y  P  G  A  K  A  P  E  Y  L  D
     182 ggccgagacaccgtcagacccctttggtatccgggcgccaaagcgcccgaatacctcgat 241
         G  S  L  V  G  D  Y  G  F  D  P  F  G  L  G  K  P  A  E  Y
     242 gggagtcttgtcggagactacgggttcgatccgtttgggctagggaagcccgcggagtac 301
         L  Q  F  E  L  D  S  L  D  Q  N  L  A  K  N  V  A  G  D  I
     302 ctgcagttcgagctggactcgctggaccagaaccttgcgaagaacgtggctggggacatc 361
         I  G  T  R  T  E  L  A  D  V  K  S  T  P  F  Q  P  Y  S  E
     362 attggaaccaggaccgagcttgcggacgtgaagtccacgccgtttcagccctacagcgag 421
         V  F  G  L  Q  R  F  R  E  C  E  L  I  H  G  R  W  A  M  L
     422 gtgtttgggctccagaggttccgtgagtgcgaactcatccatggaaggtgggccatgctc 481
         A  T  L  G  A  L  T  V  E  W  L  T  G  V  T  W  Q  D  A  G
     482 gccactctcggagctctcactgttgagtggctcactggtgttacatggcaagacgccgga 541
         K  V  E  L  V  E  G  S  S  Y  L  G  Q  P  L  P  F  S  I  T
     542 aaggtggagctagtagaagggtcatcataccttgggcaaccacttccattctcaatcacc 601
         T  L  I  W  I  E  A  L  V  I  G  Y  I  E  F  Q  R  N  A  E
     602 acactgatctggatcgaggcactcgtaattggctacatagagttccagaggaatgcagag 661
         L  D  P  E  K  R  L  Y  P  G  G  S  Y  F  D  P  L  G  L  A
     662 ctcgacccagagaagaggttgtacccaggtggcagttacttcgacccattgggcctggcc 721
         S  D  P  E  K  K  A  T  L  Q  L  A  E  I  K  H  A  R  L  A
     722 tcagacccagagaagaaagccacccttcaattggcggagatcaagcacgcccgtcttgcc 781
         M  V  G  F  L  G  F  A  V  Q  A  A  V  T  G  K  G  P  L  N
     782 atggtgggcttcttgggctttgcagtccaagccgccgtcaccggcaagggcccgctcaac 841
         N  W  A  T  H  L  S  D  P  L  H  T  T  I  I  D  T  F  S  S
     842 aactgggccacccacttgagtgacccactccacacaaccatcattgacaccttctcatcc 901
         S  S  *  E  E  E  S  F  L  C  L  V  T  I  T  S  I  L  *  K
     902 tcctcttaagaagaagagtctttcttgtgcctcgtcactattactagcatattgtaaaag 961
         S  F  L  L  R  L  L  V  V  I  R  T  F  S  L  S  W  I  S  S
     962 tcttttcttcttcggcttttggttgtaattagaacattttcacttagttggattagtagt 1021
         T  C  E  K  L  V  T  *  N  W  R  G  *  A  E  M  L  L  L  D
    1022 acttgtgaaaaacttgtaacgtgaaattggagaggctaagcagaaatgcttctgcttgat 1081
         V  K  C  S  P  V  N  V  Y  Y  A  R  N  R  E  G  M  T  R  V
    1082 gttaagtgttctcctgtaaatgtttattatgcacggaatcgagagggaatgacgagggta 1141
         K  K  Y  *  *  N  *  D  H  Y  E  R  *  W  Q  G  I  G  L  R
    1142 aaaaaatactgataaaattgagatcactacgaaaggtaatggcagggaattggattgagg 1201
         P  K  K  K  K  K  K  K  K
    1202 ccaaaaaaaaaaaaaaaaaaaaaaaaaaa 1230
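The frame-by-frame translation above can be approximated in a few lines of Python. This is a minimal sketch, not the SIXFRAME tool itself: it assumes the standard genetic code, and the function names (translate, longest_orf, six_frames) are hypothetical helpers introduced only for illustration. It translates a cDNA in each of the six reading frames and reports the longest stretch that runs from a methionine to the next stop codon.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna, frame=0):
    """Translate dna starting at offset `frame` (0, 1 or 2).

    Stop codons become '*'; codons with ambiguous bases (e.g. N) become 'X'.
    """
    seq = dna.upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def longest_orf(protein):
    """Longest run from the first M of a stop-free segment to its end.

    A final segment with no trailing stop is also considered.
    """
    best = ""
    for segment in protein.split("*"):
        start = segment.find("M")
        if start != -1 and len(segment) - start > len(best):
            best = segment[start:]
    return best


def six_frames(dna):
    """Translate all three frames of both strands."""
    strands = (("+", dna), ("-", reverse_complement(dna)))
    return {
        (name, frame): translate(seq, frame)
        for name, seq in strands
        for frame in range(3)
    }


# First row of the ORF analysis (nucleotides 2-61 of the insert).
first_row = "ttttctcacaccaggcgcacgttagcacaaactatcatggccacggcaacagcagctgcc"
print(translate(first_row))  # FSHTRRTLAQTIMATATAAA
```

Running six_frames on the full insert and comparing longest_orf and the number of '*' characters per frame would give one way to pick the frame with the longest ORF and the fewest stop codons.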
  • 8.
    Start codon – the first methionine (M), highlighted in yellow. Stop codons are shown as red asterisks (8 total); the first occurs near the start of the row beginning at nucleotide 902. Of the six frames produced by SIXFRAME in the Biology Workbench, this frame had the longest ORF and the fewest stop codons.

    Protein Sequence (1) –
    MATATAAATSSFMGTRLLEAHSGAGRVHARFGFGKKKAPAQKKASRGSGRDTVRPLWYPGAKAPEYLDGSLVGDYGFDPFGLGKP
    AEYLQFELDSLDQNLAKNVAGDIIGTRTELADVKSTPFQPYSEVFGLQRFRECELIHGRWAMLATLGALTVEWLTGVTWQDAGKVELV
    EGSSYLGQPLPFSITTLIWIEALVIGYIEFQRNAELDPEKRLYPGGSYFDPLGLASDPEKKATLQLAEIKHARLAMVGFLGFAVQAAVTGK
    GPLNNWATHLSDPLHTTIIDTFSSSS

    - The first ATG (methionine) is the start codon in the ORF analysis. I believe I have 100% of the cDNA.
    - The cDNA insert has a LAMP motif (2), which is associated with genes coding for proteins that interact with light. Most photosynthetic proteins, as well as proteins in ocular tissues, are encoded by DNA carrying a LAMP motif.
    - A hydropathy plot estimates how hydrophobic or hydrophilic each region of a protein is, based on its amino acid sequence.
  • 9.
    The window size of the hydropathy plot was set to 9 (4). Values above the line indicate hydrophobic regions, most likely transmembrane segments or hydrophobic pockets within the protein structure. Most of the protein appears hydrophilic, however, and is most likely exposed to an aqueous environment.
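A sliding-window hydropathy profile like the one described can be sketched as follows. The report does not name the hydropathy scale it used, so the Kyte-Doolittle scale is an assumption here, as is treating 0 as "the line"; hydropathy_profile and hydrophobic_windows are hypothetical helper names. Each value is the mean hydropathy of 9 consecutive residues, matching the window size given above.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the report does not name one).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=9):
    """Mean hydropathy of every full window of `window` residues.

    Returns an empty list when the protein is shorter than the window.
    """
    scores = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def hydrophobic_windows(profile, threshold=0.0):
    """0-based start positions of windows whose mean lies above the threshold."""
    return [i for i, value in enumerate(profile) if value > threshold]


# First 28 residues of the predicted protein, used only as a short demo input.
fragment = "MATATAAATSSFMGTRLLEAHSGAGRVH"
profile = hydropathy_profile(fragment)
for start in hydrophobic_windows(profile):
    print(start + 1, fragment[start:start + 9], round(profile[start], 2))
```

A larger window smooths the profile and favors long hydrophobic stretches such as membrane-spanning helices, while a smaller one picks up short surface-exposed or buried patches.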
  • 10.
    References
    (1) Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb Miller (2000), "A greedy algorithm for aligning DNA sequences", J Comput Biol 7(1-2):203-14.
    (2) "SDSC Biology Workbench." Web. 14 Mar. 2016. <http://seqtool.sdsc.edu/>.
    (3) "Software Developer of Next Gen Sequencing DNA Genetic Analysis and LIMS." Perkin Elmer, n.d. Web. 25 Feb. 2016. <http://www.geospiza.com/>.
    (4) "Genomics and Bioinformatics @ Davidson College." Web. 15 Mar. 2016. <http://gcat.davidson.edu/>.