Cancer genomics big_datascience_meetup_july_14_2014


Java and Scala coding for Cancer Genomics

Published in: Data & Analytics

  1. Java and Scala for Cancer Genomics, by Ayush Sarkar, Irvington High School, July 14, 2014
  2. Comparing a Reference Genome and a Subject Genome
     [Figure: reference (R) and subject (S) sequences compared side by side.]
     The subject genome contains gaps and mutations relative to the reference. Analysis is needed to identify them.
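Finding gaps and mutations normally starts with an alignment, which later slides compute with BioJava. As a hedged sketch, assuming R and S have already been aligned to the same length with '-' marking a gap, each column can be classified as a match, a mutation (substitution), or a gap. The class name, method name, and toy sequences are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original slides or of BioJava.
class AlignedComparison {

    /**
     * Compares two sequences that are already aligned to the same length,
     * where '-' marks a gap. Returns one note per differing column.
     */
    static List<String> differences(String reference, String subject) {
        if (reference.length() != subject.length()) {
            throw new IllegalArgumentException("aligned sequences must have equal length");
        }
        List<String> notes = new ArrayList<>();
        for (int i = 0; i < reference.length(); i++) {
            char r = reference.charAt(i);
            char s = subject.charAt(i);
            if (r == s) {
                continue; // identical column: nothing to report
            }
            if (r == '-' || s == '-') {
                notes.add("gap at " + i); // insertion or deletion
            } else {
                notes.add("mutation at " + i + ": " + r + "->" + s); // substitution
            }
        }
        return notes;
    }

    public static void main(String[] args) {
        // Made-up toy sequences, already aligned.
        String reference = "ACGTACGT";
        String subject   = "ACCTA-GT";
        for (String note : differences(reference, subject)) {
            System.out.println(note);
        }
    }
}
```

Running main prints `mutation at 2: G->C` and then `gap at 5`.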
  3. First, we go through creating and executing an example Java program. Next, we look into Java code in BioJava, an open-source project for bioinformatics. BioJava location:
  4. An Example Java Program to Check Validity and Compare Two Strings
     • In this example we examine the SerialNumber class, which the Home Software Company uses to validate software serial numbers. A valid serial number has the form LLLLL-DDDD-LLLL, where L indicates an alphabetic letter and D indicates a numeric digit. For example, WRXTQ-7786-PGVZ is a valid serial number. A serial number consists of three groups of characters, delimited by hyphens.
     • After its validity is checked, a serial number assigned to a customer is compared with a serial number stored in a database to check for equality.
     • This example shows steps similar to those in DNA sequence alignment.
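To make the DNA analogy concrete, here is a hypothetical sketch that uses the same two steps on a DNA string: first check validity (assumed here to mean non-empty and containing only the uppercase letters A, C, G, T, which ignores lowercase input and ambiguity codes such as N), then compare it with a stored sequence for equality. The class and method names are made up for illustration.

```java
// Hypothetical DNA analogue of the serial-number example.
class DnaCheck {

    /** Validity step: non-empty and every character is one of A, C, G, T. */
    static boolean isValidDna(String seq) {
        if (seq == null || seq.isEmpty()) {
            return false;
        }
        for (char c : seq.toCharArray()) {
            if ("ACGT".indexOf(c) < 0) {
                return false; // not one of the four bases
            }
        }
        return true;
    }

    /** Comparison step: a valid sequence is compared with the stored one. */
    static boolean matchesStored(String seq, String stored) {
        return isValidDna(seq) && seq.equals(stored);
    }

    public static void main(String[] args) {
        String stored = "ATGGCC"; // stands in for a sequence kept in a database
        System.out.println(isValidDna("ATGGCC"));            // true
        System.out.println(isValidDna("ATGXCC"));            // false: X is not a base
        System.out.println(matchesStored("ATGGCC", stored)); // true
    }
}
```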
  5. SerialNumber Class Definition
     The fields first, second, and third hold the first, second, and third groups of characters in a serial number. The constructor sets the valid field to true to indicate a valid serial number, or to false to indicate an invalid one.
     [Figure: the SerialNumber class definition, annotated with its instance variables, constructor, and general methods.]
  6. Method Description for the Class
     • Constructor: accepts a string argument that contains a serial number. The string is tokenized, its tokens are stored in the first, second, and third fields, and the validate method is called.
     • isValid: returns the value in the valid field.
     • validate: calls the isFirstGroupValid, isSecondGroupValid, and isThirdGroupValid methods to validate the first, second, and third fields.
     • isFirstGroupValid: returns true if the value stored in the first field is valid; otherwise, returns false.
     • isSecondGroupValid: returns true if the value stored in the second field is valid; otherwise, returns false.
     • isThirdGroupValid: returns true if the value stored in the third field is valid; otherwise, returns false.
     • EqualityTest: checks whether the serial number is equal to a serial number in the database.
  7. SerialNumber Class (without main method) and SerialNumberTester Class (with main method)
     Use Eclipse to create the file:
       import java.util.StringTokenizer;
       public class SerialNumber {
           …
       }
     Use Eclipse to create, under the same project:
       public class SerialNumberTester {
           public static void main(String[] args) {
               …
           }
       }
     See the details of the classes in the Eclipse integrated development environment (IDE) and execute them.
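The slide elides the class bodies. Below is one possible implementation that follows the method descriptions on slide 6; it is a sketch, not the author's code. Method names follow the slides, except that EqualityTest is written in Java's lowerCamelCase as equalityTest, and the overall length check in the constructor is an added assumption.

```java
import java.util.StringTokenizer;

// One possible SerialNumber implementation (format LLLLL-DDDD-LLLL);
// a sketch following the slide's method descriptions, not the author's code.
class SerialNumber {
    private final String text; // the serial number as given
    private String first;      // first group: 5 letters
    private String second;     // second group: 4 digits
    private String third;      // third group: 4 letters
    private boolean valid;     // set by validate(); false by default

    SerialNumber(String sn) {
        text = sn;
        StringTokenizer st = new StringTokenizer(sn, "-");
        // Assumed extra check: total length 15 (5 + 1 + 4 + 1 + 4) rejects doubled,
        // leading, or trailing hyphens, which StringTokenizer would silently skip.
        if (sn.length() == 15 && st.countTokens() == 3) {
            first = st.nextToken();
            second = st.nextToken();
            third = st.nextToken();
            validate();
        }
    }

    boolean isValid() {
        return valid;
    }

    private void validate() {
        valid = isFirstGroupValid() && isSecondGroupValid() && isThirdGroupValid();
    }

    // True if s has exactly len characters, all letters (letters == true) or all digits.
    private static boolean isGroup(String s, int len, boolean letters) {
        if (s.length() != len) {
            return false;
        }
        for (char c : s.toCharArray()) {
            if (letters ? !Character.isLetter(c) : !Character.isDigit(c)) {
                return false;
            }
        }
        return true;
    }

    private boolean isFirstGroupValid()  { return isGroup(first, 5, true); }
    private boolean isSecondGroupValid() { return isGroup(second, 4, false); }
    private boolean isThirdGroupValid()  { return isGroup(third, 4, true); }

    // EqualityTest on the slide: true if this serial number is valid and identical
    // to the one stored in the database (passed in as a plain string here).
    boolean equalityTest(String stored) {
        return valid && text.equals(stored);
    }
}

class SerialNumberTester {
    public static void main(String[] args) {
        SerialNumber sn = new SerialNumber("WRXTQ-7786-PGVZ");
        System.out.println(sn.isValid());                                  // true
        System.out.println(new SerialNumber("WRXTQ-77A6-PGVZ").isValid()); // false
        System.out.println(sn.equalityTest("WRXTQ-7786-PGVZ"));           // true
    }
}
```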
  8. Using Eclipse to Create, Execute, and Debug Java and Scala Programs
     1. Get Eclipse from:
     2. Unzip and install it on your laptop.
     3. Install Java version 1.6 or 1.7.
     4. Create a project in Eclipse (from the File menu at the top).
     5. Create a Java class under the project.
     6. Define methods and variables for the class.
     7. Import the necessary packages.
     8. Compile and execute the class.
     9. Try debugging with the debug windows and commands.
     10. It is easy!
  9. BioJava
     BioJava is an open-source project dedicated to providing a Java framework for processing biological data. It includes objects for manipulating biological sequences, file parsers, access to BioSQL and Ensembl databases, tools for building sequence-analysis GUIs, and analysis and statistical routines, including a dynamic programming toolkit. BioJava takes part in Google Summer of Code as part of the OBF (Open Bioinformatics Foundation). Please visit:
  10. BioJava
      The core sequence classes:
      • AbstractSequence
      • DNASequence
      • ChromosomeSequence
      • GeneSequence
      • IntronSequence
      • ExonSequence
      • TranscriptSequence
      • RNASequence
      • ProteinSequence
      Because code works against the Sequence interface, sequence storage can easily be extended from a local FASTA file (a sequence file format) to loading the sequence by accession ID from UniProt (a protein database on the internet) or NCBI (a genome database on the internet):
        ProteinSequence proteinSequence = new ProteinSequence("ARNDCEQGHILKMFPSTWYVBZJX");
        DNASequence dnaSequence = new DNASequence("ATCG");
        // Reads the UniProt entry YA745_GIBZE over the internet
        UniprotProxySequenceReader<AminoAcidCompound> uniprotSequence =
            new UniprotProxySequenceReader<AminoAcidCompound>("YA745_GIBZE",
                AminoAcidCompoundSet.getAminoAcidCompoundSet());
        // A distinct variable name avoids redeclaring proteinSequence
        ProteinSequence uniprotProtein = new ProteinSequence(uniprotSequence);
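The benefit of programming against an interface can be shown without BioJava. In this hypothetical sketch (SimpleSequence, LocalSequence, LazySequence, and SequenceDemo are made-up names, not BioJava classes), client code such as gcContent works the same whether the residues come from memory or from a loader that could wrap a database fetch by accession ID. Here the loader is a plain Supplier so the code runs offline; it assumes Java 8 or later for default methods and lambdas.

```java
import java.util.function.Supplier;

// Hypothetical types for illustration only; these are NOT BioJava classes.
interface SimpleSequence {
    String residues();

    default int length() {
        return residues().length();
    }
}

// Sequence held in memory, e.g. after parsing a local FASTA file.
class LocalSequence implements SimpleSequence {
    private final String residues;

    LocalSequence(String residues) {
        this.residues = residues;
    }

    @Override
    public String residues() {
        return residues;
    }
}

// Proxy that calls its loader only on first access, then caches the result.
// The loader could wrap a fetch by accession ID from a remote database.
class LazySequence implements SimpleSequence {
    private final Supplier<String> loader;
    private String cached; // null until first access
    int loadCount = 0;     // exposed only to show the loader runs once

    LazySequence(Supplier<String> loader) {
        this.loader = loader;
    }

    @Override
    public String residues() {
        if (cached == null) {
            cached = loader.get();
            loadCount++;
        }
        return cached;
    }
}

class SequenceDemo {
    // Client code depends only on the interface, not on where the data comes from.
    static double gcContent(SimpleSequence seq) {
        String s = seq.residues();
        if (s.isEmpty()) {
            return 0.0;
        }
        int gc = 0;
        for (char c : s.toCharArray()) {
            if (c == 'G' || c == 'C') {
                gc++;
            }
        }
        return (double) gc / s.length();
    }

    public static void main(String[] args) {
        SimpleSequence local = new LocalSequence("ATCG");
        SimpleSequence remote = new LazySequence(() -> "GGCC"); // stand-in for a database fetch
        System.out.println(gcContent(local));  // 0.5
        System.out.println(gcContent(remote)); // 1.0
    }
}
```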
  11. BioJava
      DNA translation follows the normal biological flow: a portion of DNA (assumed to be a coding sequence, CDS) is transcribed into mRNA, which is then translated into a protein sequence codon by codon:
        ProteinSequence protein = new DNASequence("ATG").getRNASequence().getProteinSequence();
      The BioJava sequence I/O code is designed to be flexible and easy to adapt for a wide variety of purposes. In the legacy BioJava 1.x API shown below, the SeqIOTools methods take a Java BufferedReader and return an iterator that lets you scan through the sequences in a file. For example:
        BufferedReader br = new BufferedReader(new FileReader(fileName));
        SequenceIterator stream = SeqIOTools.readFastaDNA(br);
        while (stream.hasNext()) {
            Sequence seq = stream.nextSequence();
            // do something with the sequence
        }
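For readers without BioJava on the classpath, this standard-library sketch (not the BioJava API) combines both ideas on the slide: it reads FASTA records from a BufferedReader and translates each DNA record to protein via an RNA intermediate, using the standard genetic code. It assumes each sequence is the coding strand read in frame 0 from its first base, and maps any codon containing a character other than A, C, G, U to X. The class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

// Standard-library sketch; not BioJava.
class FastaTranslate {

    // Standard genetic code. Codons are enumerated with bases in U, C, A, G
    // order, so codon index = 16 * first + 4 * second + third. '*' = stop.
    private static final String BASES = "UCAG";
    private static final String AMINO_ACIDS =
            "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

    /** Reads FASTA text: a '>' header line followed by one or more sequence lines. */
    static Map<String, String> readFasta(BufferedReader br) throws IOException {
        Map<String, String> records = new LinkedHashMap<>(); // keeps file order
        String header = null;
        StringBuilder seq = new StringBuilder();
        String line;
        while ((line = br.readLine()) != null) {
            line = line.trim();
            if (line.startsWith(">")) {
                if (header != null) {
                    records.put(header, seq.toString()); // finish previous record
                }
                header = line.substring(1).trim();
                seq.setLength(0);
            } else if (header != null) {
                seq.append(line.toUpperCase()); // a sequence may span several lines
            }
        }
        if (header != null) {
            records.put(header, seq.toString()); // last record
        }
        return records;
    }

    /** Transcription of the coding strand: T becomes U. */
    static String transcribe(String dna) {
        return dna.replace('T', 'U');
    }

    /** Translates complete codons in frame 0; leftover bases are ignored. */
    static String translate(String rna) {
        StringBuilder protein = new StringBuilder();
        for (int i = 0; i + 3 <= rna.length(); i += 3) {
            int a = BASES.indexOf(rna.charAt(i));
            int b = BASES.indexOf(rna.charAt(i + 1));
            int c = BASES.indexOf(rna.charAt(i + 2));
            boolean unknown = a < 0 || b < 0 || c < 0; // e.g. an ambiguity code such as N
            protein.append(unknown ? 'X' : AMINO_ACIDS.charAt(16 * a + 4 * b + c));
        }
        return protein.toString();
    }

    public static void main(String[] args) throws IOException {
        // For a real file: new BufferedReader(new FileReader(fileName))
        String fasta = ">seq1 toy example\nATGGCC\nTGA\n>seq2\nATG\n";
        try (BufferedReader br = new BufferedReader(new StringReader(fasta))) {
            for (Map.Entry<String, String> e : readFasta(br).entrySet()) {
                System.out.println(e.getKey() + " -> " + translate(transcribe(e.getValue())));
            }
        }
    }
}
```

Running main prints `seq1 toy example -> MA*` and `seq2 -> M`: the codons AUG, GCC, and UGA decode to methionine, alanine, and a stop.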
  12. Scala vs. Java: Scala runs on the Java Virtual Machine
      Java:
        List<Integer> iList = Arrays.asList(2, 7, 9, 8, 10);
        List<Integer> iDoubled = new ArrayList<Integer>();
        for (Integer number : iList) {
            if (number % 2 == 0) {
                iDoubled.add(number * 2);
            }
        }
      Scala:
        val iList = List(2, 7, 9, 8, 10)
        val iDoubled = iList.filter(_ % 2 == 0).map(_ * 2)
      Both versions keep the even numbers and double them, giving [4, 16, 20].
      Scala:
        object HelloWorld {
          def main(args: Array[String]): Unit = {
            println("Hello, world!")
          }
        }
      Java:
        public class HelloWorldApp {
            public static void main(String[] args) {
                System.out.println("Hello World!");
            }
        }
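On Java 8 and later (newer than the Java 1.6/1.7 versions listed on the Eclipse slide), lambdas and streams let Java express the same filter-then-map pipeline as the Scala one-liner. A minimal sketch; the class and method names are illustrative.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Requires Java 8+ for streams and lambdas.
class DoubleEvens {
    static List<Integer> doubleEvens(List<Integer> numbers) {
        return numbers.stream()
                .filter(n -> n % 2 == 0) // keep even numbers
                .map(n -> n * 2)         // double each one
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> iList = Arrays.asList(2, 7, 9, 8, 10);
        System.out.println(doubleEvens(iList)); // [4, 16, 20]
    }
}
```

The stream version reads in the same order as the Scala code and avoids the mutable result list of the loop version.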
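  13. Calculating a Local Alignment: Java code using BioJava packages
        import org.biojava3.alignment.Alignments;
        import org.biojava3.alignment.Alignments.PairwiseSequenceAlignerType;
        import org.biojava3.alignment.SimpleGapPenalty;
        import org.biojava3.alignment.SubstitutionMatrixHelper;
        import org.biojava3.alignment.template.SequencePair;
        import org.biojava3.alignment.template.SubstitutionMatrix;
        import org.biojava3.core.sequence.DNASequence;
        import org.biojava3.core.sequence.compound.AmbiguityDNACompoundSet;
        import org.biojava3.core.sequence.compound.NucleotideCompound;
        public class PSA_DNA {
            public static void main(String[] args) {
                String targetSeq = "CACGTTTCTTGTGGCAGCTTAAGTTT";
                DNASequence target = new DNASequence(targetSeq, AmbiguityDNACompoundSet.getDNACompoundSet());
                String querySeq = "ACGAGTGCGTGTTTTCCCGCCTGGTC";
                DNASequence query = new DNASequence(querySeq, AmbiguityDNACompoundSet.getDNACompoundSet());
                // NUC.4.4 nucleotide substitution matrix
                SubstitutionMatrix<NucleotideCompound> matrix = SubstitutionMatrixHelper.getNuc4_4();
                // Gap penalties: 5 to open a gap, 2 to extend it
                SimpleGapPenalty gapP = new SimpleGapPenalty();
                gapP.setOpenPenalty((short) 5);
                gapP.setExtensionPenalty((short) 2);
                SequencePair<DNASequence, NucleotideCompound> psa =
                    Alignments.getPairwiseAlignment(query, target, PairwiseSequenceAlignerType.LOCAL, gapP, matrix);
                System.out.println(psa);
            }
        }
      [Slide callouts label the class definition, the variable definitions, the method calls, and the variable definitions with method calls.]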
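To show what the local aligner computes, here is a simplified Smith-Waterman local alignment using only the standard library. It is a sketch under stated assumptions, not a reimplementation of BioJava: it uses a linear gap penalty (a hypothetical -6 per gap position) instead of SimpleGapPenalty's separate open and extension costs, and it scores match +5 / mismatch -4, the NUC.4.4 values for unambiguous bases. Its alignments can therefore differ from BioJava's output. The class and method names are hypothetical.

```java
// Simplified Smith-Waterman sketch; not BioJava's aligner.
class LocalAlignment {
    static final int MATCH = 5;     // NUC.4.4 value for identical A/C/G/T
    static final int MISMATCH = -4; // NUC.4.4 value for differing A/C/G/T
    static final int GAP = -6;      // hypothetical linear gap penalty per position

    /** Best local alignment: its score and the two aligned substrings. */
    static final class Result {
        final int score;
        final String query;
        final String target;

        Result(int score, String query, String target) {
            this.score = score;
            this.query = query;
            this.target = target;
        }
    }

    static int score(char x, char y) {
        return x == y ? MATCH : MISMATCH;
    }

    static Result align(String query, String target) {
        int n = query.length();
        int m = target.length();
        int[][] h = new int[n + 1][m + 1]; // row 0 and column 0 stay 0
        int best = 0, bestI = 0, bestJ = 0;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int diag = h[i - 1][j - 1] + score(query.charAt(i - 1), target.charAt(j - 1));
                int up = h[i - 1][j] + GAP;   // query character aligned to a gap
                int left = h[i][j - 1] + GAP; // target character aligned to a gap
                // Clamping at 0 is what makes the alignment local.
                h[i][j] = Math.max(0, Math.max(diag, Math.max(up, left)));
                if (h[i][j] > best) {
                    best = h[i][j];
                    bestI = i;
                    bestJ = j;
                }
            }
        }
        // Trace back from the best cell until a zero cell ends the local alignment.
        StringBuilder q = new StringBuilder();
        StringBuilder t = new StringBuilder();
        int i = bestI, j = bestJ;
        while (i > 0 && j > 0 && h[i][j] > 0) {
            if (h[i][j] == h[i - 1][j - 1] + score(query.charAt(i - 1), target.charAt(j - 1))) {
                q.append(query.charAt(i - 1));
                t.append(target.charAt(j - 1));
                i--;
                j--;
            } else if (h[i][j] == h[i - 1][j] + GAP) {
                q.append(query.charAt(i - 1));
                t.append('-');
                i--;
            } else {
                q.append('-');
                t.append(target.charAt(j - 1));
                j--;
            }
        }
        return new Result(best, q.reverse().toString(), t.reverse().toString());
    }

    public static void main(String[] args) {
        // Query and target from the Java PSA_DNA slide.
        Result r = align("ACGAGTGCGTGTTTTCCCGCCTGGTC", "CACGTTTCTTGTGGCAGCTTAAGTTT");
        System.out.println("score = " + r.score);
        System.out.println(r.query);
        System.out.println(r.target);
    }
}
```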
  14. Calculating a Local Alignment: Scala code using Java packages
        import scala.language.implicitConversions
        import org.biojava3.alignment.{Alignments, SimpleGapPenalty, SubstitutionMatrixHelper}
        import org.biojava3.alignment.Alignments.PairwiseSequenceAlignerType.LOCAL
        import org.biojava3.core.sequence.DNASequence
        import org.biojava3.core.sequence.compound.AmbiguityDNACompoundSet
        object PSA_DNA {
          // Implicit conversion: lets a String be used wherever a DNASequence is expected
          implicit def str2DNA(seq: String): DNASequence =
            new DNASequence(seq, AmbiguityDNACompoundSet.getDNACompoundSet)
          def main(args: Array[String]): Unit = {
            // Note the implicit conversion from String to DNASequence
            val target: DNASequence = "CACGTTTCTTGTGGCAGCTTAAGTTTGAAT"
            val query: DNASequence = "ACGAGTGCGTGTTTTCCCGCCTGGTCCCCA"
            val matrix = SubstitutionMatrixHelper.getNuc4_4()
            val gapP = new SimpleGapPenalty()
            gapP.setOpenPenalty(5)
            gapP.setExtensionPenalty(2)
            val psa = Alignments.getPairwiseAlignment(query, target, LOCAL, gapP, matrix)
            println(psa)
          }
        }
      [Slide callouts label the Java package imports, the implicit method, the variable definitions, the method calls, and the variable definitions with method calls.]
  15. Thank you! E-mail:
      Watch the Java and Scala “Hello World” programs execute in Eclipse!