Due: Thursday 4/10 10:00am
Submission name: w15_proteins
Background Genetics Information
DNA is made up of strands of nucleotides, of which there are 4 types: adenine, thymine, cytosine and guanine. Because of this, DNA sequences can be represented as strings like this:
tcgcagctcgaaccactatg
Generally, DNA is transcribed into RNA, and then RNA is translated into proteins built from amino acids. In DNA, 3 nucleotides together represent a single amino acid. We refer to sequences of 3 nucleotides as codons. Additionally, the sequence atg represents the start of a protein, and taa, tga, and tag represent the end of a protein. We will be working on a Processing program to help visualize different information about DNA strands.
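To make the idea of codons concrete, here is a small sketch in plain Java that cuts a DNA string into groups of 3 nucleotides, reading from the first character. The class name CodonDemo and the method splitCodons are made up for this illustration and are not part of the assignment.

```java
import java.util.ArrayList;
import java.util.List;

public class CodonDemo {
    // Hypothetical helper: splits dna into consecutive 3-letter codons,
    // starting at index 0. Any 1 or 2 leftover nucleotides are ignored.
    static List<String> splitCodons(String dna) {
        List<String> codons = new ArrayList<>();
        for (int i = 0; i + 3 <= dna.length(); i += 3) {
            codons.add(dna.substring(i, i + 3));
        }
        return codons;
    }

    public static void main(String[] args) {
        // The example strand has 20 nucleotides: 6 codons plus 2 left over.
        System.out.println(splitCodons("tcgcagctcgaaccactatg"));
        // prints [tcg, cag, ctc, gaa, cca, cta]
    }
}
```

Note that the start codon atg in this strand begins at index 17, so it does not line up with codons counted from index 0. Where you start reading matters.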
Useful Java Stuff
- If you want to look at a working version of the previous assignment, check the source.
- indexOf(String sub) is a Java String method that returns either:
  - The index of the start of the first occurrence of sub in the calling string.
  - -1 if sub does not appear in the calling string.
- substring() is a Java String method that returns a portion of a String, as a new String object. There are 2 versions of substring():
  - s.substring(start)
    - Returns a String made from the characters in s starting at index start and going to the end of s.
  - s.substring(start, end)
    - Returns a String made from the characters in s starting at index start and ending at index end-1.
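The short plain-Java example below tries indexOf() and both versions of substring() on the sample strand from the background section. The class StringMethodsDemo and the helper codonAt are illustrative names only.

```java
public class StringMethodsDemo {
    // Hypothetical helper: the 3 characters of dna starting at index.
    static String codonAt(String dna, int index) {
        return dna.substring(index, index + 3);
    }

    public static void main(String[] args) {
        String s = "tcgcagctcgaaccactatg";
        System.out.println(s.indexOf("atg"));   // 17: where the first "atg" starts
        System.out.println(s.indexOf("ttt"));   // -1: "ttt" never appears
        System.out.println(s.substring(17));    // atg: from index 17 to the end
        System.out.println(s.substring(0, 3));  // tcg: indexes 0, 1 and 2
        System.out.println(codonAt(s, 3));      // cag
    }
}
```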
Task at Hand
Create a copy of your work from yesterday, then add the following methods:
int findProteinEnd(String strand)
- Returns the index of the first end codon in strand.
- Returns -1 if there is no end codon in strand.
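One possible approach to finding the first end codon, sketched in plain Java: call indexOf() once per end codon and keep the smallest result that is not -1. This sketch assumes an end codon counts wherever it appears in the strand, not only at positions that are a multiple of 3; check the expected values in the test cases to decide whether that reading fits. The wrapper class ProteinEndSketch is a made-up name, and in a Processing sketch you would likely drop the class and the static keyword.

```java
public class ProteinEndSketch {
    // Sketch only: smallest index at which taa, tga, or tag occurs, or -1.
    // Assumption: matches at any position are accepted.
    static int findProteinEnd(String strand) {
        String[] endCodons = {"taa", "tga", "tag"};
        int first = -1;
        for (String codon : endCodons) {
            int idx = strand.indexOf(codon);
            // Keep idx if it is a real match and earlier than the best so far.
            if (idx != -1 && (first == -1 || idx < first)) {
                first = idx;
            }
        }
        return first;
    }

    public static void main(String[] args) {
        System.out.println(findProteinEnd("ccctagtaa")); // 3
        System.out.println(findProteinEnd("cccgggaaa")); // -1
    }
}
```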
boolean containsProtein(String dna)
- Returns true if dna contains at least one full exon.
- For our purposes, a DNA sequence contains an exon if:
  - It has a start codon.
  - It has an end codon.
  - The number of nucleotides between the start and end is a multiple of 3 (i.e. there are no nucleotides unattached to a codon).
  - It has at least 5 other codons between those 2 (this is not biologically accurate; in reality this is closer to 430 codons).
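The exon rules above come down to arithmetic on two indexes. The plain-Java sketch below assumes you have already found start (the index of the start codon) and end (the index of the end codon) in the same strand, with -1 meaning "not found". The names ExonRules, isExon, and MIN_CODONS are hypothetical; how you locate the two indexes is up to your containsProtein().

```java
public class ExonRules {
    // Minimum number of codons required between the start and end codons.
    static final int MIN_CODONS = 5;

    // Sketch only: applies the four exon rules to a pair of indexes.
    static boolean isExon(int start, int end) {
        if (start == -1 || end == -1) {
            return false; // missing start or end codon
        }
        // The start codon occupies start, start+1, start+2.
        int between = end - (start + 3);
        if (between < 0) {
            return false; // end codon is not after the start codon
        }
        return between % 3 == 0 && between / 3 >= MIN_CODONS;
    }

    public static void main(String[] args) {
        System.out.println(isExon(0, 18)); // true: 15 nucleotides, 5 codons
        System.out.println(isExon(0, 17)); // false: 14 is not a multiple of 3
        System.out.println(isExon(0, 6));  // false: only 1 codon in between
    }
}
```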
String getProtein(String dna)
- Returns the first protein-encoding (exon) portion of dna. It should not include the start or end codons.
- If there are no exons in dna, return the empty string.
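Once the start and end indexes are known, pulling out the part between the two codons is a single substring() call. The plain-Java sketch below shows only that step; ExtractSketch and betweenCodons are made-up names, and the check that the region is a valid exon is deliberately left out.

```java
public class ExtractSketch {
    // Sketch only: the text strictly between the start codon at index start
    // and the end codon at index end. Returns "" if either index is -1 or
    // the end codon does not come after the start codon.
    static String betweenCodons(String dna, int start, int end) {
        if (start == -1 || end == -1 || end < start + 3) {
            return "";
        }
        // start + 3 skips the start codon; end is exclusive, so the end
        // codon itself is left out.
        return dna.substring(start + 3, end);
    }

    public static void main(String[] args) {
        String dna = "ccatgcccgggaaataacc";
        int start = dna.indexOf("atg"); // 2
        int end = dna.indexOf("taa");   // 14
        System.out.println(betweenCodons(dna, start, end)); // cccgggaaa
    }
}
```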
Here is a series of useful test cases for this assignment:
println("protein end in [" + protein1 + "] (21): " + findProteinEnd(protein1));
println("protein end in [" + noProtein0 + "] (-1): " + findProteinEnd(noProtein0));
println("protein end in [" + noProtein2 + "] (3): " + findProteinEnd(noProtein2));
println("protein in [" + protein0 + "] (true): " + containsProtein(protein0));
println("protein in [" + protein1 + "] (true): " + containsProtein(protein1));
println("protein in [" + protein2 + "] (true): " + containsProtein(protein2));
println("protein in [" + noProtein0 + "] (false): " + containsProtein(noProtein0));
println("protein in [" + noProtein1 + "] (false): " + containsProtein(noProtein1));
println("protein in [" + noProtein2 + "] (false): " + containsProtein(noProtein2));
println("protein in [" + noProtein3 + "] (false): " + containsProtein(noProtein3));
println("protein in [" + noProtein4 + "] (false): " + containsProtein(noProtein4));
println();
println("protein in [" + protein0 + "] " + getProtein(protein0));