uk.ac.sanger.psu.gfmerge.util
Class PredAccuracy

java.lang.Object
  extended by uk.ac.sanger.psu.gfmerge.util.PredAccuracy

public class PredAccuracy
extends java.lang.Object

Class PredAccuracy wraps the calculation of the average conditional probability (ACP) for a gene prediction program.


Field Summary
private  double acp
          Average Conditional Probability (ACP) is suggested as an appropriate measure of global prediction accuracy (Burset and Guigo, 1996).
private  int fn
          attribute records the number of false negative bases (bases which are predicted to be non-coding but are coding in the annotation).
private  int fp
          attribute records the number of false positive bases (bases which are predicted to be coding but are non-coding in the annotation).
private  int tn
          attribute records the number of true negative bases (bases which are predicted to be non-coding and are non-coding in the annotation).
private  int tp
          attribute records the number of true positive bases (bases which are predicted to be coding and are coding in the annotation).
 
Constructor Summary
PredAccuracy(org.biojava.bio.seq.Sequence seq_org, org.biojava.bio.seq.Sequence seq_comp)
          constructor which creates PredAccuracy object.
 
Method Summary
private  double calcACP(org.biojava.bio.seq.Sequence seq_org, org.biojava.bio.seq.Sequence seq_comp)
          method which calculates ACP (average conditional probability).
private  void compSeq(int seqLength_org, java.util.ArrayList arrList_org_fw, java.util.ArrayList arrList_org_rev, java.util.ArrayList arrList_comp_fw, java.util.ArrayList arrList_comp_rev)
          method which compares two sequences for matches and overlaps.
private static java.util.HashMap countNucleotides(java.util.HashMap thisCountMap, int seqLength, java.lang.StringBuffer seq_org, java.lang.StringBuffer seq_comp)
          method which counts TP,TN,FP,FN bases.
 double getACP()
          accessor method which returns average conditional probability value.
private static java.util.ArrayList getAListOfLocs(org.biojava.bio.seq.FeatureHolder fh)
          method which returns an ArrayList of locations for a given FeatureHolder.
private static org.biojava.bio.seq.FeatureHolder getFeaturesFromStrand(boolean pos_strand, org.biojava.bio.seq.FeatureHolder fh)
          method which returns a FeatureHolder of all features on either the forward or the reverse strand.
 int getFN()
          accessor method which returns number of false negative bases.
 int getFP()
          accessor method which gets number of false positive bases.
 int getTN()
          accessor method which returns number of true negative bases.
 int getTP()
          accessor method which returns number of true positive bases.
private static java.lang.StringBuffer markCoding(java.util.ArrayList arrListLoc, java.lang.StringBuffer seq)
          method which marks coding exons as 'C' on a sequence.
private  void setACP(double _acp)
          accessor method which sets average conditional probability value.
private  void setAttributes(java.util.HashMap _countMap)
          method which saves HashMap containing the tp, tn, fp, fn values to their appropriate attributes.
private  void setFN(int _fn)
          accessor method which sets number of false negative bases.
private  void setFP(int _fp)
          accessor method which sets number of false positive bases.
private  void setTN(int _tn)
          accessor method which sets number of true negative bases.
private  void setTP(int _tp)
          accessor method which sets number of true positive bases.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

tp

private int tp
attribute records the number of true positive bases (bases which are predicted to be coding and are coding in the annotation).


tn

private int tn
attribute records the number of true negative bases (bases which are predicted to be non-coding and are non-coding in the annotation).


fp

private int fp
attribute records the number of false positive bases (bases which are predicted to be coding but are non-coding in the annotation).


fn

private int fn
attribute records the number of false negative bases (bases which are predicted to be non-coding but are coding in the annotation).


acp

private double acp
Average Conditional Probability (ACP) is suggested as an appropriate measure of global prediction accuracy (Burset and Guigo, 1996).

It is calculated by comparing the prediction to the annotation on the base level. Formula used: ACP = [TP/(TP+FN)+TP/(TP+FP)+TN/(TN+FP)+TN/(TN+FN)]/4
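The formula above can be sketched in a few lines of Java. This is an illustrative sketch only, not the source of calcACP: the class name AcpSketch and the method acp are hypothetical, and how PredAccuracy treats a zero denominator is not documented here, so the sketch simply lets the affected ratio evaluate to NaN.

```java
final class AcpSketch {

    /**
     * Averages the four conditional probabilities from the formula
     * ACP = [TP/(TP+FN) + TP/(TP+FP) + TN/(TN+FP) + TN/(TN+FN)] / 4.
     * Assumption: a zero denominator is not handled specially, so the
     * result is NaN whenever one of the four ratios is 0/0.
     */
    static double acp(int tp, int tn, int fp, int fn) {
        // Promote to double before adding so large counts cannot overflow int.
        double codingGivenAnnotated = tp / ((double) tp + fn);
        double codingGivenPredicted = tp / ((double) tp + fp);
        double nonCodingGivenAnnotated = tn / ((double) tn + fp);
        double nonCodingGivenPredicted = tn / ((double) tn + fn);
        return (codingGivenAnnotated + codingGivenPredicted
                + nonCodingGivenAnnotated + nonCodingGivenPredicted) / 4.0;
    }

    public static void main(String[] args) {
        // Hypothetical base counts, chosen only for illustration.
        System.out.println(acp(50, 50, 50, 50));
    }
}
```

With equal counts in all four categories every ratio is 0.5, so the average is 0.5; a prediction that matches the annotation exactly (FP = FN = 0, with TP and TN both non-zero) gives 1.0.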

Constructor Detail

PredAccuracy

public PredAccuracy(org.biojava.bio.seq.Sequence seq_org,
                    org.biojava.bio.seq.Sequence seq_comp)
constructor which creates PredAccuracy object. Calls calcACP method which will calculate the average conditional probability.

Parameters:
seq_org - BioJava sequence object containing the annotation
seq_comp - BioJava sequence object containing the prediction
Method Detail

getTP

public int getTP()
accessor method which returns number of true positive bases.

Returns:
number of true positive bases

setTP

private void setTP(int _tp)
accessor method which sets number of true positive bases.

Parameters:
_tp - number of true positive bases

getTN

public int getTN()
accessor method which returns number of true negative bases.

Returns:
number of true negative bases

setTN

private void setTN(int _tn)
accessor method which sets number of true negative bases.

Parameters:
_tn - number of true negative bases

getFP

public int getFP()
accessor method which gets number of false positive bases.

Returns:
number of false positive bases

setFP

private void setFP(int _fp)
accessor method which sets number of false positive bases.

Parameters:
_fp - number of false positive bases

getFN

public int getFN()
accessor method which returns number of false negative bases.

Returns:
number of false negative bases

setFN

private void setFN(int _fn)
accessor method which sets number of false negative bases.

Parameters:
_fn - number of false negative bases

getACP

public double getACP()
accessor method which returns average conditional probability value.

Returns:
average conditional probability

setACP

private void setACP(double _acp)
accessor method which sets average conditional probability value.

Parameters:
_acp - average conditional probability

calcACP

private double calcACP(org.biojava.bio.seq.Sequence seq_org,
                       org.biojava.bio.seq.Sequence seq_comp)
method which calculates ACP (average conditional probability). Called from the constructor.

Parameters:
seq_org - annotation sequence
seq_comp - prediction sequence
Returns:
average conditional probability

getAListOfLocs

private static java.util.ArrayList getAListOfLocs(org.biojava.bio.seq.FeatureHolder fh)
method which returns an ArrayList of locations for a given FeatureHolder. Called from calcACP.

Parameters:
fh - BioJava FeatureHolder
Returns:
ArrayList of Locations

compSeq

private void compSeq(int seqLength_org,
                     java.util.ArrayList arrList_org_fw,
                     java.util.ArrayList arrList_org_rev,
                     java.util.ArrayList arrList_comp_fw,
                     java.util.ArrayList arrList_comp_rev)
method which compares two sequences for matches and overlaps. Considers forward and reverse strand separately.

Parameters:
seqLength_org - length of the sequence
arrList_org_fw - ArrayList of bases (either c for coding or n for non-coding) of annotation, forward strand
arrList_org_rev - ArrayList of bases (either c for coding or n for non-coding) of annotation, reverse strand
arrList_comp_fw - ArrayList of bases (either c for coding or n for non-coding) of prediction, forward strand
arrList_comp_rev - ArrayList of bases (either c for coding or n for non-coding) of prediction, reverse strand

markCoding

private static java.lang.StringBuffer markCoding(java.util.ArrayList arrListLoc,
                                                 java.lang.StringBuffer seq)
method which marks coding exons as 'C' on a sequence. Called from compSeq

Parameters:
arrListLoc - ArrayList of coding Locations
seq - String which contains a binary variable for each position; at the beginning all positions are non-coding, and during processing coding bases are marked "C"
Returns:
String containing coding/non-coding information for each base
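The marking step described above can be sketched as follows. This is a hedged illustration rather than the actual markCoding source: plain int[] {min, max} pairs stand in for BioJava Location objects, coordinates are assumed to be 1-based and inclusive, 'N' is an assumed marker for non-coding positions, and the names MarkCodingSketch and blankTrack are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

final class MarkCodingSketch {

    /** Builds a track of the given length in which every position is non-coding ('N'). */
    static StringBuffer blankTrack(int length) {
        StringBuffer track = new StringBuffer(length);
        for (int i = 0; i < length; i++) {
            track.append('N');
        }
        return track;
    }

    /**
     * Marks every position covered by a coding range as 'C'.
     * Assumption: each range is {min, max}, 1-based and inclusive;
     * parts of a range outside the track are ignored.
     */
    static StringBuffer markCoding(List<int[]> codingRanges, StringBuffer track) {
        for (int[] range : codingRanges) {
            int min = Math.max(range[0], 1);
            int max = Math.min(range[1], track.length());
            for (int pos = min; pos <= max; pos++) {
                track.setCharAt(pos - 1, 'C'); // convert 1-based position to 0-based index
            }
        }
        return track;
    }

    public static void main(String[] args) {
        // Two hypothetical exons on a 10-base sequence.
        List<int[]> exons = Arrays.asList(new int[] {2, 4}, new int[] {8, 12});
        System.out.println(markCoding(exons, blankTrack(10)));
    }
}
```

Overlapping ranges are harmless in this representation, because marking a position 'C' twice leaves it unchanged.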

countNucleotides

private static java.util.HashMap countNucleotides(java.util.HashMap thisCountMap,
                                                  int seqLength,
                                                  java.lang.StringBuffer seq_org,
                                                  java.lang.StringBuffer seq_comp)
method which counts TP,TN,FP,FN bases. Called from compSeq

Parameters:
thisCountMap - HashMap with tp, tn, fp, fn as keys pointing to their values; the base counts of the other strand are added to it
seqLength - length of sequence
seq_org - annotation
seq_comp - prediction
Returns:
HashMap containing tp, tn, fp, fn pointing to their values
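A base-by-base count of this kind might look like the sketch below. It is an assumption-laden illustration, not the class's own code: the class name CountSketch is hypothetical, the tracks are assumed to use 'C' for coding positions and any other character for non-coding ones, and the map keys are assumed to be the lower-case strings "tp", "tn", "fp" and "fn".

```java
import java.util.HashMap;
import java.util.Map;

final class CountSketch {

    /**
     * Compares annotation and prediction position by position and adds the
     * outcome of each position to the running totals in counts, so that a
     * second call (for the other strand) accumulates into the same map.
     */
    static Map<String, Integer> countNucleotides(Map<String, Integer> counts,
                                                 int seqLength,
                                                 CharSequence annotation,
                                                 CharSequence prediction) {
        for (int i = 0; i < seqLength; i++) {
            boolean annotated = annotation.charAt(i) == 'C';
            boolean predicted = prediction.charAt(i) == 'C';
            String key;
            if (predicted) {
                key = annotated ? "tp" : "fp"; // predicted coding
            } else {
                key = annotated ? "fn" : "tn"; // predicted non-coding
            }
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        // Hypothetical forward-strand tracks, then reverse-strand tracks.
        countNucleotides(counts, 6, new StringBuffer("NCCCNN"), new StringBuffer("NNCCCN"));
        countNucleotides(counts, 6, new StringBuffer("NNNNNN"), new StringBuffer("NNNNNN"));
        System.out.println(counts);
    }
}
```

Passing the same map to both calls is what lets the totals for the two strands be summed before they are stored in the tp, tn, fp and fn attributes.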

getFeaturesFromStrand

private static org.biojava.bio.seq.FeatureHolder getFeaturesFromStrand(boolean pos_strand,
                                                                       org.biojava.bio.seq.FeatureHolder fh)
method which returns a FeatureHolder of all features on either the forward or the reverse strand.

Parameters:
pos_strand - switch for strand
fh - BioJava FeatureHolder
Returns:
FeatureHolder of all features on either the forward or the reverse strand

setAttributes

private void setAttributes(java.util.HashMap _countMap)
method which saves HashMap containing the tp, tn, fp, fn values to their appropriate attributes.

Parameters:
_countMap - HashMap containing the tp, tn, fp, fn values