uk.ac.sanger.psu.gfmerge.analysis.simfeature_analysis.tools
Class FeatureOverlapAnalysisTools

java.lang.Object
  extended by uk.ac.sanger.psu.gfmerge.analysis.simfeature_analysis.tools.BasicFeatureAnalysisTools
      extended by uk.ac.sanger.psu.gfmerge.analysis.simfeature_analysis.tools.FeatureOverlapAnalysisTools

public class FeatureOverlapAnalysisTools
extends BasicFeatureAnalysisTools

Class containing basic overlap-computation tools used by the derived overlap analysis classes (cDNA, Blast).

Version:
1.0
Author:
Sebastian R. Spiegler

Constructor Summary
FeatureOverlapAnalysisTools()
           
 
Method Summary
private static long calcExonTotalOverlap(Exon exon, java.util.HashMap hashFeatOverlappedExons)
          method which calculates total cDNA overlap on genemodel exon level.
private static long calcGmCompTotalOverlap(FeatureComponentAble gmComp, java.util.HashMap hashFeatOverlappedGmComps, boolean isExon)
           
private static long calcGmLongestOverlap(int debug, boolean considerStrand, GeneModel gm, java.util.ArrayList overlappingFeatArr, boolean considerIntrons)
          method which calculates longest overlap for a single genemodel.
private static long calcGmTotalExIntOverlap(int debug, boolean considerStrand, GeneModel gm, java.util.ArrayList featArr, boolean considerIntrons)
           
private static long calcGmTotalOverlap(int debug, boolean considerStrand, GeneModel gm, java.util.ArrayList featArr)
          method which calculates total cDNA/single Blast protein overlap for a single genemodel.
private static java.util.HashMap calcHashOfLongestFeatOverlap(int debug, boolean considerStrand, java.util.HashMap hashFeatOverlappedGms, boolean considerIntrons)
          method which calculates hash of longest feature overlap.
private static java.util.HashMap calcHashOfScoredOverlap(java.util.HashMap gmHashOfTotalFeatOverlap, boolean isPercentage)
          method which calculates scored overlap of genemodels.
private static java.util.HashMap calcHashOfTotalOverlap(int debug, boolean considerStrand, java.util.HashMap gmHashOfOverlappedFeat, boolean considerIntrons)
          method which calculates hash of total cDNA overlap for genemodels.
private static long calcTotalLocationLength(org.biojava.bio.symbol.Location mergedSubLocs)
          method which accumulates total overlap length of exons/introns by aggregating sublocations of a compound overlap.
private static java.util.HashMap findLongestProteinOverlap(java.util.HashMap proteinOverlapLength)
          method which finds longest protein overlap.
private static java.util.HashMap getExonHashOfOverlappedFeat(int debug, boolean considerStrand, GeneModel gm, java.util.ArrayList overlappingFeatArr)
          method which returns hash of exons pointing to ArrayList of overlapping features.
private static java.util.ArrayList getFeatOverlapArrForExon(Exon exon, java.util.ArrayList featArr)
          method which returns overlapping features for a single exon.
private static java.util.ArrayList getFeatOverlapArrForGmComp(FeatureComponentAble gmComp, java.util.ArrayList featArr)
           
private static java.util.HashMap getGmCompHashOfOverlappedFeat(int debug, boolean considerStrand, java.util.ArrayList gmCompArr, java.util.ArrayList overlappingFeatArr)
          method which returns hash of gm features (exons or introns) pointing to ArrayList of overlapping features (Blast or cDNA).
static java.util.HashMap getGmHashOfAbsoluteFeatOverlap(int debug, boolean considerStrand, boolean isPercentage, java.util.ArrayList arrListOfGMRegions, java.util.ArrayList arrListOfFeatRegions, boolean considerIntrons)
          method which returns hash of genemodels pointing to their absolute overlap (cDNA).
static java.util.HashMap getGmHashOfLongestAbsoluteFeatOverlap(int debug, boolean considerStrand, boolean isPercentage, java.util.ArrayList arrListOfGMRegions, java.util.ArrayList arrListOfFeatRegions, boolean considerIntrons)
          method which returns hash of gms pointing to their longest absolute overlap (BLAST).
private static java.util.HashMap getProteinHash(int debug, GeneModel gm, java.util.ArrayList overlappingFeatArr)
          method which returns protein hash.
private static java.lang.String getProteinID(GFMergeFeature feature)
          method which returns the protein ID of a feature.
private static java.util.HashMap getProteinOverlapHash(int debug, boolean considerStrand, GeneModel gm, java.util.HashMap proteinHash, boolean considerIntrons)
          method which returns hash map of proteins pointing to overlap length.
private static long getSingleProteinOverlap(int debug, boolean considerStrand, GeneModel gm, java.util.ArrayList arrFeat, boolean considerIntrons)
          method which returns single protein overlap for a genemodel.
private static java.util.HashMap putFeatureInProteinHash(java.lang.String proteinID, SimFeature feature, java.util.HashMap proteinHash)
          method which stores feature in protein hash.
 
Methods inherited from class uk.ac.sanger.psu.gfmerge.analysis.simfeature_analysis.tools.BasicFeatureAnalysisTools
getGmHashOfOverlappedFeat, isFeatureOverlappingGm
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

FeatureOverlapAnalysisTools

public FeatureOverlapAnalysisTools()

Method Detail

getGmHashOfLongestAbsoluteFeatOverlap

public static java.util.HashMap getGmHashOfLongestAbsoluteFeatOverlap(int debug,
                                                                      boolean considerStrand,
                                                                      boolean isPercentage,
                                                                      java.util.ArrayList arrListOfGMRegions,
                                                                      java.util.ArrayList arrListOfFeatRegions,
                                                                      boolean considerIntrons)
method which returns hash of genemodels pointing to their longest absolute overlap (BLAST). Multiple hits of a single protein to a genemodel are accumulated per protein, and the longest per-protein overlap is used to score the genemodel.

Parameters:
debug - debugging mode
considerStrand - switch for strand consideration
arrListOfGMRegions - ArrayList of genemodel regions
arrListOfFeatRegions - ArrayList of SimFeature regions (Blast)
Returns:
hash of genemodels pointing to their blast overlap score

calcHashOfLongestFeatOverlap

private static java.util.HashMap calcHashOfLongestFeatOverlap(int debug,
                                                              boolean considerStrand,
                                                              java.util.HashMap hashFeatOverlappedGms,
                                                              boolean considerIntrons)
method which calculates hash of longest feature overlap. Considers multiple protein hits to a single genemodel and takes total length of the longest protein hit. (key=> gm, value=> longest overlap [long])

Parameters:
debug - debugging mode
considerStrand - switch for strand consideration
hashFeatOverlappedGms - hash of genemodels pointing to an ArrayList of overlapping features
Returns:
HashMap of genemodels pointing to their longest protein overlap

calcGmLongestOverlap

private static long calcGmLongestOverlap(int debug,
                                         boolean considerStrand,
                                         GeneModel gm,
                                         java.util.ArrayList overlappingFeatArr,
                                         boolean considerIntrons)
method which calculates longest overlap for a single genemodel.

Parameters:
debug - debugging mode
considerStrand - switch for strand consideration
gm - genemodel
overlappingFeatArr - ArrayList of overlapping features
Returns:
overlap length (longest)

findLongestProteinOverlap

private static java.util.HashMap findLongestProteinOverlap(java.util.HashMap proteinOverlapLength)
method which finds longest protein overlap. Returns a hash where protein ID is the key and the length is the value.

Parameters:
proteinOverlapLength - hash of proteins (protein ID) pointing to their overlap length
Returns:
hash, longest protein (protein ID) pointing to its overlap length
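
The selection step described above can be illustrated with a minimal sketch. It is not the original implementation: the class name LongestProteinOverlapSketch, the method name findLongest, and the generic type parameters are assumptions made for illustration, and the tie-breaking behaviour (first entry encountered wins) is a guess, since the source does not say how ties are resolved.

```java
import java.util.HashMap;
import java.util.Map;

class LongestProteinOverlapSketch {
    /**
     * Returns a single-entry map holding the protein ID with the largest
     * overlap length. Returns an empty map when the input is empty.
     * ASSUMPTION: on ties, the first entry met during iteration is kept.
     */
    static HashMap<String, Long> findLongest(Map<String, Long> proteinOverlapLength) {
        HashMap<String, Long> longest = new HashMap<String, Long>();
        String bestId = null;
        long bestLength = -1L;
        for (Map.Entry<String, Long> entry : proteinOverlapLength.entrySet()) {
            if (entry.getValue() > bestLength) {
                bestId = entry.getKey();
                bestLength = entry.getValue();
            }
        }
        if (bestId != null) {
            longest.put(bestId, bestLength);
        }
        return longest;
    }
}
```

Given the overlaps P1 => 120, P2 => 300 and P3 => 45, the sketch returns a map containing only P2 => 300.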

getProteinOverlapHash

private static java.util.HashMap getProteinOverlapHash(int debug,
                                                       boolean considerStrand,
                                                       GeneModel gm,
                                                       java.util.HashMap proteinHash,
                                                       boolean considerIntrons)
method which returns hash map of proteins pointing to overlap length. (all proteins are overlapping a single gm)

Parameters:
debug - debugging mode
considerStrand - switch for strand consideration
gm - genemodel
proteinHash - hash of proteins (protein ID) pointing to their overlapping feature hits
Returns:
hash map of proteins pointing to overlap length

getSingleProteinOverlap

private static long getSingleProteinOverlap(int debug,
                                            boolean considerStrand,
                                            GeneModel gm,
                                            java.util.ArrayList arrFeat,
                                            boolean considerIntrons)
method which returns single protein overlap for a genemodel.

Parameters:
debug - debugging mode
considerStrand - switch for strand consideration
gm - genemodel
arrFeat - ArrayList of different hits from the same protein
Returns:
overlap length

getProteinHash

private static java.util.HashMap getProteinHash(int debug,
                                                GeneModel gm,
                                                java.util.ArrayList overlappingFeatArr)
method which returns protein hash. Groups feature hits from the same protein (key => protein ID, value => ArrayList of overlapping features of this protein).

Parameters:
debug - debugging mode
gm - genemodel
overlappingFeatArr - ArrayList of genemodel overlapping features
Returns:
hash of proteins pointing to their overlapping hits
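
The grouping described above (protein ID pointing to all hits of that protein) might look like the following sketch. The Hit class is a hypothetical stand-in for the SimFeature/GFMergeFeature types, whose real accessors are not documented here, and groupByProtein is an illustrative name rather than the original method.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

class ProteinHashSketch {
    /** Hypothetical stand-in for a Blast hit; not the real SimFeature type. */
    static class Hit {
        final String proteinId;
        final int start;
        final int end;

        Hit(String proteinId, int start, int end) {
            this.proteinId = proteinId;
            this.start = start;
            this.end = end;
        }
    }

    /** Groups hits so that each protein ID points to the list of its hits. */
    static HashMap<String, ArrayList<Hit>> groupByProtein(List<Hit> hits) {
        HashMap<String, ArrayList<Hit>> proteinHash = new HashMap<String, ArrayList<Hit>>();
        for (Hit hit : hits) {
            ArrayList<Hit> sameProtein = proteinHash.get(hit.proteinId);
            if (sameProtein == null) {
                // First hit seen for this protein: start a new list.
                sameProtein = new ArrayList<Hit>();
                proteinHash.put(hit.proteinId, sameProtein);
            }
            sameProtein.add(hit);
        }
        return proteinHash;
    }
}
```

Two hits of protein P1 and one hit of P2 would yield a hash with two keys, where P1 points to a list of two hits.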

getProteinID

private static java.lang.String getProteinID(GFMergeFeature feature)
method which returns the protein ID of a feature.

Parameters:
feature - overlapping feature
Returns:
protein ID

putFeatureInProteinHash

private static java.util.HashMap putFeatureInProteinHash(java.lang.String proteinID,
                                                         SimFeature feature,
                                                         java.util.HashMap proteinHash)
method which stores feature in protein hash.

Parameters:
proteinID - ID of the protein
feature - overlapping SimFeature
proteinHash - protein hash
Returns:
protein hash (key=> protein ID, value=> ArrayList of hits from this protein)

getGmHashOfAbsoluteFeatOverlap

public static java.util.HashMap getGmHashOfAbsoluteFeatOverlap(int debug,
                                                               boolean considerStrand,
                                                               boolean isPercentage,
                                                               java.util.ArrayList arrListOfGMRegions,
                                                               java.util.ArrayList arrListOfFeatRegions,
                                                               boolean considerIntrons)
method which returns hash of genemodels pointing to their absolute overlap (cDNA).

Parameters:
debug - debugging mode
considerStrand - switch for strand consideration
arrListOfGMRegions - ArrayList of genemodel regions
arrListOfFeatRegions - ArrayList of feature regions (cDNA)
Returns:
hash of genemodels pointing to cDNA overlap

calcHashOfTotalOverlap

private static java.util.HashMap calcHashOfTotalOverlap(int debug,
                                                        boolean considerStrand,
                                                        java.util.HashMap gmHashOfOverlappedFeat,
                                                        boolean considerIntrons)
method which calculates hash of total cDNA overlap for genemodels.

Parameters:
debug - debugging mode
considerStrand - switch for strand consideration
gmHashOfOverlappedFeat - hash of genemodels pointing to ArrayList of overlapping cDNA features
Returns:
hash of genemodels pointing to their cDNA overlap

calcGmTotalOverlap

private static long calcGmTotalOverlap(int debug,
                                       boolean considerStrand,
                                       GeneModel gm,
                                       java.util.ArrayList featArr)
method which calculates total cDNA/single Blast protein overlap for a single genemodel.

Parameters:
debug - debugging mode
considerStrand - switch for strand consideration
gm - genemodel
featArr - ArrayList of genemodel overlapping cDNA features
Returns:
total cDNA overlap for a single genemodel

calcGmTotalExIntOverlap

private static long calcGmTotalExIntOverlap(int debug,
                                            boolean considerStrand,
                                            GeneModel gm,
                                            java.util.ArrayList featArr,
                                            boolean considerIntrons)

calcExonTotalOverlap

private static long calcExonTotalOverlap(Exon exon,
                                         java.util.HashMap hashFeatOverlappedExons)
method which calculates total cDNA overlap on genemodel exon level.

Parameters:
exon - Exon object
hashFeatOverlappedExons - hash of exons pointing to their overlapping cDNA / Blast features
Returns:
total cDNA / Blast overlap at genemodel nucleotide level
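
One way to compute an exon-level total overlap is sketched below: each feature is clipped to the exon, and every covered base is counted once even when features overlap each other. This is an illustration under stated assumptions, not the original code. Intervals are represented as int[]{min, max} with inclusive coordinates, and the names ExonOverlapSketch and exonTotalOverlap are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class ExonOverlapSketch {
    /**
     * Counts the exon bases covered by at least one feature.
     * ASSUMPTION: intervals are int[]{min, max} with inclusive coordinates.
     */
    static long exonTotalOverlap(int[] exon, List<int[]> features) {
        // Clip every feature to the exon and drop those that miss it entirely.
        List<int[]> clipped = new ArrayList<int[]>();
        for (int[] f : features) {
            int min = Math.max(exon[0], f[0]);
            int max = Math.min(exon[1], f[1]);
            if (min <= max) {
                clipped.add(new int[] {min, max});
            }
        }
        // Sort by start so that overlapping pieces become adjacent.
        Collections.sort(clipped, new Comparator<int[]>() {
            public int compare(int[] a, int[] b) {
                return Integer.compare(a[0], b[0]);
            }
        });
        // Sweep once, counting each covered base only once.
        long total = 0;
        int coveredUpTo = exon[0] - 1; // last base already counted
        for (int[] piece : clipped) {
            int from = Math.max(piece[0], coveredUpTo + 1);
            if (from <= piece[1]) {
                total += piece[1] - from + 1;
                coveredUpTo = piece[1];
            }
        }
        return total;
    }
}
```

For an exon at 100..200 and features at 90..120, 110..150, 180..250 and 300..400, the covered stretches are 100..150 (51 bases) and 180..200 (21 bases), giving a total of 72.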

calcGmCompTotalOverlap

private static long calcGmCompTotalOverlap(FeatureComponentAble gmComp,
                                           java.util.HashMap hashFeatOverlappedGmComps,
                                           boolean isExon)

calcTotalLocationLength

private static long calcTotalLocationLength(org.biojava.bio.symbol.Location mergedSubLocs)
method which accumulates total overlap length of exons/introns by aggregating sublocations of a compound overlap.

Parameters:
mergedSubLocs - merged Locations of all hits at exon level
Returns:
total overlap length of genemodel on exon level
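
Aggregating the sublocations of an already merged location amounts to summing the block lengths. The sketch below uses plain int[]{min, max} pairs with inclusive coordinates instead of org.biojava.bio.symbol.Location so that it stays self-contained. BioJava's Location does offer blockIterator(), getMin() and getMax(), but whether the original method relies on them is not stated in this page. The names LocationLengthSketch and totalLength are hypothetical.

```java
import java.util.List;

class LocationLengthSketch {
    /**
     * Sums the lengths of non-overlapping blocks.
     * ASSUMPTION: each block is int[]{min, max} with inclusive coordinates,
     * so a block min..max contributes max - min + 1 bases.
     */
    static long totalLength(List<int[]> mergedBlocks) {
        long total = 0;
        for (int[] block : mergedBlocks) {
            total += (long) block[1] - block[0] + 1;
        }
        return total;
    }
}
```

With the merged blocks 100..150 and 180..200 the sketch returns 51 + 21 = 72.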

calcHashOfScoredOverlap

private static java.util.HashMap calcHashOfScoredOverlap(java.util.HashMap gmHashOfTotalFeatOverlap,
                                                         boolean isPercentage)
method which calculates scored overlap of genemodels. (returns hash of gms pointing to their [Double] scored overlap).

Parameters:
gmHashOfTotalFeatOverlap - hash of genemodels pointing to their absolute overlap [Long]
isPercentage - switch for percentage overlap; otherwise absolute overlap [Double]
Returns:
hash of genemodels pointing to their scored (percentage or absolute) feature overlap
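
The scoring switch can be sketched as follows. The page does not say what the percentage is relative to, so the sketch assumes it is the genemodel length, which is passed in as a separate, hypothetical map. Genemodels are keyed by a string ID here only to keep the example self-contained, and ScoredOverlapSketch and score are illustrative names.

```java
import java.util.HashMap;
import java.util.Map;

class ScoredOverlapSketch {
    /**
     * Converts absolute overlaps [Long] into scores [Double].
     * ASSUMPTION: the percentage is taken relative to the genemodel length,
     * supplied here through the hypothetical gmLength map.
     */
    static HashMap<String, Double> score(Map<String, Long> absoluteOverlap,
                                         Map<String, Long> gmLength,
                                         boolean isPercentage) {
        HashMap<String, Double> scored = new HashMap<String, Double>();
        for (Map.Entry<String, Long> entry : absoluteOverlap.entrySet()) {
            double overlap = entry.getValue().doubleValue();
            if (isPercentage) {
                Long length = gmLength.get(entry.getKey());
                // Unknown or zero length: report 0% rather than dividing by zero.
                double percent = (length == null || length <= 0)
                        ? 0.0
                        : 100.0 * overlap / length;
                scored.put(entry.getKey(), percent);
            } else {
                // Absolute mode only changes the value type from Long to Double.
                scored.put(entry.getKey(), overlap);
            }
        }
        return scored;
    }
}
```

Under these assumptions, an overlap of 50 on a genemodel of length 200 scores 25.0 in percentage mode and 50.0 in absolute mode.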

getExonHashOfOverlappedFeat

private static java.util.HashMap getExonHashOfOverlappedFeat(int debug,
                                                             boolean considerStrand,
                                                             GeneModel gm,
                                                             java.util.ArrayList overlappingFeatArr)
method which returns hash of exons pointing to ArrayList of overlapping features.

Parameters:
debug - debugging mode
considerStrand - switch for strand consideration
gm - genemodel
overlappingFeatArr - ArrayList of genemodel overlapping features
Returns:
hash of exons pointing to ArrayList of overlapping features

getGmCompHashOfOverlappedFeat

private static java.util.HashMap getGmCompHashOfOverlappedFeat(int debug,
                                                               boolean considerStrand,
                                                               java.util.ArrayList gmCompArr,
                                                               java.util.ArrayList overlappingFeatArr)
method which returns hash of gm features (exons or introns) pointing to ArrayList of overlapping features (Blast or cDNA).

Parameters:
debug - debugging mode
considerStrand - switch for strand consideration
gmCompArr - ArrayList of genemodel components (exons or introns)
overlappingFeatArr - ArrayList of genemodel overlapping features
Returns:
hash of genemodel components (exons or introns) pointing to ArrayList of overlapping features

getFeatOverlapArrForExon

private static java.util.ArrayList getFeatOverlapArrForExon(Exon exon,
                                                            java.util.ArrayList featArr)
method which returns overlapping features for a single exon.

Parameters:
exon - Exon object
featArr - ArrayList of genemodel overlapping features
Returns:
ArrayList of overlapping features for a single exon
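
Selecting the features that overlap one exon reduces to an interval test. The following sketch assumes inclusive int[]{min, max} coordinates and ignores strand, in line with the method signature above, which has no considerStrand parameter. ExonFeatureFilterSketch and overlapping are hypothetical names, not part of the documented class.

```java
import java.util.ArrayList;
import java.util.List;

class ExonFeatureFilterSketch {
    /**
     * Returns the features that share at least one base with the exon.
     * ASSUMPTION: intervals are int[]{min, max} with inclusive coordinates.
     */
    static ArrayList<int[]> overlapping(int[] exon, List<int[]> features) {
        ArrayList<int[]> result = new ArrayList<int[]>();
        for (int[] f : features) {
            // Two inclusive intervals overlap when each starts
            // no later than the other one ends.
            if (f[0] <= exon[1] && exon[0] <= f[1]) {
                result.add(f);
            }
        }
        return result;
    }
}
```

For an exon at 100..200, features at 90..100, 150..160 and 200..300 are kept, while 50..99 and 201..250 are rejected because they only touch the neighbouring bases.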

getFeatOverlapArrForGmComp

private static java.util.ArrayList getFeatOverlapArrForGmComp(FeatureComponentAble gmComp,
                                                              java.util.ArrayList featArr)