With the GeneFinder Tutorial database open....
Double click on the Test_Sequence class in the Classes window and then
double click the object EM:D13249 in the KeySet window. The features
map (fmap or sequence map) appears.
Now click on the GeneFind.. button at the top right of the screen. A
message box appears; press the Continue button. GeneFinder features
appear: stop codons, ATG codons, blocks of coding potential, and
splice donor and acceptor sites (see diagram below).
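The codon features can be found with a simple scan. The Python sketch below is a hypothetical helper, not part of ACEDB or GeneFinder. It lists ATG and stop codons in the three forward reading frames using 1-based positions. It ignores the reverse strand and does no scoring.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_codons(seq: str) -> dict[str, list[tuple[int, int]]]:
    """Return (1-based start, frame) pairs for ATG and stop codons.

    Hypothetical helper: forward strand only, frames numbered 0-2.
    """
    seq = seq.upper()
    found: dict[str, list[tuple[int, int]]] = {"ATG": [], "stop": []}
    for i in range(len(seq) - 2):
        codon = seq[i:i + 3]
        frame = i % 3
        if codon == "ATG":
            found["ATG"].append((i + 1, frame))
        elif codon in STOP_CODONS:
            found["stop"].append((i + 1, frame))
    return found

print(find_codons("CCATGAAATAGTGA"))
# {'ATG': [(3, 2)], 'stop': [(4, 0), (9, 2), (12, 2)]}
```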
With the right-hand mouse button, click on the GeneFind.. button and
select [AutoFind gene] from the menu that appears. A message box
appears giving the maximum and minimum co-ordinates of the search, and
the score for the "temp_gene" which was found; press Continue. The
temp_gene is the GeneFinder prediction and is marked on the fmap.
Some GeneFinder features are now marked in green. These are the ATG
(AUG) translation initiation codon, splice donor and acceptor sites
and an open reading frame corresponding to the terminal exon of this
predicted gene. Note that the GeneFinder prediction does not
correspond to the gene from the EMBL record.
The GeneFinder Tutorial database contains 38 Test_Sequences together
with regions of homology determined by BlastX, BlastN, TBlastX and
PSsearch. If you would like some practice at GeneFinding, have a
go at these and compare your results to the table below. User segments
of splice branch/acceptor motifs are available in the
GeneFinder_Tutorial.user_segments/ directory for each of
the sequences.
Of the 32 protein coding genes specified in the EMBL feature tables of
the Test_Sequences, GeneFinder correctly predicts 24 (75%) with a
single set of GeneFinder parameters (see below). Of 41 introns, 39
(95%) are correctly predicted, with only 5 false positives. Two
unannotated genes are also found within these sequences. All genes can
be found by using selection and antiselection of features.
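The percentages quoted above follow directly from the counts. A quick check in Python:

```python
# Counts as stated in this tutorial.
genes_total, genes_exact = 32, 24
introns_total, introns_correct = 41, 39

gene_rate = genes_exact / genes_total          # exactly 0.75
intron_rate = introns_correct / introns_total  # about 0.951

print(f"Exact gene predictions: {gene_rate:.0%}")        # 75%
print(f"Correctly predicted introns: {intron_rate:.0%}")  # 95%
```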
29 of the protein coding genes have the
/evidence=EXPERIMENTAL qualifier in the EMBL feature table.
The table below gives the results of GeneFinding in these Test_Sequences.
Introns: introns in the EMBL record; Correct: introns correctly predicted;
False+: false-positive introns; Agrees?: + if the prediction matches the
EMBL gene exactly.

Gene symbol/Name                  Access.  Spliced  Introns  Correct  False+  Agrees?  Comment
================================  =======  =======  =======  =======  ======  =======  =======
PRH1 ATP-dependent RNA Helicase   D13249   yes      2        2                +
dsk1+                             D13447   no                                 +
gamma glutamylcysteine synthase   D55676   yes      1        0                -        first exon missing, although there is BlastX homology evidence
rpb6                              L00597   yes      1        1                +
cdc8                              L04126   yes      1        1                -        uses downstream ATG
pold                              L07734   yes      1        1        +1      -        splice AT^G to A > ATA, bug!
exo2                              L35232   yes      2        1                -        uses rare donor gttgtt
heat shock protein (sis1;Psi)     L37753   no                                 +        another gene at 1897 to 2133
fus1                              L37838   no                         +1      -        intron internal to ORF
cdc27+                            M74062   yes      5        5                +
cdc27+ mRNA                       M83307   n/a                                +
let1                              U02280   yes      1        1        +1      -        extra exon
cki1 mRNA                         U06929   n/a                                +
HIS1 mRNA                         U07830   n/a                                +
HIS5 mRNA                         U07831   n/a                                +
cnx1                              U13389   n/a                                +
ENO1 mRNA                         U13799   no                                 +
rpb1                              X56564   yes      6        6                +
rad9                              X58231   yes      3        3                +
SSP1                              X59987   no                                 +
sts1+                             X63549   no                         +1      -        extra exon
vma1+                             X68580   yes      2        2                +
vma2                              X69638   yes      4        4                +
FIB                               X69930   no                                 +        another gene at -3088 to -1799
rad26                             X76558   yes      2        2                +
gar2                              Z48166   no                                 +
sak1                              U19978   no                                 +
IDI1                              U21154   no                                 +
chk1                              L13742   yes      6        6                +
mcs2                              S59895   yes      2        2                +
csk1                              S59896   yes      2        2                +
rad1                              M38132   no                         +1      -        extra exon

Totals
  Genes                          32
  Exact predictions              24
  Spliced genes                  16
  Exact spliced predictions      11
  Introns in EMBL records        41
  Introns correctly predicted    39
  False-positive introns          5

GeneFinder parameters

Feature parameters:
  Features range                 10000
  3' splice cutoff               2.00
  5' splice cutoff               1.00
  ATG cutoff                     0.00

AutoFind parameters:
  min intron length              30
  min exon length                3
  intron cost                    -4.5
  intron rate per log bp         -2.0
  coding:intron score ratio      1.00

GeneFinder_Tutorial.user_segments were included for each prediction using Read Segments.
Correspondence to Sean Walsh svw@sanger.ac.uk
Bring up the GeneFinder menu (under the GeneFind.. button) and select
[Feature Parameters]. Change the parameters to the ones given below
and press the OK button.
Each GeneFinder feature is given a score. Click on an ATG, splice
donor (Splice5) or splice acceptor (Splice3) and look at the blue
information bar at the top of the screen to see the score. The Feature
parameters set the minimum score a feature must have to be included on
the fmap.
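The effect of the Feature parameters can be sketched as a score filter applied separately to each feature type. This is a minimal illustration, not GeneFinder code. It assumes a feature is kept when its score is at least the cutoff for its type; this tutorial does not say whether the comparison is inclusive. The `Feature` class and all scores and positions are invented. The cutoffs are the Feature parameters listed with the results table.

```python
from dataclasses import dataclass

@dataclass
class Feature:
    kind: str     # "ATG", "Splice5" (donor) or "Splice3" (acceptor)
    start: int    # 1-based position on the sequence (invented here)
    score: float  # invented score

# Cutoffs from the tutorial's Feature parameters.
CUTOFFS = {"ATG": 0.00, "Splice5": 1.00, "Splice3": 2.00}

def filter_features(features: list[Feature], cutoffs: dict[str, float]) -> list[Feature]:
    """Keep features scoring at or above their type's cutoff (assumed rule)."""
    return [f for f in features if f.score >= cutoffs[f.kind]]

features = [
    Feature("ATG", 100, 1.2),
    Feature("Splice5", 250, 0.4),
    Feature("Splice3", 400, 1.8),
    Feature("Splice5", 520, 3.1),
]
kept = filter_features(features, CUTOFFS)
print([(f.kind, f.start) for f in kept])  # [('ATG', 100), ('Splice5', 520)]
```

Raising a cutoff removes weaker features of that type from the fmap, so fewer candidate sites remain for AutoFind.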
The scores of GeneFinder features are calculated from weight matrices
(ATG, donor and acceptor) and codon usage tables in the wgf/ directory
of ACEDB.
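In general, a weight matrix scores a candidate site by adding up a log-odds weight for the base seen at each position. The sketch below shows that general technique only. This tutorial does not describe the format of the files in wgf/ or GeneFinder's exact formula. The 4-position matrix is invented and is not a real donor matrix.

```python
import math

# Invented base frequencies for a 4-position site (one dict per position).
FREQS = [
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.6, "C": 0.1, "G": 0.2, "T": 0.1},
    {"A": 0.6, "C": 0.1, "G": 0.2, "T": 0.1},
]
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

def log_odds_matrix(freqs, background):
    """Convert frequencies to natural-log odds weights against the background."""
    return [{b: math.log(pos[b] / background[b]) for b in pos} for pos in freqs]

def score_site(site: str, weights) -> float:
    """Sum the weight of the observed base at each position of the site."""
    if len(site) != len(weights):
        raise ValueError("site length must match matrix length")
    return sum(w[base] for w, base in zip(weights, site.upper()))

WEIGHTS = log_odds_matrix(FREQS, BACKGROUND)
print(round(score_site("GTAA", WEIGHTS), 3))  # 3.81
print(round(score_site("CCCC", WEIGHTS), 3))  # -3.665
```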
To recalculate the GeneFinder features, click on the GeneFind.. button
with the left mouse button, or bring up the GeneFinder menu with the
right mouse button and select [GeneFinder Features]. Note that fewer
splice sites are now marked on the fmap.
The GeneFinder Features used for the last gene prediction are still
highlighted in green. Press the Clear button at the top of the screen,
otherwise GeneFinder will only use these features and subsequently
find the same gene.
Choose the [AutoFind Gene] option from the GeneFinder menu and press
Continue when the dialogue box appears. The old temp_gene is
overwritten with a new prediction (see below).
Bring up the GeneFinder menu (under the GeneFind.. button) and select
[AutoFind Parameters]. Change the parameters to the ones given below
and press the OK button.
AutoFind parameters are weights and limits used by the GeneFinder
algorithm to predict a gene. They can be varied by the user, which
allows GeneFinder to be tuned to find genes typical of a particular
organism, in this case Schizosaccharomyces pombe.
AutoFind parameters are:
min intron length: the shortest intron allowed in a prediction
min exon length: the shortest exon allowed; this does not apply to
the first and last exon
intron cost: the cost to the gene score for including each intron
intron rate per log bp: a weight for the additional cost of including
an intron above a certain length (see Aside)
coding:intron score ratio: the coding score is derived from a codon
usage table and visualised on the fmap as blocks of coding potential;
the coding score is increased proportionately by this weight.
[Aside: the additional cost is added above 100 bp (given a minimum
intron length of 30 bp). This is hard coded in ACEDB 4.1 (and has been
modified for this database) but will become an AutoFind parameter in
the next release. The additional cost is the log of the additional
length over 100 bp multiplied by the weight "intron rate per log bp".]
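Putting the intron parameters and the Aside together, the score contribution of one intron can be sketched as below. This rests on two assumptions. First, the "log" in the Aside is taken to be the natural logarithm. Second, the costs are taken to be negative numbers added to the gene score, matching the intron cost of -4.5 and rate of -2.0 listed with the results table. `intron_penalty` is a hypothetical name.

```python
import math

def intron_penalty(length: int,
                   intron_cost: float = -4.5,
                   rate_per_log_bp: float = -2.0,
                   threshold: int = 100,
                   min_intron_length: int = 30) -> float:
    """Score contribution of one intron under the assumed form."""
    if length < min_intron_length:
        raise ValueError(f"intron shorter than minimum ({min_intron_length} bp)")
    penalty = intron_cost
    if length > threshold:
        # Additional cost: log of the length beyond 100 bp, times the weight.
        penalty += rate_per_log_bp * math.log(length - threshold)
    return penalty

print(intron_penalty(60))              # -4.5
print(round(intron_penalty(150), 3))   # -12.324
```

Because the extra cost grows only with the log of the excess length, a 1000 bp intron costs only a few points more than a 150 bp one.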
Press the Clear button to remove the green highlighting of GeneFinder
features and then select [AutoFind Gene] under the GeneFind.. menu.
Press Continue in the dialogue box that appears. The temp_gene
is overwritten. Note that the GeneFinder prediction exactly matches
the gene specified in the EMBL record. Double click the text
EM:D13249 to the left of the scale and inspect the full text of the
EMBL record. You will see that the intron/exon structure has been
confirmed experimentally.
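Checking whether a prediction exactly matches an annotated gene comes down to comparing exon coordinates. The hypothetical helper below works out intron intervals from two exon lists and counts the shared and extra introns, similar to the intron counts in the results table. The coordinates in the example are invented.

```python
Exon = tuple[int, int]  # (start, end), 1-based inclusive

def introns_from_exons(exons: list[Exon]) -> set[Exon]:
    """Intron intervals lying between consecutive exons."""
    ordered = sorted(exons)
    return {(a_end + 1, b_start - 1)
            for (_, a_end), (b_start, _) in zip(ordered, ordered[1:])}

def compare_genes(predicted: list[Exon], annotated: list[Exon]) -> dict:
    """Compare a predicted exon structure with an annotated one."""
    pred_introns = introns_from_exons(predicted)
    ann_introns = introns_from_exons(annotated)
    return {
        "exact": sorted(predicted) == sorted(annotated),
        "annotated_introns": len(ann_introns),
        "correct_introns": len(pred_introns & ann_introns),
        "false_positive_introns": len(pred_introns - ann_introns),
    }

annotated = [(1, 200), (301, 450), (521, 900)]
print(compare_genes([(1, 200), (301, 900)], annotated))
print(compare_genes(annotated, annotated)["exact"])  # True
```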
Show Selected
Double click the Test_Sequence EM:D55676 from the KeySet. With GeneFinder
Features and AutoFind Parameters set as in Steps 2 and 3, choose
[GeneFinder Features] and then [AutoFind gene].
The temp_gene does not have the same structure as the gene from the
EMBL record (see below).
Inspect the gene score of the temp_gene by choosing the [Show Selected]
option in the GeneFind.. menu. The following table should appear:
Now press Clear and click on the other gene. Choose [Gene -> Selected]
from the GeneFind.. menu and then [Show Selected]. The table below
should appear. The temp_gene was chosen in preference to the other
gene due to its larger combined exon score, whilst combined intron scores
were approximately equivalent.
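The comparison made with [Show Selected] can be pictured as adding up exon and intron scores. This is a minimal sketch. It assumes a gene's score is the sum of its exon (coding) scores, scaled by the coding:intron score ratio, plus the sum of its intron scores. GeneFinder's actual calculation may differ, and the numbers are invented rather than taken from the tables.

```python
from dataclasses import dataclass, field

@dataclass
class GeneModel:
    """A candidate gene described only by its exon and intron scores."""
    name: str
    exon_scores: list[float] = field(default_factory=list)
    intron_scores: list[float] = field(default_factory=list)

    def total(self, coding_intron_ratio: float = 1.0) -> float:
        # Assumed scoring: exon (coding) scores scaled by the ratio, plus intron scores.
        return coding_intron_ratio * sum(self.exon_scores) + sum(self.intron_scores)

def best_gene(candidates: list[GeneModel], ratio: float = 1.0) -> GeneModel:
    """Return the highest-scoring candidate under the assumed scoring."""
    return max(candidates, key=lambda g: g.total(ratio))

# Invented scores: similar intron totals, different exon totals.
temp_gene = GeneModel("temp_gene", exon_scores=[25.0, 14.0], intron_scores=[-5.1])
other = GeneModel("other", exon_scores=[9.0, 21.0], intron_scores=[-5.3])
print(best_gene([temp_gene, other]).name)  # temp_gene
```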
Select and Antiselect
The exon/intron structure of the gene in the EM:D55676 EMBL record has
been experimentally determined. BlastX homology to another
gamma-glutamylcysteine synthetase from Rattus norvegicus
confirms this structure (click on the BlastX boxes to see this).
To predict this gene with GeneFinder it would be possible to modify
AutoFind Parameters until it was found. However, GeneFinder can be
forced to use or disregard selected features. To do this, press the
Clear button to remove all highlighting of features. Click on the ATG
(bases 684-686; see image below) with the right mouse button and a
menu appears. Choose [Select] from the menu and the ATG is highlighted
in green. Now click on the splice donor (bases 1034-1035) with the
right mouse button and choose [Antiselect]. The donor is highlighted in
pale green. From the GeneFind.. menu choose [AutoFind gene]. The gene
prediction is the same as that in the EMBL record.
Note that Unselect is also available from the feature selection menu.
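Select and Antiselect act as constraints on which genes AutoFind may return. The sketch below shows only that filtering idea. It assumes each candidate is described by a score and the positions of the features it uses. It chooses from a fixed list of hypothetical candidates instead of re-running a gene search the way GeneFinder does.

```python
from typing import Optional

def allowed(features: set[int], selected: set[int], antiselected: set[int]) -> bool:
    """True if a candidate uses every selected feature and no antiselected one."""
    return selected <= features and not (antiselected & features)

def constrained_best(candidates: dict[str, tuple[float, set[int]]],
                     selected: set[int],
                     antiselected: set[int]) -> Optional[str]:
    """Name of the highest-scoring allowed candidate, or None if none qualifies."""
    scores = {name: score for name, (score, feats) in candidates.items()
              if allowed(feats, selected, antiselected)}
    return max(scores, key=scores.get) if scores else None

# Hypothetical candidates: score and the feature positions each one uses.
candidates = {
    "gene_a": (30.0, {120, 450, 610}),
    "gene_b": (24.0, {200, 610, 780}),
}
print(constrained_best(candidates, set(), set()))  # gene_a
print(constrained_best(candidates, {200}, {450}))  # gene_b
```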
Read Segments
GeneFinder does not directly take splice branch sites into account when
predicting genes. However, it is possible to give GeneFinder branch
site information by supplying splice acceptor sites that combine the
splice branch and acceptor. The combined branch/acceptor motifs are
located in the sequence by a program called Sp3splice, which is
external to ACEDB, and the results are written to a file with the
".useg" extension ("useg" is a contraction of User Segments). The
following example demonstrates the value of including splice branch
information when predicting genes in S. pombe.
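This tutorial does not describe how Sp3splice works or the .useg file format, so the following is only a hypothetical illustration of a combined branch/acceptor search. It reports positions where a branch-like motif is followed, within a short window, by an AG dinucleotide. The motif regular expression, the 40 bp window and the output pairs are placeholder assumptions, not values from this tutorial.

```python
import re

def branch_acceptor_sites(seq: str, branch_motif: str = "CT[AG]A[CT]",
                          max_gap: int = 40) -> list[tuple[int, int]]:
    """Return (branch_start, acceptor_end) pairs, both 1-based.

    branch_motif and max_gap are illustrative placeholders. acceptor_end is
    the position of the G in the first AG found within max_gap bases after
    the motif.
    """
    seq = seq.upper()
    sites = []
    # A lookahead lets overlapping motif occurrences all be reported.
    for m in re.finditer(f"(?=({branch_motif}))", seq):
        branch_start = m.start()
        branch_end = branch_start + len(m.group(1))
        window = seq[branch_end:branch_end + max_gap]
        ag = window.find("AG")
        if ag != -1:
            sites.append((branch_start + 1, branch_end + ag + 2))
    return sites

print(branch_acceptor_sites("GGCTAACTTTAGGG"))  # [(3, 12)]
```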
Bring up the fmap for Test_Sequence EM:X56564 and choose [GeneFinder
features] followed by [AutoFind gene] from the GeneFind.. menu. The
predicted gene is not the same as the gene from the EMBL record (see
below).
Select [Feature Parameters] from the GeneFind.. menu and change the
3' splice cutoff to 2.00 (see below). Splice acceptors calculated by
GeneFinder do not attain scores over 2.00, so this ensures that they
are not used. Now select [GeneFinder features] from the GeneFind..
menu and notice that all the acceptors have disappeared from the fmap.