GFF is a format for describing genes and other features associated
with DNA, RNA and Protein sequences. The current specification can
be found here. The current version of GFF is Version 2.
This page is a starting-point for finding out about this format and
its use in bioinformatics. In particular, since its proposal a
considerable amount of software has been developed for use with GFF
and this page is intended as a focus for the collation of this
software, whether developed in the Sanger Institute or elsewhere.
A GFF record is an extension of a basic (name,start,end)
tuple (or "NSE") that can be used to identify a substring of a
biological sequence. (For example, the NSE
(ChromosomeI,2000,3000) specifies the third kilobase of the
sequence named "ChromosomeI".) GFF allows for moderately verbose
annotation of single NSEs. It also provides limited support for NSE
pairs in a rather asymmetrical way. An alternative format for
representing NSE pairs that is used by several of the programs
listed below is EXBLX, as used by MSPcrunch
(Sonnhammer and Durbin (1994), "An expert system for processing
sequence homology data", Proceedings of ISMB 94, 363-368).
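As an illustration of the NSE idea, the following is a minimal sketch in Python, not part of any of the software listed on this page. The names NSE, dereference and parse_gff2_line are hypothetical, chosen only for this example. It assumes 1-based, inclusive coordinates and the tab-delimited GFF Version 2 field order (seqname, source, feature, start, end, score, strand, frame, optional attributes).

```python
from typing import NamedTuple


class NSE(NamedTuple):
    """A (name, start, end) tuple; coordinates are assumed 1-based and inclusive."""
    name: str
    start: int
    end: int


def dereference(nse, sequences):
    """Return the substring of the named sequence that the NSE identifies.

    `sequences` is a plain dict mapping sequence names to strings
    (an assumption made for this sketch).
    """
    seq = sequences[nse.name]
    if not (1 <= nse.start <= nse.end <= len(seq)):
        raise ValueError("NSE lies outside the sequence: %r" % (nse,))
    # Convert 1-based inclusive coordinates to a Python slice.
    return seq[nse.start - 1:nse.end]


def parse_gff2_line(line):
    """Split one GFF Version 2 line into its NSE and the remaining annotation."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("expected at least 8 tab-separated fields")
    seqname, source, feature, start, end, score, strand, frame = fields[:8]
    attributes = fields[8] if len(fields) > 8 else ""
    nse = NSE(seqname, int(start), int(end))
    annotation = {
        "source": source,
        "feature": feature,
        "score": score,
        "strand": strand,
        "frame": frame,
        "attributes": attributes,
    }
    return nse, annotation
```

In this sketch the GFF record is simply the NSE plus a dictionary of the extra fields, which mirrors the description of GFF as an extension of the basic tuple.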
The most common operations that one tends to want to perform on
sets of NSEs and NSE-pairs include intersection, exclusion, union,
filtration, sorting, transformation (to a new co-ordinate system)
and dereferencing (access to the described sequence). With a
suitably flexible definition of NSE "similarity", these operations
form a basis for more sophisticated algorithms like clustering and
joining-together by dynamic programming. Programs to perform all of
these tasks are described below, with links to local copies.
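A few of the operations named above can be sketched on plain (name,start,end) tuples as follows. This is an illustrative Python sketch under stated assumptions, not the implementation used by any program on this page: the function names are hypothetical, coordinates are taken to be 1-based and inclusive, and two NSEs are treated as "similar" only when they share a sequence name and overlap by at least one position.

```python
from typing import NamedTuple


class NSE(NamedTuple):
    """A (name, start, end) tuple; coordinates are assumed 1-based and inclusive."""
    name: str
    start: int
    end: int


def intersection(a, b):
    """Return the region shared by two NSEs, or None if they do not overlap."""
    if a.name != b.name:
        return None
    start, end = max(a.start, b.start), min(a.end, b.end)
    return NSE(a.name, start, end) if start <= end else None


def union(nses):
    """Sort NSEs by (name, start, end) and merge those that overlap."""
    merged = []
    for nse in sorted(nses):
        last = merged[-1] if merged else None
        if last and last.name == nse.name and nse.start <= last.end:
            # Extend the previous region instead of starting a new one.
            merged[-1] = NSE(last.name, last.start, max(last.end, nse.end))
        else:
            merged.append(nse)
    return merged


def transform(nse, offset, new_name=None):
    """Shift an NSE into a new co-ordinate system by a fixed offset."""
    return NSE(new_name or nse.name, nse.start + offset, nse.end + offset)


def filtration(nses, min_length):
    """Keep only NSEs spanning at least min_length positions."""
    return [n for n in nses if n.end - n.start + 1 >= min_length]
```

A looser notion of similarity (for example, allowing a gap between neighbouring NSEs) could be substituted in union() as a starting point for the clustering mentioned above.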
Criticism of and new links for this page are always welcome. Please
contact the page administrator, whose email address appears at the
foot of the page.
Broad-functionality Perl 5.0 modules developed by Tim Hubbard and
extended/maintained by Richard
Bruskiewich. Provided the modules are on your Perl @INC
path, "use GFF" imports all the associated modules. These
modules include:
GFF: base class for (Homol)GeneFeature and GeneFeatureSet
29/4/99 Advisory: Module (package) spaces reorganized and modules renamed:
GFFObject.pm => GFF.pm - is the only module users need to 'use' in their scripts (pulls in the other modules...)
GFF.pm => GFF::GeneFeatureSet.pm
GeneFeature.pm => GFF::GeneFeature.pm
HomolGeneFeature.pm => GFF::HomolGeneFeature.pm
19/4/99 Advisory: GeneFeaturePair.pm and GFFPair.pm (formerly part of the
broad-functionality Perl 5.0 modules) have been completely deprecated, with the corresponding functionality now
merged into GFF.pm (the score() method) and GeneFeature.pm (all '*Match*()' methods).