GFF is a format for describing genes and other features associated with DNA, RNA and protein sequences. The current specification can be found here. The current version of GFF is Version 2.

This page is a starting point for finding out about this format and its use in bioinformatics. In particular, a considerable amount of software has been developed for use with GFF since the format was proposed, and this page is intended as a focal point for collating that software, whether developed at the Sanger Institute or elsewhere.

A GFF record is an extension of a basic (name,start,end) tuple (or "NSE") that can be used to identify a substring of a biological sequence. (For example, the NSE (ChromosomeI,2000,3000) specifies the third kilobase of the sequence named "ChromosomeI".) GFF allows for moderately verbose annotation of single NSEs. It also provides limited support for NSE pairs in a rather asymmetrical way. An alternative format for representing NSE pairs that is used by several of the programs listed below is EXBLX, as used by MSPcrunch (Sonnhammer and Durbin (1994), "An expert system for processing sequence homology data", Proceedings of ISMB 94, 363-368).
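As a minimal sketch of the relationship between a GFF record and its underlying NSE, the following Python fragment splits one tab-delimited GFF Version 2 line into its fields, pulls out the (name,start,end) tuple, and dereferences it against a sequence held in memory. The names NSE, parse_gff_line and dereference are illustrative only and are not taken from any of the software listed on this page. The sketch assumes the GFF convention of 1-based, inclusive coordinates.

```python
from typing import Dict, NamedTuple


class NSE(NamedTuple):
    """A (name, start, end) tuple; coordinates are 1-based and inclusive."""
    name: str
    start: int
    end: int


def parse_gff_line(line: str) -> dict:
    """Split one tab-delimited GFF Version 2 line into named fields.

    Hypothetical helper for illustration; it does no validation beyond
    counting the eight mandatory columns.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("a GFF line needs at least 8 tab-separated fields")
    return {
        "nse": NSE(fields[0], int(fields[3]), int(fields[4])),
        "source": fields[1],
        "feature": fields[2],
        "score": fields[5],
        "strand": fields[6],
        "frame": fields[7],
        # The ninth column (attributes/group) is optional in GFF2.
        "attributes": fields[8] if len(fields) > 8 else "",
    }


def dereference(nse: NSE, sequences: Dict[str, str]) -> str:
    """Return the substring of the named sequence that the NSE describes."""
    seq = sequences[nse.name]
    # Convert 1-based inclusive coordinates to a Python slice.
    return seq[nse.start - 1:nse.end]


# Toy data, invented for this example.
sequences = {"ChromosomeI": "ACGTACGTAC"}
record = parse_gff_line("ChromosomeI\tdemo\texon\t3\t6\t.\t+\t.")
subsequence = dereference(record["nse"], sequences)
```

Everything beyond the three NSE fields (source, feature, score, strand, frame, attributes) is the "moderately verbose annotation" that GFF layers on top of the basic tuple.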

The most common operations that one tends to want to perform on sets of NSEs and NSE-pairs include intersection, exclusion, union, filtration, sorting, transformation (to a new co-ordinate system) and dereferencing (access to the described sequence). With a suitably flexible definition of NSE "similarity", these operations form a basis for more sophisticated algorithms like clustering and joining-together by dynamic programming. Programs to perform all of these tasks are described below, with links to local copies.
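Several of these operations can be sketched in a few lines of Python. The fragment below is an illustration under stated assumptions, not the method used by any of the programs listed on this page: the function names are hypothetical, coordinates are taken to be 1-based and inclusive, and the union treats directly adjacent NSEs as mergeable, which is a design choice rather than a requirement of the format.

```python
from typing import List, NamedTuple, Optional


class NSE(NamedTuple):
    """A (name, start, end) tuple; coordinates are 1-based and inclusive."""
    name: str
    start: int
    end: int


def intersection(a: NSE, b: NSE) -> Optional[NSE]:
    """Return the overlap of two NSEs, or None if they do not overlap."""
    if a.name != b.name:
        return None
    start, end = max(a.start, b.start), min(a.end, b.end)
    return NSE(a.name, start, end) if start <= end else None


def union(nses: List[NSE]) -> List[NSE]:
    """Sort NSEs by (name, start, end) and merge overlapping or adjacent ones."""
    merged: List[NSE] = []
    for nse in sorted(nses):
        last = merged[-1] if merged else None
        if last and last.name == nse.name and nse.start <= last.end + 1:
            merged[-1] = NSE(last.name, last.start, max(last.end, nse.end))
        else:
            merged.append(nse)
    return merged


def transform(nse: NSE, new_name: str, offset: int) -> NSE:
    """Shift an NSE into another co-ordinate system by a fixed offset."""
    return NSE(new_name, nse.start + offset, nse.end + offset)


# Toy data, invented for this example.
features = [NSE("c", 5, 20), NSE("d", 1, 5), NSE("c", 30, 40), NSE("c", 1, 10)]
merged = union(features)
overlap = intersection(NSE("c", 1, 10), NSE("c", 5, 20))
# A clone whose first base sits at position 1000 of ChromosomeI.
on_chromosome = transform(NSE("clone", 1, 10), "ChromosomeI", 999)
```

Because the union sorts its input first, it runs in O(n log n) time, which matters for chromosome-sized sets of features.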

Criticism of and new links for this page are always welcome. Please contact the page administrator, whose email address appears at the foot of the page.

Sanger Institute GFF Perl Modules

Broad-functionality Perl 5.0 modules developed by Tim Hubbard and extended/maintained by Richard Bruskiewich. Provided that the modules are on your Perl module @INC path, "use GFF" imports all the associated modules for use. These modules include:

A GFF Perl Installable Archive of all these modules and their associated HTML documentation is now available.

29/4/99 Advisory: Module (package) spaces reorganized and modules renamed:

19/4/99 Advisory: GeneFeaturePair.pm and GFFPair.pm (formerly part of the broad-functionality Perl 5.0 modules) have been completely deprecated, with the corresponding functionality now merged into GFF.pm (the score() method) and GeneFeature.pm (all *Match*() methods).

Josep Abril's GFF programs (IMIM, Spain)

Web site for gff2ps and gff2aplot, programs for graphically representing GFF file data (highlighted at ISMB '99).

Ian Holmes GFF programs & scripts (pre-1998 repository; no longer updated at the Sanger)

Updated versions of some of these scripts, maintained by Ian Holmes, can be found at http://biowiki.org/GffTools/

More information about this program is available on request.

Several of these scripts duplicate functionality provided by Tim Hubbard's Perl modules (see above), but may be less algorithmically complex (a significant consideration for chromosome-sized GFF files!).

Please do email Ian Holmes if you require documentation for these programs.