Sanger Institute, Wellcome Trust Genome Campus, Cambs, UK. All rights reserved.
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation.
How to Read Method Protocols
Normal Perl data type notations are used for argument declarations in the method protocols. A backslash denotes argument passing by reference. Class methods are invoked using the 'class->method(args)' or 'method class args' Perl call formats.
Parses a $line string into a GFF::GeneFeature object (creates and returns the object reference).
Parses a $line string into the GFF::GeneFeature object using a user-defined &parser function (creates and returns the object reference). The &parser should expect the (empty) GFF::GeneFeature object reference as its first argument and the input line (string) as its second argument. Given these, the function should parse the input $line and load the GFF::GeneFeature object with data. A typical use for this method is to parse non-GFF formats into GFF format.
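The calling convention for a user-defined parser can be sketched as follows. This is only an illustration under stated assumptions: new_from_parser_sketch() and csv_parser() are hypothetical names, the object is a plain hash rather than a real GFF::GeneFeature, and the comma-separated input format is invented for the example.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the constructor described above: it creates an
# empty object, hands it to the user-supplied parser together with the input
# line, and returns the object reference. Not the module's real code.
sub new_from_parser_sketch {
    my ($parser, $line) = @_;
    my $gf = {};                 # the "(empty)" object the parser will fill
    $parser->($gf, $line);       # object reference first, input line second
    return $gf;
}

# Example parser for an invented comma-separated format:
#   seqname,feature,start,end
sub csv_parser {
    my ($gf, $line) = @_;
    chomp $line;
    my ($seqname, $feature, $start, $end) = split /,/, $line;
    @{$gf}{qw(seqname feature start end)} = ($seqname, $feature, $start, $end);
    return;
}

my $gf = new_from_parser_sketch(\&csv_parser, "chr1,exon,100,250\n");
print join("\t", @{$gf}{qw(seqname feature start end)}), "\n";
```

A real parser would presumably call the object's own field access methods instead of writing to hash keys directly.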
Parses the $group_field string into the [group] field of the invoking GFF::GeneFeature object. The method assumes that the invoking object already knows its GFF version.
Uses dump_string() to write formatted output of a GFF::GeneFeature object to a filehandle, OUTPUT. If \*OUTPUT is not given, \*STDOUT is used. When the optional $tab, $flen and $newlines arguments are omitted, this method is guaranteed to dump well-formed GFF records meeting GFF version standards. The optional parameters provide alternate, non-GFF-compliant tabular text formats for output.
The $tag argument, passed to the dump_string() method, restricts [group] field dumping of the GFF to the specified tag. If $tag is undefined, all [group] tag values are dumped. If $tag is defined but null (empty string), then no [group] tag-values are dumped. Otherwise, a defined and non-empty $tag value is assumed to be a simple identifier or Perl regex matching the tags whose values are to be included in the dump.
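The three cases for $tag can be illustrated with a small stand-alone sketch. select_tags_sketch() is a hypothetical helper, not a method of the module, and it assumes that the [group] data is a hash of tag => \@values and that a simple identifier is applied as an unanchored regex (so 'Note' would also select 'Notes').

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module): given a [group] hash of
# tag => \@values and a $tag selector, return the tags that would be dumped.
sub select_tags_sketch {
    my ($group, $tag) = @_;
    my @tags = sort keys %$group;
    return @tags unless defined $tag;      # undefined: dump every tag
    return ()    if $tag eq '';            # empty string: dump nothing
    return grep { /$tag/ } @tags;          # otherwise: identifier or regex
}

my %group = (
    Sequence => ['AC000001'],
    Note     => ['example'],
    Notes    => ['more'],
);

print join(',', select_tags_sketch(\%group)), "\n";            # all three tags
print join(',', select_tags_sketch(\%group, '')), "\n";        # nothing
print join(',', select_tags_sketch(\%group, '^Note$')), "\n";  # Note only
```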
The optional $tab argument is a boolean flag: a true (non-null) value selects tab as the field delimiter in the output line; otherwise blank space is used. The flag is assumed true if not specified.
The $newline and $tag arguments are passed to dump_group() (see below). See the dump() method above and the dump_group() method below for an explanation of the $tag argument.
The optional $tab argument is a boolean flag: a true (non-null) value selects tab as the field delimiter in the output line; otherwise blank space is used. The flag is assumed true if not specified.
The $newline argument is a boolean flag (assumed false if omitted). A true (non-null) value directs that [group] field tag-value pairs are printed one per line (with the semicolon omitted), and that any '\n' characters in normally double-quoted free-text [group] value strings are converted to real newlines, which wrap the text into a multiline printed format. Both newline effects are confined to the [group] field column. In other words, the $newline flag provides some semblance of pretty printing for group fields.
If $tab and/or $newline are specified, then the $len argument should contain the length of the dump line preceding the [group] field.
If the $tag argument is undefined, then all tag-value pairs are dumped. Otherwise, $tag is assumed to be a simple tag or Perl regular expression matching the tag-value fields which are to be dumped.
Note: A subtle feature (or bug, depending upon your point of view) of this routine is that tag-values are dumped in ascending alphanumeric order of tag, not necessarily in the original order read into the system (i.e. from an original GFF file read in by GFF->read()).
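The ordering and $newline behaviour described above might look roughly like the following. format_group_sketch() is a hypothetical function written for illustration; the ' ; ' separator and the use of $len as an indentation width are assumptions, and the conversion of embedded '\n' sequences in quoted values is not covered.

```perl
use strict;
use warnings;

# Hypothetical formatter, loosely modelled on the description above.
# $group   : hash ref of tag => \@values
# $newline : true => one tag-value pair per line, no semicolons
# $len     : width of the line preceding the [group] column (for indentation)
sub format_group_sketch {
    my ($group, $newline, $len) = @_;
    $len = 0 unless defined $len;
    my @pairs;
    for my $tag (sort keys %$group) {          # ascending tag order
        my @values = @{ $group->{$tag} || [] };
        push @pairs, join(' ', $tag, @values);
    }
    return join(' ; ', @pairs) unless $newline;
    return join("\n" . (' ' x $len), @pairs);
}

my %group = (Sequence => ['"AC000001"'], Note => ['"first"', '"second"']);
print format_group_sketch(\%group), "\n";        # single line, tags sorted
print format_group_sketch(\%group, 1, 4), "\n";  # one pair per line
```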
Uses dump_string() to write formatted output of a GFF::GeneFeature object to a filehandle. Includes information about any (overlap) matching GFF::GeneFeatures (note: match output for each record is multi-line, with the matches designated by an indented '=>' bullet). If \*OUTPUT is not given, \*STDOUT is used. The $tab argument is a boolean flag: a true (non-null) value selects tab as the field delimiter in the output line; otherwise, blank space is used as the delimiter (assumed true if not specified).
The [group] field is held as a hash of tag => value lists (i.e. $gf->group()->{$tag} == \@values).
If the argument list is empty, then the reference to this hash is returned. If only a '$tag' name is given as an argument, the value list for that tag is returned (valueless tags are created in the object, but return undef as the value list; use the TAG() AUTOLOAD feature to test for such tags (see below)). If values are provided in the call, the tag is set to these values (overwriting any previous values; see also group_value_list()).
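A stand-alone sketch of these three call forms, assuming the [group] data is a plain hash of tag => \@values. group_sketch() is a hypothetical function that takes the hash explicitly; the real method is invoked on the object.

```perl
use strict;
use warnings;

# Hypothetical accessor over a plain hash of tag => \@values.
sub group_sketch {
    my ($group, $tag, @values) = @_;
    return $group unless defined $tag;       # no arguments: the hash itself
    if (@values) {
        $group->{$tag} = [@values];          # overwrite any previous values
    }
    elsif (!exists $group->{$tag}) {
        $group->{$tag} = undef;              # create a valueless tag
    }
    return $group->{$tag};                   # value list, or undef if none
}

my %group;
group_sketch(\%group, 'Sequence', 'AC000001');   # tag with one value
group_sketch(\%group, 'Pseudo');                 # valueless tag
print join(',', sort keys %group), "\n";
```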
\@values) resets the $tag hash value to the new array reference (and returns it). If $tag is undefined or null, then the function just returns the value list of the first tag (if any) in the tag-value pairs. If the $append argument is defined and non-null, then the given @values are appended to any existing value list (default: 0).
Version 1 GFF: just returns the [group] string value embedded in a single member list; $tag is ignored.
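The difference between resetting and appending can be sketched for Version 2 style data as follows. value_list_sketch() is a hypothetical function operating on a plain hash; it does not model the "first tag" fallback for an undefined $tag or the Version 1 behaviour.

```perl
use strict;
use warnings;

# Hypothetical sketch: set, append to, or fetch the value list of one tag.
sub value_list_sketch {
    my ($group, $tag, $values, $append) = @_;
    return $group->{$tag} unless defined $values;       # plain fetch
    if ($append && ref($group->{$tag}) eq 'ARRAY') {
        push @{ $group->{$tag} }, @$values;             # extend existing list
    }
    else {
        $group->{$tag} = $values;                       # reset to new array ref
    }
    return $group->{$tag};
}

my %group = (Note => ['first']);
value_list_sketch(\%group, 'Note', ['second'], 1);      # append
print join(',', @{ $group{Note} }), "\n";
value_list_sketch(\%group, 'Note', ['only']);           # reset
print join(',', @{ $group{Note} }), "\n";
```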
If $index is not given, the first value of the value list is returned. If $value is specified, the element at $index is set to it (note: if the value array associated with $tag is undefined when this method is called, then it is created and set to a single-element list containing $value). Version 1 GFF: just returns the [group] string, ignoring any $tag or $index provided. If $value is provided, then the [group] name is reset (same as $gf->group($value)).
Version 1 GFF - $tag argument is ignored; the method undefines the [group] field value.
Version 2 GFF - if no $tag argument is provided, the entire [group] tag-value array is cleared of tags and values. If a $tag argument is provided, only the indicated tag (if it exists) is deleted from the [group] field, along with any associated values.
Adds the $offset amount to the start and end coordinates of a GFF::GeneFeature. Note: the start and end of the original object, not a copy, are changed.
Based upon their specified start and end coordinates, two GFF::GeneFeatures will either overlap perfectly, partially or not at all (are "misses"). The $tolerance value specified controls the match decision for each category of overlap as follows:
A $tolerance value of 0 dictates that an exact match is required, that is, that the corresponding 5' and 3' coordinate ends of both GFF::GeneFeature objects must be equal to one another.
Release 2.106 revision: optionally, the $tolerance argument can now be a reference to an array of 5' and 3' end-specific tolerance pairs [t55, t53, t35, t33], where t55 == 5' of the 5' end of the gene feature, t53 == 3' of the 5' end of the gene feature, etc. (Note: 5' is a function of strandedness, if any, or simply 'start' for '.' strand objects.)
If a $tolerance value less than -1 is specified, then a match is declared if the misses are within $tolerance, that is, if the difference in coordinates of the closest segment ends of the two GFF::GeneFeature objects is less than the $tolerance.
$tolerance conditions, for positive $tolerance values and imperfect GFF::GeneFeature overlaps, are relaxed such that either a 5' or a 3' end mismatch within tolerance results in a positive match.
If $strand is given (i.e. not 0), then the match fails unless the two GFF::GeneFeatures lie on the same strand. If $strand is zero, then the strandedness of features is completely ignored in the match comparison (i.e. the strand can also be '.' == unknown).
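One possible reading of the scalar and array forms of $tolerance is sketched below. ends_match_sketch() is a hypothetical function, not the module's match() method: it assumes forward-strand coordinates (so 5' simply means 'start'), requires both ends to lie within tolerance, and does not model negative tolerances, the relaxed mode, or strand checking.

```perl
use strict;
use warnings;

# Hypothetical check that the ends of two features agree within tolerance.
# $tol is either a scalar (same allowance everywhere) or an array ref
# [t55, t53, t35, t33] as described above.
sub ends_match_sketch {
    my ($s1, $e1, $s2, $e2, $tol) = @_;
    my ($t55, $t53, $t35, $t33) = ref($tol) eq 'ARRAY' ? @$tol : ($tol) x 4;
    my $d5 = $s2 - $s1;    # negative: second start lies 5' of the first
    my $d3 = $e2 - $e1;    # negative: second end lies 5' of the first
    my $ok5 = $d5 >= -$t55 && $d5 <= $t53;
    my $ok3 = $d3 >= -$t35 && $d3 <= $t33;
    return ($ok5 && $ok3) ? 1 : 0;
}

print ends_match_sketch(100, 200, 100, 200, 0), "\n";             # exact match
print ends_match_sketch(100, 200, 103, 198, 5), "\n";             # within 5
print ends_match_sketch(100, 200, 97, 200, [0, 5, 5, 0]), "\n";   # start too far 5'
```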
The 5 scalar fields returned in the match @result array have the following values:
If the $verbose flag is defined and not null, then a detailed match description is returned.
If the $verbose flag is defined and not null, then a detailed match description is returned.
$tolerance positions (default 0).
Optional '$strand' argument forces strand sensitivity in merging.
The optional $group_tag argument is overloaded to two forms:
1. a simple [group] field tag may be given, under which the method records merge info. This includes: [group] 'Sequence' tag value (if available), <source>, <feature>, <start>, <end> values.
2. a reference to a Perl function may be given, which expects to be invoked with the two overlapping gene features in the set. This function would generally modify the first gene feature object in some customized way.
The optional $addscores argument stipulates that the scores of merged objects are to be added.
If the $copy argument is set, then the method returns a copy constructed from the merged object, instead of the original invoking object. (Note: separate copies are made for each merger event, so a 'self_overlap_merge' may be useful after this method is called.)
Returns the merged invoking GeneFeature (or a copy of it) iff an overlap_merge occurred, otherwise 0.
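The interplay of the tolerance, strand, score and copy options can be sketched on plain hashes. overlap_merge_sketch() is a hypothetical function; the named options are an assumption made for readability (the real method's argument order is not reproduced here), and the $group_tag bookkeeping is omitted.

```perl
use strict;
use warnings;

# Hypothetical merge of $gf2 into $gf1 (plain hashes with start, end, strand
# and score keys). Returns the merged feature, or 0 when the two neither
# overlap nor lie within the tolerance of each other.
sub overlap_merge_sketch {
    my ($gf1, $gf2, %opt) = @_;
    my $tolerance = $opt{tolerance} || 0;
    return 0 if $opt{strand} && $gf1->{strand} ne $gf2->{strand};
    return 0 if $gf2->{start} > $gf1->{end} + $tolerance
             || $gf1->{start} > $gf2->{end} + $tolerance;
    my $target = $opt{copy} ? { %$gf1 } : $gf1;   # shallow copy on request
    $target->{start} = $gf2->{start} if $gf2->{start} < $target->{start};
    $target->{end}   = $gf2->{end}   if $gf2->{end}   > $target->{end};
    $target->{score} += $gf2->{score} if $opt{addscores};
    return $target;
}

my %exon1 = (start => 10, end => 20, strand => '+', score => 1);
my %exon2 = (start => 18, end => 30, strand => '+', score => 2);
my $merged = overlap_merge_sketch(\%exon1, \%exon2, addscores => 1);
print "$merged->{start}-$merged->{end} score $merged->{score}\n";
```

Without the copy option the first feature itself is modified and returned, which mirrors the in-place behaviour described above.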
2.106 (19/10/99) - rbsk:
  - match() method can take a reference to an array for the $tolerance value, consisting of pairs of 5' and 3' end specific tolerances.
2.105 (3/10/99) - rbsk:
  - $group_tag in overlap_merge() and associated code inherited from GFF::GeneFeatureSet::self_overlap_merge().
  - group() method bug: fixed Version 1 GFF crash bug
2.104 (30/9/99) - rbsk:
  - $tag argument in dump(), dump_string(), dump_group()
2.103 (27/9/99) - rbsk:
  - $strand argument in overlap_merge()
2.102 (21/9/99) - rbsk:
  - added $copy argument to overlap_merge() method; now also returns $self instead of simple '1' iff overlap occurs
  - created the deleteTag() method
16/9/99 - rbsk: match() method should totally ignore <strand> if $strand not set?
8/9/99 - rbsk: overlap_merge() $addscores argument.
8/9/99 - rbsk: fixed GeneFeatureSet.pm handling of valueless [group] tags:
  - group() method can now be used to set tags without values or tags with values, by $gf->group('tag') or $gf->group('tag','value0','value1',...,'valueN'). Seems redundant to methods group_value() and group_value_list(), which are similar but slightly different in their operation.
  - For Version 2, AUTOLOAD now returns a boolean '1' for any [group] value tag which exists but has no values. Note that for Version 2, AUTOLOAD names not recognized ALL default to tag methods and fail silently by returning NULL (rather than throwing an exception, as in Version 1 GFF).
31/8/99 - rbsk: $append argument added to group_value_list() method; overlap_merge() method: optional '$tolerance' value provides for overlap merge where the two features lie within $tolerance base pairs of each other
27/8/99 - rbsk: sub parse_group() bug fix: couldn't parse some instances of 'end' double quotes...
18/8/99 - rbsk: coded explicit primary field access methods, rather than relying upon AUTOLOAD (i.e. to gain efficiency - George Hartzwell suggestion :-)
12/7/99 - rbsk: custom AUTOLOAD recognizes GFF Version 2 [group] tags as access functions, i.e.
    $gf->Sequence() == $gf->group_value_list('Sequence') and
    $gf->Sequence(\@VALUES) == $gf->group_value_list('Sequence',\@VALUES)
4/5/99 - rbsk: renamed GeneFeature.pm => GFF::GeneFeature.pm
20/4/99 - rbsk: Instead of a list, the getMatches() method now returns a reference to a hash, keyed on GeneFeature references (getMatches_logical is now strictly boolean)
19/4/99 - rbsk: GeneFeaturePair.pm functionality merged with GeneFeature.pm (semantic change: Changed 'pairs' to 'matches' in object data structure in order to capture 'one-to-many' semantics:
    $fields{'Pairs'} => $fields{'matches'}
    addPair => addMatch # adds a matching GeneFeature
    getPair => getMatches # returns array of matches
    getPair_logical => getMatches_logical # returns number of matches (0=> false)
    dump_pairs => dump_matches # dumps all GF's with matches (and the matches too)
    intersect_overlap_pairs() => intersect_overlap_matches: bug found: use 'absolute' coordinate offsets of second relative to first GF
17/3/99 - rbsk: added $newline arg to dump(), dump_string() & dump_group()
16/3/99 - rbsk: GeneFeature objects now subclassed from GFFObject class;
13/3/99 - rbsk: bug fix in group_value() function, dump_group(), et al.
3/3/99 - rbsk: extensively revised and improved the documentation; added Version 2 GFF code, especially group() field management methods