Sequence features are DNA or Peptide sequences that are displayed as bases or residues in text format. They cover the whole loaded sequence and at low zoom only a small number of bases can be displayed accross the column. Sequence features (indirectly) limit the zoom level in the window as we wish to avoid displaying one base in more space than is required for the text font (but note that peptide displays may have gaps between residues).
It is possible to select a sequence feature by region or feature - this implies that the feature is partially highlit, which is a departure from the norm for other types of feature.
Existing code has:
zmapWindowTextFeature,c small, simple text feature eg locus name zmapWindowTextItem,c used by sequence feature zmapWindowSequenceFeature.c column wide sequence data that gets redispayed on zoom.Note that here we are not concerned with zmapWindowTextFeature, although implementation may reveal that it may be advantageous to combine the two. Other than the DNA and 3-frame columns, we also have 'Show Translation in ZMap', which displays a short peptide sequence on demand.
In the ZMapWindowCanvasItem universe, a sequence feature is a ZmapWindowCanvasItem (a Foo CanvasGroup) and a text feature is a derived FooCanvasItem. Each line of text in the column is a FooCanvasItem, and whenever the zoom level changes the foo canvas items have to be refreshed (replaced?). Whether or not the whole sequence in the scroll window is displayed seems likely but not yet clear.
Highlighting is beleived to be done by displaying a polygon behind the relevant text. There may be several of these and colours may vary. Note that the polygon equates to a series of rectangles, one on each line of the display.
There is a fairly elaborate system of callbacks that is used to add highlights and alter zoom - this is assumed to be required by the need to delete and add FooCanvasItems to perform these functions.
A single CanvasSequence feature will be added to a column and this will handle the entire sequence for the region in question (although note that the DNA sequence is actually stored upstrean in the block feature). In a similar way to graph items and heatmaps, by convention only one such feature will appear in the column, although there is the option to include (eg) 3 frames of peptide sequence in a single column with each one offset to display them side by side, as for heatmaps.
Displaying text will be done OTF - if the zoom changes then the SequenceFeature will recalculate the column width and the coordinates of displayable text (as a start coordinate and display parameters), and how many characters to display. The actual displayed text will be assembled into a temporary buffer at drawing time by calculating an offset from the start coordinate related to display coordinates (zoom and scroll). Peptide residues will be calculated OTF. Note that the zoom code called by the window will bias any user zoom request such that each line of the sequence feature has a whole number of bases, and we can assume this to be the case.
A simple approach to text display will be adopted and hopefully this can be done via the GDK interface to pango in a few short function calls.
This must be driven by calling code. Each region of the sequence to be highlit will be specified by a function call or data structure (eg GList) containing coordinates, length? and colour.
This data will be stored by the CanvasSequence feature and if present each expose will result in the background being drawn first.
As only one feature or region may be highlit at a time (corresponding to the hot focus item), even if several are selected, then removing a highlight will be a simple process of deleting the list.
The function drawFullColumnTextFeature() will be subverted to call the canvas featureset code instead of the previous implementation unless the style option:
foo=truehas been set. This will handlle all the basic display function automatically.
Highlights will be via code in the existing zmapWindowSeqDisp.c, which ultimately calls three public functions in zmapWindowSequenceFeature.c:
zMapWindowSequenceFeatureSelectByFeature() zMapWindowSequenceFeatureSelectByRegion() zMapWindowSequenceDeSelect()These functions are outside of the CanvasFeatureset (and ZMapWindowCanvasItem) interface but can easily be added to the existing core function:
void zmapWindowFeaturesetItemSetColour(FooCanvasItem *interval, ZMapFeature feature, ZMapFeatureSubPartSpan sub_feature, ZMapStyleColourType colour_type, int colour_flags, GdkColor *default_fill, GdkColor *default_border)which can be amended for the demands of sequence features. Note that there is no explicit 'unset colour' but this can be implemented by passing NULL as the sub-feature argument or ZMAPSTYLE_COLOURTYPE_INVALID as the colour-type.
The 3-Frame display has different background for each frame and this will be set by the set colour function as above if passed the argument ZMAPSTYLE_COLOURTYPE_NORMAL. Highlights must be given with ZMAPSTYLE_COLOURTYPE_SELECTED.
This will be implemented by setting a flag to only display the highlit regions, and to show filler characters as required.
The major unknown is the effort required to operate pango, however there is experience in the team that can be drawn on. Some quite fiddly calculations with coordinates are required. In very rough terms the basic code should be achievable within a week, and then we need to allow careful testing.
Specifically:
Display of DNA has been implemented to the point of demonstrating that pango can be used to paint single lines of text on demand and also that background highlights work. This includes the first three items above, but note that the code is still open and will crash on feature select.
The original ZMapWindowCanvasItem code subverts the structure of the canvas design by providing interfaces for highlight outside the usual places, and there are also some special functions added to capture the mouse. Both of these issues can be resolved a) by using the SetColour function as designed and b) by adding code to the existing mouse handling functions.
At high zoom we get a consistent sequence of bases that agrees with the old software, but note that there is some rounding diffference between old and new on 2nd and subsequent rows at low zoom.