Columns Config Window

ZMap Load Columns dialog

We need to update this significantly to handle BAM data as requested by users and also we need to improve the existing interface. Specifically:

Load and display columns: reformatting the dialog

It would seem natural to be able to request and show/hide columns from the same GTK pane although this will give a more cluttered display.

Here's a sample layout for the normal columns. (NOTE that this is just an example; we expect to have RNAseq tabs as well and many more columns; there will also be several buttons at the bottom to operate on all columns and to apply the requests.) The combo boxes show the status of each column and can be used to enter requests.

[Figure: a picture of the dialog box]

Buttons are provided to operate on all columns in the group. Note that 'request failed' will have to know about the previous request, ie whether it was from the mark or not. Options can be selected for several groups and actioned with the Apply button. Revert will set the current group back to its current status (cancelling any requests in the group) and Revert All will reset all groups.

Special Columns

3FT is not a loadable column and will not appear but instead will be controlled by the 3-Frame menu button.

DNA will be handled as a normal column; it is possible to request this (and this is normally done on startup).

Show Translation in ZMap is controlled via the column's right click menu.

It is probably possible to configure these columns into a suitable column group and show them with the loaded status widget disabled, probably triggering this by not finding a server config for the featureset in the column.

Some notes about 'columns'

Historically ZMap has provided a 'Columns' dialog which has been used to request data and also control the display. Requesting a column relates to ACEDB, which assigns one or more featuresets to a column - the ZMap user cannot request a single featureset in a column that contains more than one. Otterlace by contrast provides a 'load columns' dialog that deals with 'filters' which can contain one or more featuresets and these do not correspond directly to display columns. Some examples illustrate the combinations in use:

Featuresets from one filter cannot be assigned to different columns.

An Otterlace filter can be described as corresponding to an analysis of some kind.

To replace the existing ZMap and Otterlace load column dialogs with one new one we need to provide requests using Otterlace filters and control display by column. The Otterlace configuration lists filters which somehow result in the required featuresets being requested. Each filter defaults to providing a single featureset of the same name, which should be displayed in a column of the same name, but this can be set differently if necessary. There may be a requirement to hide/show individual featuresets within a column, eg solexa introns can be split into 1dpf, 2dpf etc.

As a solution we will adopt the following policy:

Running grep over otter_config reveals that the only columns implicated are predicted transcripts, Predicted Regulatory Features, repeats and the solexa 1dpf etc columns. However ENCODE data comes in repetitions which are currently requested together and we may wish to control this more closely.

Request these two pages in firefox to see the current otterlace config.

https://enigma.sanger.ac.uk/sso/login
http://dev.sanger.ac.uk/cgi-bin/otter/60/get_otter_config?client=otterlace

Implementation

The dialog should respond to ZMap state

Displayed and loaded column status should change to reflect ZMap's state:

For load from mark it may be acceptable to have the option always enabled and only action the request if the mark is set. It will also be easier to always provide options to request data and only action them if data is not yet loaded.

ZMap should remember previous selections on restarting a session

Currently Otterlace maintains session state and can continue to start ZMap with the appropriate columns to load. ZMap should default to loading the same columns on a session restart and this can be saved in a separate SQLite database (to ensure that static databases remain static).

Default selection and column groups

Column groups (eg EST) contain a number of columns, some of which are considered 'core' depending on the sequence viewed. It is necessary to provide a 'Default' button to select all the 'core' columns.

Operation WRT multiple windows and views

Existing code supports a separate Columns dialog for each window (NOTE: a window is a single pane in a ZMap X window. Normally a view corresponds to the whole X window but it is possible to have multiple views in a single X window). The dialog is created by clicking on the columns button in the toolbar, and is created for the current focussed window. This is necessary to allow column display to be turned on/off for the selected pane. Load columns operates on the feature context and when data completes loading it is displayed in all the windows in the relevant view (each view has its own feature context).

To avoid creating extra work this will not be changed for the first implementation, as we expect users to only use one dialog at a time, and extra ones can be created/destroyed quite easily.

Some old notes

I think this was partly abandoned due to the intricacies of GTK. SeqBitmap was removed but display options are as before.

Background

Deferred loading of columns

Historically ACEDB could define some columns for deferred loading (via a style) and these would not be requested in the initial data load. The columns dialog could be used to request these, and also to specify an address range via the mark. This implies the possibility of patchwork coverage of any column but it is not clear how much information would have been stored regarding the ranges requested.

More recently Otterlace has provided a post startup 'load columns' dialog and this is used to request data over the whole address range for selected columns.

Deferred styles have been removed and it is intended to allow ZMap to request data from (pipe) servers as configured via the columns dialog - this will allow similar functionality as within the Otterlace context while running in standalone mode. It has been suggested that the load and display panes can be combined although this would require a re-arrangement as the display pane works by strand. It is also worth noting that GTK positions widgets automatically and it is not always easy to modify existing layouts with sparse buttons.

Note also that display is traditionally done by column, yet data is requested nominally by featureset.

To provide RNAseq data it is also necessary to implement a more complicated request interface as defined here.

User Interface

The display options for each column are show, hide and auto - note that auto is not a default setting but instead means 'show according to zoom level' - and three radio buttons are needed for these options for each strand.

For loading, each column will have a tick box to load data which will be selected and disabled if data for the full sequence is present. If partial data has been loaded (eg from the mark previously) then this tick box will be presented as de-selected unless selected by the user.
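The tick box rule above can be sketched as a small pure function. This is only an illustration: the type and function names (ColumnLoadState, ColumnStatus, TickBoxState, tick_box_state) are hypothetical and do not exist in ZMap, and the GTK calls that would apply the result to a real widget are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical types for illustration only; none of these exist in ZMap. */
typedef enum { LOAD_NONE, LOAD_PARTIAL, LOAD_FULL } ColumnLoadState;

typedef struct {
    ColumnLoadState state;  /* what is already in the feature context */
    bool user_selected;     /* true if the user has ticked the box in this session */
} ColumnStatus;

typedef struct {
    bool selected;   /* tick is shown */
    bool sensitive;  /* false means greyed out: the user cannot change it */
} TickBoxState;

/* Decide how the load tick box for one column should be presented. */
TickBoxState tick_box_state(const ColumnStatus *col)
{
    TickBoxState box;

    assert(col != NULL);

    if (col->state == LOAD_FULL) {
        /* Data for the full sequence is present: ticked and disabled. */
        box.selected = true;
        box.sensitive = false;
    } else {
        /* Nothing or only part (eg a marked region) loaded: reflect the user's choice. */
        box.selected = col->user_selected;
        box.sensitive = true;
    }
    return box;
}
```

Keeping this decision separate from widget code would let the same rule be reused whether the load controls end up on their own pane or combined with the display controls.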

Requesting from the mark will always be possible if full sequence data is not present, and ZMap will maintain a list of regions loaded and optimise the user's requests. The regions that have been loaded will not be presented in the dialog - users can look at the ZMap display to find out, and if necessary columns can have background colours to show 'not loaded' status. There will be an option to load all selected columns fully or to load all selected columns from the mark; combinations of these will not be possible in one operation. This will provide a simpler interface, the assumption being that users will load columns en masse on startup and from the mark when they have some specific task in mind.
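One way to 'optimise the user's requests' is to ask servers only for the parts of the requested range that are not already loaded. The sketch below assumes the loaded regions are kept sorted and non-overlapping with inclusive coordinates; SeqRange and missing_ranges are hypothetical names, not existing ZMap code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type: an inclusive sequence range. */
typedef struct { int start; int end; } SeqRange;

/*
 * Find the parts of 'request' not covered by 'loaded'.
 * 'loaded' must be sorted by start and non-overlapping.
 * Writes at most max_out gaps to 'out' and returns the number written;
 * any further gaps are dropped, so size 'out' as n_loaded + 1 to get them all.
 */
size_t missing_ranges(const SeqRange *loaded, size_t n_loaded,
                      SeqRange request, SeqRange *out, size_t max_out)
{
    size_t n_out = 0;
    int cursor = request.start;  /* first coordinate not yet known to be covered */

    assert(request.start <= request.end);

    for (size_t i = 0; i < n_loaded && cursor <= request.end; i++) {
        if (loaded[i].end < cursor)
            continue;                      /* region lies wholly before the cursor */
        if (loaded[i].start > request.end)
            break;                         /* region lies wholly after the request */
        if (loaded[i].start > cursor && n_out < max_out) {
            out[n_out].start = cursor;     /* gap before this loaded region */
            out[n_out].end = loaded[i].start - 1;
            n_out++;
        }
        cursor = loaded[i].end + 1;
    }

    if (cursor <= request.end && n_out < max_out) {
        out[n_out].start = cursor;         /* gap after the last loaded region */
        out[n_out].end = request.end;
        n_out++;
    }
    return n_out;
}
```

A 'load from mark' request would pass the mark as the request range, and a full load would pass the whole sequence; an empty result means nothing needs to be requested.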

Buttons will be provided to:

Implementation

Display and load buttons will be generated by separate functions, to allow these to be combined onto one pane or split into two easily. Initially the load functions will be implemented on a separate pane, to provide extra function with minimum effort.

On requesting a column to be loaded ZMap will initiate requests for all the featuresets that map to that column.

It would of course be possible to supply controls for each featureset but this will not be implemented initially as a) it would require much more coding of user interface with very little benefit to the user and b) it would make the user interface very complicated - there are hundreds of featuresets defined and it is unlikely that users will want this level of control. RNAseq data is perhaps different and a custom control interface for that has been suggested.

A sample layout

Here's the existing layout:
Here's a sample layout in text, to be replaced with a real picture when implemented.

Trembl             Load [x]
Swissprot          Load [x]
Das_Phast_117      Load [x]

      [ Select All ] [Load Selected ]
      [ Select None] [Load from Mark]

Loading data

How to tell if a column is loaded fully or in the mark

There appears to be a data structure allocated per block and per column which has some kind of bitmap (zmapSeqBitmap.c) that relates to whether or not features are loaded in various regions - this was used for deferred styles in the past and interacts with the mark. The good news is that this is only used by zmapWindowContainerBlock.c and that the wrapper functions that use it are only used in zmapWindowDrawFeatures.c, zmapWindowColConfig.c and zmapWindow.c, in code handling deferred columns. This would be quite simple to replace or change.

Deferred styles code has been #iffed out and it may be appropriate to remove or modify this bitmap code as well but it is necessary to check where it is used.

The deferred style code assumes that columns are always present even if there is no data and this is no longer valid as columns are only created if features are to be displayed in them.

Required function

Load Featuresets or columns?

Data is requested by featureset - necessarily pipe servers are not connected with display columns, and the featureset to column mapping is provided by ZMap configuration or historically by ACEDB. There is currently no mechanism to show or hide a featureset distinct from a column; display options operate on columns, and forward and reverse strands operate independently. This shows that the users will expect to deal with columns and not featuresets. RNAseq data may be an exception but as mentioned above a separate interface will be provided for that. Therefore we will present the user with the option to load columns and not featuresets.

This presents a problem in that if the featureset to column mapping and the server configuration results in a column that has some but not all of its featuresets loaded then we have to allow the remainder to be loaded, and possibly allow selection of which ones. This is potentially tricky as some columns may have large numbers of obscure featuresets attached. Servers can also fail and this issue cannot be configured away.

Columns will be flagged as loadable as long as there is data missing and each time we request all the data that is not yet present, limiting to the mark if relevant.
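As a rough sketch of this rule, a column can be treated as loadable while any of its featuresets lacks data for the range of interest, and a request then covers only those featuresets. ColumnFeatureset and both function names are hypothetical; the 'covered' flag is assumed to have been worked out beforehand for either the whole sequence or the mark.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical structure: one entry per featureset mapped to a column. */
typedef struct {
    const char *name;
    bool covered;  /* true if data for the wanted range is already in the feature context */
} ColumnFeatureset;

/* A column stays loadable while any of its featuresets is missing data. */
bool column_is_loadable(const ColumnFeatureset *sets, size_t n_sets)
{
    assert(sets != NULL || n_sets == 0);

    for (size_t i = 0; i < n_sets; i++) {
        if (!sets[i].covered)
            return true;
    }
    return false;
}

/*
 * Collect the names of the featuresets that still need requesting.
 * Writes at most max_out names to 'out' and returns the number written.
 */
size_t column_pending_featuresets(const ColumnFeatureset *sets, size_t n_sets,
                                  const char **out, size_t max_out)
{
    size_t n_out = 0;

    assert(sets != NULL || n_sets == 0);

    for (size_t i = 0; i < n_sets && n_out < max_out; i++) {
        if (!sets[i].covered)
            out[n_out++] = sets[i].name;
    }
    return n_out;
}
```

With this approach a column whose server failed for one featureset simply remains loadable, and the next request retries only the missing featuresets.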

Server or featureset status

Servers are configured as delayed or not and delayed ones (in fact all servers) need a loaded flag. It may be easiest to re-implement the block/ column bitmap indicating loaded status in the server. Alternatively this information can be stored in the featuresets data in the features context. Note that although most pipe servers return a single featureset some (eg Repeats) return several and ACEDB certainly does. However, these featuresets will always be requested in tandem as we no longer have the deferred styles as provided by ACE.

Given that server requests can fail this implies that a column may be considered loaded partially, and also over some sub-sequence region if data is requested from the mark.

The columns config dialog must be able to query this.

Current code updates the zmapSeqBitmap when features are displayed, which is not optimal - the user could conceivably set the column to 'Hide' and then request that data be loaded (possibly this may be prevented by the dialog code but if so it's not future-proof). (This may be a follow-on from loaded status formerly being held in the styles).

Note also that loaded status being queried from the canvas is wrong: data is loaded if it is present in the feature context.

Handling empty featuresets and server failures

In order to maintain 'loaded' status a successful data request must return a featureset even if no features are present, which implies that the GFF2parser module must pre-create featuresets, which will contain no features but have a loaded sequence region attached.

A server error will result in no data and therefore not set loaded status. If data is returned then the sequence loaded region should be adjusted to match. NB: code has not been checked for this.
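The distinction between 'no features' and 'request failed' might be recorded as follows. This is a simplified sketch holding a single loaded region per featureset; FeaturesetRecord, ServerReply and featureset_record_reply are hypothetical names and not part of the GFF2parser or feature context code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reply status and per-featureset record. */
typedef enum { REPLY_OK, REPLY_FAILED } ServerReply;

typedef struct {
    bool has_loaded_region;  /* true once any request has succeeded */
    int loaded_start;        /* inclusive range the server actually covered */
    int loaded_end;
    int n_features;          /* may legitimately be zero */
} FeaturesetRecord;

/* Update the record for one featureset after a server reply. */
void featureset_record_reply(FeaturesetRecord *fs, ServerReply reply,
                             int start, int end, int n_features)
{
    assert(fs != NULL);

    if (reply != REPLY_OK)
        return;  /* failure: loaded status untouched, so the column stays loadable */

    /* Success, even with no features: remember the region that was covered. */
    fs->has_loaded_region = true;
    fs->loaded_start = start;
    fs->loaded_end = end;
    fs->n_features += n_features;
}
```

The point of the sketch is that loaded status comes from the reply and the region it covered, never from the number of features returned.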

Handling multiple blocks

Current code assumes that only one align and block exists and while it would be good to handle the possibility this would be futile as the rest of the code does not. Therefore: the feature context will be used to hold the loaded status, and in common with existing code when searching for a featureset the first align and block found containing it will be used. To implement multiple blocks these functions must be modified to take a container block (the current one in the window) and find the requested featureset.

If the columns dialog is displayed via a RC menu then this is simple; if not then the current focus column should be used (if available). If not then the user should be prompted to select one.

Handling multiple blocks may prove more complex and options for 'load columns in all blocks/aligns' may be necessary.

Handling column load requests

As a single server can provide multiple featuresets which can be configured into different columns (unlikely in practice, but software has to be rugged) it will be better to store loaded status per featureset. When the user requests that a column be loaded then if a contained featureset is not present or not covering the mark then the relevant server will be found and a request queued. Each featureset in the returned context will be processed by the mergeContext function and a list of loaded regions maintained. For simplicity, a fully loaded column will have a single sequence region in this list covering the whole sequence.
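A per-featureset list of loaded regions could be maintained along these lines. The sketch assumes inclusive coordinates and a small fixed-size list; Region, RegionList and the two functions are hypothetical names, and the real mergeContext function is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_REGIONS 16  /* arbitrary limit chosen for this sketch */

typedef struct { int start; int end; } Region;  /* inclusive coordinates */

typedef struct {
    Region r[MAX_REGIONS];  /* sorted by start, never overlapping or adjacent */
    size_t n;
} RegionList;

/*
 * Record a newly loaded region, merging it with any region it overlaps or abuts.
 * Returns false (conservatively) if the list is already full.
 */
bool region_list_add(RegionList *list, Region add)
{
    Region merged[MAX_REGIONS];
    size_t n = 0;
    bool placed = false;

    assert(list != NULL && add.start <= add.end);

    if (list->n >= MAX_REGIONS)
        return false;

    for (size_t i = 0; i < list->n; i++) {
        Region cur = list->r[i];

        if (cur.end + 1 < add.start) {
            merged[n++] = cur;              /* wholly before the new region */
        } else if (add.end + 1 < cur.start) {
            if (!placed) {
                merged[n++] = add;          /* new region goes before this one */
                placed = true;
            }
            merged[n++] = cur;
        } else {
            /* Overlapping or adjacent: grow the new region to absorb it. */
            if (cur.start < add.start) add.start = cur.start;
            if (cur.end > add.end) add.end = cur.end;
        }
    }
    if (!placed)
        merged[n++] = add;

    for (size_t i = 0; i < n; i++)
        list->r[i] = merged[i];
    list->n = n;
    return true;
}

/* True if [start, end] lies inside one loaded region (eg the mark or the whole sequence). */
bool region_list_covers(const RegionList *list, int start, int end)
{
    assert(list != NULL);

    for (size_t i = 0; i < list->n; i++) {
        if (list->r[i].start <= start && end <= list->r[i].end)
            return true;
    }
    return false;
}
```

Because adjacent regions are merged as they arrive, a column that has been loaded piecewise over the whole sequence ends up with the same single-region list as one loaded in a single request.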

Possible future modifications

The above defines a user interface based on previous versions of ZMap and we need to consider how potential future requirements may be affected by the data structures and implementation chosen.

Requesting individual featuresets rather than columns

Loaded status and column loading will operate as is. The change needed would be a bigger user interface. However if needed this would be relatively easy to add: RC on column gives a choice of 'configure this column' or 'configure all columns'. If the user chooses the first option then we can list featuresets for the single column and allow these to be selected.

Requesting data from multiple blocks

Some work was done on this in the past with the intention of being able to explore big genes and re-arranged regions. Current code works on the basis of requesting within the current block only and this seems to be reasonable. Featureset loaded status should be TRUE if the current block is covered, which implies a slightly more specific implementation than would be ok for a single block, single alignment use. NOTE: Current code actually just gets the first block, not the current one, as currently there can only be one. Changing this will affect at least zmapWindowColConfig.c/configure_get_point_block_container().

Bolting on an RNAseq front end

We hope that implementing a request module will result in code that can be used by new dialogs without modification. RNAseq data is not yet fully defined in ZMap terms, but we assume that we must be able to request individual featuresets for specific regions, which is covered by the above. There may be a requirement to hide or show an individual featureset within a column, which is not relevant here.

Tasks Summarised

To implement the above:

Loose ends