zmapView - a mixture of V and C from MVC

Summary

zmapView is quite a large module! When adding a topic to this file please include a link here.

Requesting data from sources
Step lists
Retrieving data from a server
Rationalising load-features code
Asking Blixem to Request BAM data
Views and Styles
General notes (need some def on how windows views etc hang together, now and ideally)
Requesting data from database sources

Historically, when ZMap creates a view it goes through a process of 'data loading' during which it requests all the data implied in its source stanzas. The source has typically been ACEDB, and this connection is kept open for the life of the view to allow further requests by the user. Until this initial phase has completed the user may not interact with the view. With the advent of pipeServers this model can no longer cope with incremental/optimised loading of data: we need to be able to specify sources as not active on startup, and to be able to start previously unknown sources on request from Otterlace.

Configuring data sources

A source stanza in ZMap's configuration file (ZMap) specifies the URL of the data source and some other options, including the featuresets supported. A new option delayed=true/false sets whether or not a data source is activated automatically on startup. A pipeServer source will default to delayed=true and other types to delayed=false.
NOTE: this proves to be quite fiddly. In the short term (to allow testing of the real code) we will require delayed=true to be set where needed. It may be easier to specify active and delayed sources in [ZMap].

Any request for data after the initial 'data loading' phase will be activated immediately regardless of the setting of 'delayed'.
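
As a minimal sketch of the 'delayed' rule described above, the following reads source stanzas and picks out the ones to connect to at startup. The stanza names, the key names 'url' and 'featuresets', the ';' list separator and the 'pipe' URL scheme are assumptions made for illustration; they are not taken from ZMap's actual configuration syntax.

```python
import configparser
from urllib.parse import urlsplit

# Hypothetical config text: stanza names, keys and URLs are illustrative only.
SAMPLE = """
[acedb_source]
url = acedb://localhost:23100
featuresets = Genes ; Repeats

[est_pipe]
url = pipe:///usr/local/bin/get_est.sh
featuresets = EST_Human ; Saturated_EST_Human

[eager_pipe]
url = pipe:///usr/local/bin/get_genes.sh
featuresets = Transcripts
delayed = false
"""


def read_sources(text):
    """Return {stanza: {'url', 'featuresets', 'delayed'}} for each stanza."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    sources = {}
    for name in parser.sections():
        stanza = parser[name]
        url = stanza["url"]
        # Assumed default: pipe sources are delayed, all other types are not.
        default_delayed = urlsplit(url).scheme == "pipe"
        sources[name] = {
            "url": url,
            "featuresets": [f.strip() for f in stanza["featuresets"].split(";")],
            "delayed": stanza.getboolean("delayed", fallback=default_delayed),
        }
    return sources


def startup_sources(sources):
    """Names of the sources to connect to during initial 'data loading'."""
    return [name for name, src in sources.items() if not src["delayed"]]


sources = read_sources(SAMPLE)
print(startup_sources(sources))
```

An explicit delayed= setting always wins over the per-type default, which matches the short-term plan of requiring delayed=true to be set where needed.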

Feature columns may be populated by data from several featuresets and this mapping was originally specified by ACEDB. A new stanza [featuresets] is provided to allow this to be defined statically in the main ZMap configuration file (NB: fix this link, it should point to the config user guide)

Requesting data at startup

An initial request for data must be configured as ZMap will not run without data, and this will prevent the situation where we have a blank window and no user control. It also provides an opportunity to add in predefined data - this can probably be tidied up with little mishap, such that predefined styles etc are added automatically and no initial data requests are required. However, it may be necessary to include all the navigator featuresets in the initial load.

On startup the config file will be scanned and the list of data servers extracted. Those that are not configured as 'delayed' will be connected to and all the featuresets they support requested. As at present, if they support many featuresets then all will be requested together; if it is desired that these servers supply each featureset concurrently then separate source stanzas must be defined for each one. (This should not be important as we expect a minimal amount of data to be requested in this way, and the primary mechanism will be via X-Remote commands.)

Requesting data after startup

This can only occur after the initial 'data loading' phase. If triggered by a 'load column' request from the user this must be so, as they do not have keyboard control until then. However, if commands are received over X-Remote during the initial load phase then this phase may simply take longer - the extra featuresets will be subsumed into the initial set of requests.

Requests can be triggered via a variety of interfaces, for example the Columns dialog, a Right Click menu or X-Remote commands. The View has a list of connections to data sources (corresponding to the source stanzas) that are kept active. Currently the functions that process requests use the first connection in the list, which has traditionally been ACEDB, and the data request is typically the name of a featureset.

The existing X-Remote protocol is sufficient for our purposes and will be used unchanged. This means that data will be requested by featureset name; a sample data request looks like this:

<zmap>
  <request action="load_features">
    <align>
      <block>
        <featureset name="EST_Human">
        </featureset>
        <featureset name="Saturated_EST_Human">
        </featureset>
      </block>
    </align>
  </request>
</zmap>
NOTE Please refer to web/user_doc/xremote.orig/xremote_overview.shtml for all XML protocol stuff. Ed's working on this and then it needs to be moved to doc/user_doc
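
To illustrate how the featureset names can be pulled out of a request like the sample above, here is a small sketch using Python's standard XML parser. The function name requested_featuresets is hypothetical, and the sketch handles only the 'load_features' action shown in this document; it is not a full implementation of the X-Remote protocol.

```python
import xml.etree.ElementTree as ET

# The sample request from this document.
REQUEST = """<zmap>
  <request action="load_features">
    <align>
      <block>
        <featureset name="EST_Human">
        </featureset>
        <featureset name="Saturated_EST_Human">
        </featureset>
      </block>
    </align>
  </request>
</zmap>"""


def requested_featuresets(xml_text):
    """Return featureset names from every load_features request, in order."""
    root = ET.fromstring(xml_text)
    names = []
    for request in root.iter("request"):
        if request.get("action") != "load_features":
            continue  # other actions are outside the scope of this sketch
        names.extend(fs.get("name") for fs in request.iter("featureset"))
    return names


print(requested_featuresets(REQUEST))  # ['EST_Human', 'Saturated_EST_Human']
```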

On receipt of a request ZMap will re-read its config file and scan the list of source stanzas for the requested featuresets - this will define the data sources to request the data from (each source stanza defines the supported featuresets). A new connection will be started for each request to allow multiple requests to be processed by servers without imposing delays implied by queuing. Existing connections (ie non-delayed) may be used if not currently active.
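
The lookup described above might be sketched as follows: each requested featureset is matched against the featuresets listed by the source stanzas, giving one (source, featureset) pair per request. The function name plan_requests, the choice of the first matching stanza and the reporting of unmatched names are all assumptions for illustration.

```python
def plan_requests(sources, wanted):
    """Pair each wanted featureset with the first source stanza listing it.

    sources maps a stanza name to the list of featuresets it supports.
    Returns (plan, unknown): plan is a list of (stanza, featureset) pairs,
    one per request, and unknown lists featuresets that no stanza supports.
    """
    plan, unknown = [], []
    for featureset in wanted:
        for stanza, supported in sources.items():
            if featureset in supported:
                plan.append((stanza, featureset))
                break
        else:
            # No stanza claimed this featureset.
            unknown.append(featureset)
    return plan, unknown


# Hypothetical stanza names and featuresets.
example_sources = {
    "est_pipe": ["EST_Human", "Saturated_EST_Human"],
    "acedb_source": ["Genes"],
}
plan, unknown = plan_requests(
    example_sources, ["EST_Human", "Saturated_EST_Human", "Genes", "No_Such_Set"]
)
print(plan)
print(unknown)
```

Producing one pair per featureset, rather than grouping by server, keeps each request free to run on its own connection.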

Some loose ends

Styles are defined in a (large) file and it may be beneficial for Otterlace to be able to specify whether or not to re-read this file, or for individual sources to have their own styles files, or to be able to request a styles refresh without re-requesting data.

The pipeServer interface has a requirement to supply styles despite these not being handled by GFF (due to legacy issues). It would be good to remove this requirement at some point - styles can be defined at startup and there is little sense in merging one set of styles with an identical one. However, these can be changed at runtime so it is not so clear cut. The GFF parser requires styles, so fixing this would require a deeper modification than might first appear necessary.

Migrating the code to pipeServers

This will involve the following steps, (any legacy code left over by mistake can be removed)

Note that a separate request will be allocated for each featureset, even if several are included in the same external request and are supported by the same data server. This is to allow maximum concurrency in the hope that data will be loaded faster. For example EST_Human and Saturated_EST_Human typically get requested together and form a logical category together. This will also apply to ACEDB requests if more than one request is active.

Step Lists

The request process

A View has a set of data structures that implement a StepList (see zmapView_P.h), which allows a sequence of actions to be created and operated in turn, and also allows actions to be specified in case of error. Currently (10 Mar 2010) the View has one of these lists and this is used to control the list of data servers together. To handle concurrent requests from different servers this will be changed to a list of Step Lists, each operating independently.

The request process is modelled on the ACEDB interface, consisting of a number of steps defined as ZMapServerReqType in include/ZMap/zmapServerProtocol.h. Old code used the first connection in the view's list and assumed that it was already active. For requests to pipeServers this is not correct - the connection must be started fresh each time and closed when finished.

Code that handles this is found in zmapView.c/zmapViewLoadFeatures(), zmapViewUtils.c/loadFeatures(), and zmapView.c/commandCB() (to request a DNA sequence). The last of these uses View->sequence_server to find the right connection.

The function zmapView.c/zmapViewConnect() is used for the initial 'data loading' phase on startup and uses similar code.

View step_list and connection_list: a strategy

The view has a list of active connections; originally none of these ever died. There is also a step list that refers to lists of connections. The connection_list will be modified so that each connection has its own step list, and the step list will no longer have a list of connections. The step list structures will not refer to their connection - the code is only called from a few places and the view/connection is available at all of them.

The step list poll function (zmapView.c/checkStateConnections()) will poll the entire connection list and inspect the step list for each. As the step list has a 'current' pointer this will be efficient, and servers that complete and are terminated can be removed from the connection list.

View State and Busy Cursor

On the initial load the View will be set as 'data loading' and a busy cursor displayed until all loading has completed.

Loading data after startup sets the View to 'columns loading' and a busy cursor is displayed until all requested columns have completed. It may be possible to delay the busy cursor until the first requested column is ready for display, but initially we will not implement this as it may allow race conditions to occur due to user activity. The view state 'columns loading' will revert to 'data loaded' on completion.
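
The state changes described in this section can be summarised as a small state machine. The sketch below is a simplification under stated assumptions: the class and method names are hypothetical, a simple counter of outstanding requests stands in for the real bookkeeping, and the initial load is assumed to end in 'data loaded'.

```python
from enum import Enum


class ViewState(Enum):
    DATA_LOADING = "data loading"        # initial load at startup
    DATA_LOADED = "data loaded"          # idle, user may interact
    COLUMNS_LOADING = "columns loading"  # load requested after startup


class View:
    def __init__(self):
        self.state = ViewState.DATA_LOADING
        self.pending = 0  # number of outstanding data requests

    @property
    def busy_cursor(self):
        # The busy cursor is shown whenever any loading is in progress.
        return self.state is not ViewState.DATA_LOADED

    def request_started(self):
        self.pending += 1
        # Requests arriving during the initial load are subsumed into it.
        if self.state is ViewState.DATA_LOADED:
            self.state = ViewState.COLUMNS_LOADING

    def request_finished(self):
        self.pending -= 1
        if self.pending == 0:
            self.state = ViewState.DATA_LOADED


view = View()
view.request_started()      # startup request
view.request_finished()
print(view.state.value, view.busy_cursor)
view.request_started()      # later 'load column' request
print(view.state.value, view.busy_cursor)
```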

Data Compression

Initially plain GFF format (version 2, replaced by version 3) will be used, but to reduce network bandwidth this may be GZipped - GFF data should compress very well. It may be advantageous to GZip the data in smaller chunks, which will require less memory (important if we load 100 featuresets at once).
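
As one possible reading of the chunking idea, the sketch below feeds a gzip stream to the decompressor a few kilobytes at a time, so that only one small piece of output needs to be held at any moment. The GFF line is made up, and for brevity the compressed bytes are held in a single buffer here; in practice the chunks would arrive from the pipe or socket.

```python
import gzip
import zlib


def gunzip_in_chunks(compressed, chunk_size=4096):
    """Yield decompressed pieces while feeding the gzip data in small chunks."""
    # wbits=31 tells zlib to expect a gzip header and trailer.
    decoder = zlib.decompressobj(wbits=31)
    for start in range(0, len(compressed), chunk_size):
        piece = decoder.decompress(compressed[start:start + chunk_size])
        if piece:
            yield piece
    tail = decoder.flush()
    if tail:
        yield tail


# A made-up GFF version 2 style line, repeated to imitate a featureset.
line = b'chr1\tEST_Human\tsimilarity\t1000\t1200\t95.0\t+\t.\tTarget "EST:AB000001" 1 200\n'
gff = line * 2000
packed = gzip.compress(gff)

restored = b"".join(gunzip_in_chunks(packed))
print(restored == gff, len(gff), len(packed))
```

Repetitive tab-separated text of this kind is the favourable case for compression; real featuresets will compress less dramatically than this artificial example.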

Progress reports

Currently Otterlace shows a progress bar for data loading and, to retain this while using pipeServers, X-Remote messages will be added to drive it. Initially this will consist of reporting 'completed', but if GZip chunks are implemented this can be broken down somewhat. The X-Remote message format will be like:

hello: ref to x-remote doc mentioned above

Retrieving data from a server

Transferring data

The step list code described above runs a sequence of operations which differ depending on circumstances but nominally involve opening a connection and requesting data of various types. It's not totally obvious how data used for requests arrives at the server and how data supplied by the server ends up back in the view code.

The step list structures maintain data sent across several steps as required by various parts of the code. This is done by the functions dispatchContextRequests() and processDataRequests(), which are defined in zmapView.c and passed to the step list code to be called before and after the requests are actioned. There are some other structures flapping around: a ZMapFeatureContext (which will ultimately contain all(?) the data supplied by the series of requests) and a ConnectionData which contains information needed by the requests and is private to zmapView. At various points in the sequence of requests data is passed from one structure to another. Data supplied by one step is needed by subsequent ones and must be passed back to where it came from so that it can be accessed.

Styles

Originally (and currently) these were supplied from ACEDB and requested by ZMap. File and Pipe servers cannot supply styles as the GFF file format does not allow this. Because the code that processes a GFF file will remove features that do not have an associated style, the file and pipe server modules must be supplied with a styles file. They then decode this and construct styles data structures which are passed to the calling module (zmapView). These are held in the ConnectionData structure and passed back to the step list request structures by the dispatchContextRequests() function. They also get merged into the view, which is what is needed for them to actually be used.
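
The reason a file or pipe server needs a styles file can be seen in a small sketch of the filtering behaviour described above. It assumes, for illustration only, that a feature's style is looked up by the GFF source column (column 2); the function name, the style names and the feature dictionary layout are hypothetical and do not represent ZMap's GFF parser.

```python
def parse_gff_features(lines, styles):
    """Parse tab-separated GFF lines, dropping features with no known style.

    styles maps a name (assumed here to match the GFF source column) to a
    style object. Returns (kept_features, dropped_count).
    """
    kept, dropped = [], 0
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.rstrip("\n").split("\t")
        source = fields[1]
        style = styles.get(source)
        if style is None:
            dropped += 1  # no style: the feature is silently removed
            continue
        kept.append({
            "source": source,
            "start": int(fields[3]),
            "end": int(fields[4]),
            "style": style,
        })
    return kept, dropped


# Made-up data: only EST_Human has a style defined.
example_styles = {"EST_Human": {"colour": "purple"}}
example_lines = [
    "##gff-version 2\n",
    "chr1\tEST_Human\tsimilarity\t1000\t1200\t95.0\t+\t.\n",
    "chr1\tUnstyled_Set\tsimilarity\t3000\t3100\t80.0\t-\t.\n",
]
features, dropped = parse_gff_features(example_lines, example_styles)
print(len(features), dropped)
```

With an empty styles table every feature would be dropped, which is why the styles must reach the server module before the GFF is parsed.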

Featuresets and DNA sequences

In a similar process to styles data, the feature context supplied by the server is patched into the subsequent step list request structures. Subsequent requests update this context and as requests complete it acquires more data.

In the ConnectionData structure there is a note of which step is the last one supplying data and in processDataRequests() this is used to trigger merging the data from the source into the view's data. This is mainly necessary due to the handling of sequence data (DNA featureset) - this data is optionally requested from ACEDB via a separate request from other features and the other servers follow the same model.

Memory issues

Once in the feature context, however, it appears that the sequence data can be copied/merged as normal. The code that does this is in zmapView.c/justMergeData(), which calls zmapFeature.c/mergePreCB(); this raises some memory and scope issues regarding the sequence data. As a sequence may be quite large it is reasonable to expect it to be freed at some point - the server will allocate it and pass it on to the view, which will store it. If we request a sequence twice then obviously we only want to keep one copy of the data. Ultimately the sequence is attached to its block but is also referred to in the DNA feature item, and if we allocate or free another copy then this needs to be propagated to multiple places.

Blixem and sequence servers

It is also possible to request short segments of DNA on the fly - this goes through a similar interface (Step lists and dispatch and process functions) but requires there to be a sequence server which runs from ZMap startup (ie not delayed) and has a DNA featureset, and only one of these is allowed. (April 2010) There is some code Ed wrote to specify sequence servers per sequence which is currently iffed out (eg zmapView.c/getSequenceServers()).

Rationalising load-features code

The problem

Historically there have been two functions that request data from servers, zMapViewConnect() and zMapViewLoadFeatures(), and these are both similar and subtly different. As the request logic has become more complicated, intricate modifications have to be made twice. There are also other functions involved in requesting OTF data and DNA. The aim here is to provide functions used by both interfaces to encapsulate the request logic in one place.

Functional requirements

Notes

zMapViewConnect() does a lot of config file handling and initialisation and then requests all non-delayed data. This is done per server and it makes a lot of sense to call zMapViewLoadFeatures() for each one.

zMapViewLoadFeatures() is also called from the load columns dialog and the (external) XML interface.
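
The shape of the proposed rationalisation might look like the sketch below, where the startup path does its own initialisation and then goes through the same request function as every other caller. The function names, the dictionary used to stand in for the view and the stanza names are hypothetical; this illustrates the structure only, not the C implementation.

```python
def load_features(view, source_name, featuresets):
    """Single place that turns a load request into queued server requests."""
    for featureset in featuresets:
        view["requests"].append((source_name, featureset))


def view_connect(view, sources):
    """Startup path: per-view initialisation, then the shared request path."""
    view["requests"] = []
    for name, source in sources.items():
        if not source["delayed"]:
            load_features(view, name, source["featuresets"])


# Hypothetical sources, as they might be read from the config file.
example = {
    "acedb_source": {"delayed": False, "featuresets": ["Genes", "Repeats"]},
    "est_pipe": {"delayed": True, "featuresets": ["EST_Human"]},
}
demo_view = {}
view_connect(demo_view, example)                    # initial 'data loading'
load_features(demo_view, "est_pipe", ["EST_Human"])  # later request, e.g. Columns dialog
print(demo_view["requests"])
```

With this arrangement any change to the request logic is made once, in the shared function, rather than in two similar but subtly different places.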