zmapFeature - Feature Context - the model part of MVC

Source Code

Source Files
  zmapStyleUtils.c     Print functions for debugging
  zmapStyle.c          Implementation of the style objects
  zmapFeatureUtils.c   No style specific data or functions
  zmapFeatureTypes.c   Pre-defined styles and inheritance
  more to come...
Headers
  zmapStyle.h          Contains all the config strings, similar to zmapConfigStrings.c
  zmapStyle_I.h        Private style data structures
  more to come...

Terminology

Features are organised in columns on the ZMap display; the group of features shown in a column is known as a featureset.

However it gets a little more complicated than that: in include/Zmap/zmapServerProtocol.h we find that a feature source is the name of a feature as it is known in a far-away database. Several of these can be placed in a single featureset.

A feature source is not where a feature is retrieved from but the name of it. Various parts of ZMap also refer to a source in a zmapServer context in which case source refers to the name of the server as defined in a ZMap config file stanza or the zmapServer module itself. A server module can supply a mapping that specifies which featureset a feature source is to be placed in.

Many of these words are used interchangeably.

Example

EST_human is a column that contains the feature sources EST_human and Saturated_EST_Human, all three of which are known as featuresets. Of these terms, only 'column' is unambiguous.
Mapping features (data sources) to featureset (display columns) and feature style

Source to feature set

Traditionally, ACEDB provided this mapping during ZMap startup, but without an ACEDB connection we have no way of receiving this data.

By default, any feature requested from a pipe or other server that does not support the REQ_FEATURESETS request will be mapped to a column of the same name - this will produce a wider than normal display, but the data will at least be visible.

A new config stanza [featuresets] will be provided in ZMap to override this default mapping and will contain lines of the form:

column = source1; source2; ... sourceN
where 'column' is the name of the column used to display all the sources, and sourceX is the name of a featureset used to request the data, as received from Otterlace and as sent to a pipeServer or other server.

NB: This stanza will only be read in when creating a view and not on requesting a column. To change this mapping it is necessary to update the file ZMap (config) and then create a new ZMap window.
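As an illustration of the line format only, the following is a minimal sketch of splitting one 'column = source1; source2' entry into a column name and its list of sources. It uses plain C rather than ZMap's own config file code, and ColumnMapping, copy_trimmed() and parse_featureset_line() are hypothetical names that do not exist in ZMap.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define MAX_SOURCES 16
#define MAX_NAME    64

/* Hypothetical result type: one display column and the sources mapped to it. */
typedef struct
{
  char column[MAX_NAME] ;
  char sources[MAX_SOURCES][MAX_NAME] ;
  int n_sources ;
} ColumnMapping ;

/* Copy [start, end) into dest with surrounding white space removed. */
static void copy_trimmed(char *dest, const char *start, const char *end)
{
  size_t len ;

  while (start < end && isspace((unsigned char)*start))
    start++ ;
  while (end > start && isspace((unsigned char)end[-1]))
    end-- ;

  len = (size_t)(end - start) ;
  if (len >= MAX_NAME)
    len = MAX_NAME - 1 ;                            /* truncate over-long names */
  memcpy(dest, start, len) ;
  dest[len] = '\0' ;
}

/* Parse "column = source1; source2; ... sourceN".
 * Returns 1 on success, 0 if there is no '=' or no column name. */
int parse_featureset_line(const char *line, ColumnMapping *map)
{
  const char *eq, *start, *end ;

  assert(line && map) ;

  map->n_sources = 0 ;
  if (!(eq = strchr(line, '=')))
    return 0 ;

  copy_trimmed(map->column, line, eq) ;
  if (!map->column[0])
    return 0 ;

  for (start = eq + 1 ; map->n_sources < MAX_SOURCES ; start = end + 1)
    {
      end = strchr(start, ';') ;
      if (!end)
        end = start + strlen(start) ;               /* last entry on the line */

      copy_trimmed(map->sources[map->n_sources], start, end) ;
      if (map->sources[map->n_sources][0])          /* skip empty entries */
        map->n_sources++ ;

      if (!*end)
        break ;
    }

  return 1 ;
}
```

Given the line 'EST_human = EST_human ; Saturated_EST_Human', the sketch yields the column EST_human with two sources, which matches the terminology example earlier in this page.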

Mapping Features to Styles

Traditionally ACEDB could map a feature to a style, but for a pipeServer that cannot do this (as GFF does not support this) the obvious solution is to have a 1-1 mapping of featuresets to styles of the same name. As styles can be inherited, if it is required to have two featuresets displayed using the same style then they can both inherit the same base style unmodified.

Merging data from different sources

Previous ACEDB functionality will remain honoured. If we end up with conflicting data being defined by different sources then the first defined data will have priority regardless of the source. This is in line with existing merge operations. Note that this implies that feature attributes can only be added to or changed but not removed (eg by taking out a background colour).

Identifying a featureset

A featureset has an original_id and a unique_id (see zmapFeature.h). The original_id is as given in a config file or from a database and is regarded as case insensitive. It is stored as a GQuark and is translated into an internal_id which is in lower case and, for some features, also includes extra information as necessary.

There are functions (eg zMapFeatureSetCreateID() - simple lower casing) that do this. (Need some more notes here).
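The lower-casing step can be sketched as below. This is an assumption-laden illustration in plain C: the real functions work with GQuarks, the 'extra information' added for some features is not shown, and make_unique_id() and same_featureset() are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: write a lower-cased copy of original_name into
 * unique_name (at most size bytes including the terminator).
 * Returns the number of characters written, excluding the terminator. */
size_t make_unique_id(const char *original_name, char *unique_name, size_t size)
{
  size_t i ;

  assert(original_name && unique_name && size > 0) ;

  for (i = 0 ; i + 1 < size && original_name[i] ; i++)
    unique_name[i] = (char)tolower((unsigned char)original_name[i]) ;
  unique_name[i] = '\0' ;

  return i ;
}

/* Two names identify the same featureset if their lower-cased forms match. */
int same_featureset(const char *name_a, const char *name_b)
{
  char a[128], b[128] ;

  make_unique_id(name_a, a, sizeof a) ;
  make_unique_id(name_b, b, sizeof b) ;

  return strcmp(a, b) == 0 ;
}
```

With this, 'EST_Human' and 'est_human' compare equal, which is the case-insensitive behaviour described above.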

A zmapView has some data structures that map feature set sources to display columns and styles:

  GHashTable *featureset_2_stylelist ;    /* Mapping of each feature_set to all the styles it requires. */
  GHashTable *source_2_featureset ;       /* Mapping of a feature source to a featureset. */
  GHashTable *source_2_sourcedata ;       /* Mapping of a feature source to its data. */
These are hash tables that map a featureset unique_id to another data structure.

According to zmapViewRemoteReceive.c/xml_featureset_start_cb() zmapView->source_2_featureset is a hash table of ZMapGFFSet and zmapView->source_2_sourcedata is a hash table of ZMapGFFSource. We presume that the id fields relate to unique_id rather than original_id (styles have both, as for featuresets).

NOTE: In the ACEDB interface, ACE provides the source to featureset data. The View should merge these data together if there is more than one such server but at present only the first one is used (see zmapView.c/processDataRequests(), search for 'needs sorting'). This data is used by the X-Remote interface when requesting certain data sources (eg on-the-fly (OTF) alignments). The ZMap config file lists featuresets supported by ACE and the source to featureset data from ACE specifies how further un-configured data sources map onto the known featuresets. This is particularly relevant when searching for which server to request data from.

/* Struct for "feature set" information. Used to look up "meta" information for each feature set. */
typedef struct
{
  GQuark feature_set_id ;                           /* The set name. */
  char *description ;                               /* Description. */
} ZMapGFFSetStruct, *ZMapGFFSet ;


/* Struct holding "per source" information for GFF data. Can be used to look up the
 * style for a GFF feature plus other stuff. */
typedef struct
{
  GQuark source_id ;                                /* The source name. */
  GQuark source_text ;                              /* Description. */
  GQuark style_id ;                                 /* The style for processing the source. */
} ZMapGFFSourceStruct, *ZMapGFFSource ;

zmapView->featureset_2_stylelist is more obscure: zmapView.c/addPrefined() sets zmapView->orig_styles as a GData (keyed data list), and then zmapView->featureset_2_stylelist as a GHashTable of GLists of style quark ids (see zmapView.c/styleCB() and zmapGlibUtils.c/zMap_g_hashlist_insert()).

This last thing looks confusing: it looks like the style list is keyed off the style id rather than the featureset id (it may be that the style names are identical to the featureset names in this case??). Identical code appears in zmapView.c/processDataRequests() to add in a 1-1 mapping from a (pipe) server that does not support this mapping. zmapServer/acedb/acedbServer.c/parseMethodStyleNames() sets up a similar list with different featureset and style names.

Some related information from zmapGFF2Parser.c: "NOTE the feature sets style has the same name as the feature set" - this relates to featureset expressed as a display column.

NOTE: In zmapFeatureContext.c/zMapFeatureString2QuarkList() the function zMapStyleCreateID() is used to get a feature ID, or so it seems... This function is used to make a list of featureset names in zmapViewConnect(). It turns out that zMapFeatureCreateId() simply calls zmapStyleCreateID().

Styles: use

Inheritance rules

These appear to be:

Merging styles

If a new version of an existing style is merged into the existing then the new takes priority.

Default values

These are defined by GLib for attributes (parameters) in the case of integer values but not others, so using GLib directly to define defaults is not a good option.
Colours default to the fill colour if not specified.
There is a 'default bump mode' which is an attribute in its own right.

zmapStyle does not implement explicit default values for style attributes.

Modes

Each style has a mode which defines which one out of a number of option sets may be used. These option sets are mutually exclusive (for example for an alignment feature it is not valid to define a 'graph-baseline'). Attempts to set a mode specific parameter in a style with a different mode will fail and a warning will be added to zmap.log. If a style's mode is not set then setting a mode specific parameter will set the mode implicitly - this is to allow definition of inherited styles in any order.
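The mode rule described in this section can be sketched as follows. This is not the zmapStyle code: ModeStyle, StyleMode and style_set_graph_baseline() are hypothetical names, only one mode specific parameter is modelled, and the warning goes to stderr as a stand-in for zmap.log.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical modes; only two real ones are needed for the illustration. */
typedef enum { MODE_UNSET, MODE_ALIGNMENT, MODE_GRAPH } StyleMode ;

typedef struct
{
  StyleMode mode ;
  double graph_baseline ;       /* only meaningful when mode == MODE_GRAPH */
  int baseline_is_set ;
} ModeStyle ;

/* Set the graph-only parameter.
 * Returns 1 on success, 0 if the style already has a different mode. */
int style_set_graph_baseline(ModeStyle *style, double baseline)
{
  assert(style) ;

  if (style->mode == MODE_UNSET)
    {
      style->mode = MODE_GRAPH ;                    /* mode set implicitly */
    }
  else if (style->mode != MODE_GRAPH)
    {
      fprintf(stderr, "warning: graph-baseline is not valid for this style's mode\n") ;
      return 0 ;
    }

  style->graph_baseline = baseline ;
  style->baseline_is_set = 1 ;

  return 1 ;
}
```

Setting the baseline on a style with no mode turns it into a graph style, while the same call on an alignment style is rejected and leaves the style unchanged.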

Styles: implementation

Original method

This was done using GObjects and configuration files read in using GKeyFile functions. A large number of parameters have been defined and access functions implemented, and this has become unwieldy. Some complicated copy functions have been written to cope with data not explicitly defined by GObject parameters, most notably the 'is a parameter set' flags and the mode dependent parameters.

March 2010 Implementation

We still use GObjects but a number of aspects of the original implementation are changed:

Interaction with config file code

It's possible to get quite confused by quarks. The existing config file code defines some parameters as strings and at some point turns these into quarks via a callback from a callback. As the styles code handles string to quark conversion with much less complexity there is no need for this and it would be simpler to present this data as strings. However as the previous interface to the styles code had certain string values defined as integers it may be best to preserve this: there are the file config and acedb interfaces to consider, and both must be kept in step.

In particular:

Accessing mode dependent data

In a frenzy of 'executive decision' making I (mh17) decreed that no mode specific data may be set without setting the mode first. This implies that a style definition (eg in a config file) must either specify the mode first or our code must fish out that option before others. NB: the function zMapStyleIsDrawable() insists that the mode must be set.

Messages will be added to the log if these errors occur.

Copying Styles: some technicalities

The existence of the 'is set' flags for each attribute and the mode dependent data (a union) causes some implementation difficulties, a situation that was previously solved by sub-classing zmapBase and introducing the 'deep copy' functions. These allowed data not visible to the GObject interface to be handled but at the expense of breaking the clean implementation/design of the objects.

The March 2010 implementation solves these problems by making all data accessible to the GObject parameter interface and by forcing the flags and mode data to be copied first. This is done by installing these properties first and double checking at run-time that this data is in fact available when dependent attributes are accessed.

A g_object_get() will fail for an attribute that is not set. The function zMapFeatureStyleCopy() only copies parameters that have been set. A copy constructor has not been fully implemented, but if needed (right now we do not copy styles via the GValue interface) one can be provided by uncommenting the function zmap_feature_type_style_copy_set_property(), assigning it in zmap_feature_type_style_class_init() and possibly adjusting the class's GValueTable as in zmapBase.c (which has been #if'ed out but otherwise retained).

In a similar way, mode dependent data will not be set if the selected mode is inappropriate.
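The idea of copying only the attributes whose 'is set' flag is on can be sketched as below. This is a simplified model, not the GObject based implementation: FlaggedStyle, the three attribute names and style_copy_set_attrs() are hypothetical, and all values are stored as doubles for brevity.

```c
#include <assert.h>

/* Hypothetical attribute indices; each has one 'is set' flag. */
enum { ATTR_WIDTH, ATTR_MIN_SCORE, ATTR_MAX_SCORE, N_ATTRS } ;

typedef struct
{
  unsigned char is_set[N_ATTRS] ;
  double value[N_ATTRS] ;
} FlaggedStyle ;

/* Copy into dest only those attributes that are set in src.
 * Attributes not set in src keep whatever dest already had.
 * Returns the number of attributes copied. */
int style_copy_set_attrs(FlaggedStyle *dest, const FlaggedStyle *src)
{
  int i, copied = 0 ;

  assert(dest && src) ;

  for (i = 0 ; i < N_ATTRS ; i++)
    {
      if (src->is_set[i])
        {
          dest->value[i] = src->value[i] ;
          dest->is_set[i] = 1 ;
          copied++ ;
        }
    }

  return copied ;
}
```

When dest starts out with no flags set the result is a plain copy of the set attributes. When dest already holds values the same loop behaves as a merge in which the source takes priority.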

Another strategy would be to not use a union for the mode data, which would use more memory but not a dangerous amount.

Adding new Style parameters and types

For a new parameter:

For a new type of parameter:

Some questions

The style has a number of properties, some of which could arguably be better placed elsewhere, as they appear to relate to featuresets rather than a display style??:

Optimising performance on feature display

The problem

Historically much use was made of GData keyed data lists but these proved to be very inefficient, and while most of this code has been changed there is still some g_datalist() code left, especially related to styles. Each feature that is displayed requires two searches in a global style list (eg of 300 styles) and this is obviously going to be quite slow.

There are other aspects/instances of similar code, but as this particular instance is likely to hit overall performance most it will be addressed first.

A solution

Currently each feature is assigned a style ID by zmapGFF2parser.c and on display this is used to look up the style when drawing the feature in the display context. The featureset/column is also assigned a shorter styles list to optimise access but unfortunately at this point it is not used.

zmapGFF2parser will be modified to duplicate a featureset's style on creation and each feature will be assigned a pointer to that style in parallel to the style_id as presently used. This style will exist (invisibly) as part of the feature context and should be freed on featureset destroy.
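A minimal sketch of this scheme, under simplifying assumptions: the global style list is a plain array searched linearly, the featureset embeds its private copy of the style so no separate free is needed, and Style, Feature, FeatureSet, find_style() and featureset_attach_style() are hypothetical names rather than ZMap types or functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct { const char *name ; int width ; } Style ;

typedef struct { const Style *style ; } Feature ;

typedef struct
{
  Style style ;                 /* private copy, lives as long as the featureset */
  Feature *features ;
  size_t n_features ;
} FeatureSet ;

int lookup_count = 0 ;          /* counts list searches, for illustration only */

/* Linear search of the global style list, standing in for the slow lookup. */
const Style *find_style(const Style *styles, size_t n_styles, const char *name)
{
  size_t i ;

  lookup_count++ ;
  for (i = 0 ; i < n_styles ; i++)
    {
      if (strcmp(styles[i].name, name) == 0)
        return &styles[i] ;
    }

  return NULL ;
}

/* Search once per featureset, duplicate the style into the featureset and
 * point every feature at that copy.  Returns 1 on success, 0 if not found. */
int featureset_attach_style(FeatureSet *set, const Style *styles,
                            size_t n_styles, const char *name)
{
  const Style *found ;
  size_t i ;

  assert(set && styles && name) ;

  if (!(found = find_style(styles, n_styles, name)))
    return 0 ;

  set->style = *found ;
  for (i = 0 ; i < set->n_features ; i++)
    set->features[i].style = &set->style ;

  return 1 ;
}
```

Drawing code can then follow feature->style directly, so the number of list searches depends on the number of featuresets rather than the number of features.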

At some later time when the styles have been fully integrated in their new form (without g_datalist) then the style_id will be removed.

Initially only the functions ProcessFeature() and zMapWindowFeatureStrand() will be modified to use this new pointer.

Some other notes

zmapGFF2parser.c/makeNewFeature() also does two style lookups for each feature when it is only necessary to look up one per featureset - it is possible to remove about 200k calls to zMapFindStyle() each of which will search a list of maybe 300 styles. This should have a significant effect on 'Data Loading' performance.