MH17: I'm still working on this!
For a specific set of data (short reads via BAM files) we wish to display data density in a column and allow the user to inspect this data in Blixem. Features could be displayed as basic features in ZMap, but there is little benefit in this. By specifying a style, we specify that a featureset is displayed as summarised data (feature density over a small range). Code will be written to build density bins from a normal featureset containing the short-read features (which will be loaded on demand in a marked region), and the resulting data will also be available for other display formats such as heatmaps when implemented.
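As a rough illustration of building density bins from a featureset, the sketch below counts how many reads overlap each fixed-width bin. The names ShortRead and density_from_reads are hypothetical and not part of ZMap, and counting overlapping reads is only one possible definition of density.

```c
#include <stddef.h>

/* Hypothetical read record: inclusive start/end in sequence coordinates. */
typedef struct {
    int start;
    int end;
} ShortRead;

/*
 * Count how many reads overlap each fixed-width bin.
 * Bin i covers [region_start + i * bin_width,
 *               region_start + (i + 1) * bin_width - 1].
 * counts[] must have room for n_bins entries; it is zeroed first.
 */
void density_from_reads(const ShortRead *reads, size_t n_reads,
                        int region_start, int bin_width,
                        int *counts, size_t n_bins)
{
    size_t i;
    int region_end;

    for (i = 0; i < n_bins; i++)
        counts[i] = 0;
    if (bin_width <= 0 || n_bins == 0)
        return;

    region_end = region_start + (int)n_bins * bin_width - 1;

    for (i = 0; i < n_reads; i++) {
        int s = reads[i].start;
        int e = reads[i].end;
        int first, last, b;

        /* Skip malformed reads and reads entirely outside the region. */
        if (e < s || e < region_start || s > region_end)
            continue;

        /* Clip the read to the region before mapping it to bins. */
        if (s < region_start)
            s = region_start;
        if (e > region_end)
            e = region_end;

        first = (s - region_start) / bin_width;
        last = (e - region_start) / bin_width;
        for (b = first; b <= last; b++)
            counts[b]++;
    }
}
```

A read spanning several bins is counted once in each of them, so the totals describe coverage per bin rather than the number of distinct reads.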
For BAM data we wish to show the feature density over a wide range, and it makes sense to match the bin size to the display zoom - at low zoom, sampling the data with a fixed-size bin would misrepresent it. For some other data (e.g. CAGE and DiTags) there is a short fixed bin size (e.g. 20-40 bases) that relates to the experimental technique and cannot be changed.
This display style can also be used to show GC content, which would naturally be fed with pixel-wide bins.
All possible bin size display options must be limited to a minimum size as otherwise the display becomes meaningless.
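For the zoom-matched case, the bin width could be derived from the current zoom along these lines. This is a minimal sketch: wiggle_bin_width, bases_per_pixel and min_bin are illustrative names, and data with an experimentally fixed bin size (CAGE, DiTags) would bypass it.

```c
/*
 * Choose a bin width in bases for a zoom-matched wiggle plot.
 * The bin is made at least one pixel wide at the current zoom
 * (bases_per_pixel, rounded up) and is never allowed to drop
 * below min_bin bases.
 */
int wiggle_bin_width(double bases_per_pixel, int min_bin)
{
    int width = (int)bases_per_pixel;

    /* Round up so that a bin never covers less than one pixel. */
    if ((double)width < bases_per_pixel)
        width++;

    /* Guard against a nonsensical minimum. */
    if (min_bin < 1)
        min_bin = 1;

    /* Enforce the minimum bin size. */
    if (width < min_bin)
        width = min_bin;

    return width;
}
```

With a minimum of 20 bases, a zoom of 400 bases per pixel gives 400-base bins, while zooming in to 0.5 bases per pixel leaves the bin at the 20-base floor.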
A series of fixed-size bins would overlap at low zoom, and we would lose the overall density of features when it is hidden by a rare peak. As the zoom level changes slightly, a variable bin size may suddenly show peaks where there were none before.
It is known that the foo canvas is not designed to handle large amounts of data and performs badly with hundreds of thousands of features, and we wish to use wiggle plots as a test bed for a more efficient way of using the canvas. If we take as an example a sequence of 400k bases and assume bins 20 bases wide, then that gives us 20k data points, which traditionally would be drawn in the foo canvas individually at all levels of zoom, and most would be invisible at low zoom levels due to overlap. This gives a significant overhead operating the foo-canvas GObjects underneath all the features and may maximise the number of round trips via the X-server.
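The arithmetic in the example can be checked with two small helpers. The function names are made up for illustration, and the 1000-pixel column height used in the note after the code is an assumed figure, not one from the text above.

```c
/* Number of fixed-width bins needed to cover seq_len bases (rounded up). */
long bin_count(long seq_len, long bin_width)
{
    return (seq_len + bin_width - 1) / bin_width;
}

/* Average number of bins landing on one pixel of a column `pixels` long. */
double bins_per_pixel(long n_bins, long pixels)
{
    return (double)n_bins / (double)pixels;
}
```

bin_count(400000, 20) gives the 20k data points quoted above. If the whole sequence were shown in a column 1000 pixels long, bins_per_pixel(20000, 1000) would put about 20 bins on every pixel, so nearly all of the individually drawn items would overlap.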
We wish to paint a series of line segments via a GdkPolyLine function, each representing a series of bins - it may be necessary to tune this by experiment - and this requires us to implement display objects at a higher level than the individual data point.
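One way to prepare a single polyline for a run of bins is to convert each bin to one vertex, as sketched below. WiggleBin, WigglePoint and wiggle_polyline are hypothetical names, WigglePoint merely stands in for whatever point type the drawing call takes, and the orientation (sequence along y, volume along x) is an assumption about the column layout.

```c
#include <stddef.h>

/* Hypothetical bin: inclusive start/end coordinates and a volume. */
typedef struct {
    int start;
    int end;
    double volume;
} WiggleBin;

/* Stand-in for the integer point type a polyline drawing call would take. */
typedef struct {
    int x;
    int y;
} WigglePoint;

/*
 * Turn n_bins bins into n_bins polyline vertices.
 * y is the bin midpoint converted to pixels from seq_start;
 * x is the volume, clamped to [0, max_volume] and scaled to the
 * column width. pts[] must have room for n_bins points.
 * Returns the number of points written (0 on bad parameters).
 */
size_t wiggle_polyline(const WiggleBin *bins, size_t n_bins,
                       int seq_start, double bases_per_pixel,
                       double max_volume, int col_width,
                       WigglePoint *pts)
{
    size_t i;

    if (bases_per_pixel <= 0.0 || max_volume <= 0.0 || col_width < 1)
        return 0;

    for (i = 0; i < n_bins; i++) {
        double mid = (bins[i].start + bins[i].end) / 2.0;
        double v = bins[i].volume;

        if (v < 0.0)
            v = 0.0;
        if (v > max_volume)
            v = max_volume;

        pts[i].y = (int)((mid - seq_start) / bases_per_pixel);
        pts[i].x = (int)(v / max_volume * (col_width - 1));
    }
    return n_bins;
}
```

The resulting array can be handed to a single drawing call, so the number of calls depends on how many runs of bins are drawn rather than on the number of data points.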
There should be no need to add more than one canvas item for a column, but this can be controlled by a run-time parameter to allow experimentation. This canvas item (ZMapWindowCanvasWiggle) will have a start and end coordinate and a list of bins; note that the start and end coordinates are constrained to be on bin boundaries. When the zoom level changes it may be necessary to update these canvas items, in which case there is a slight argument for having only one. Note that as we plan not to have a lot of canvas items, it is feasible to remove and redisplay each of these should it be necessary. Note also that bins will have to be re-allocated in this circumstance.
A bin just needs a start coordinate, an end coordinate and a volume.
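A minimal sketch of the bin record, and of snapping a canvas item's range onto bin boundaries, might look like this. The struct layout and the helper wiggle_snap_range are assumptions for illustration and do not describe the real ZMapWindowCanvasWiggle.

```c
/* Hypothetical bin record: inclusive coordinates plus a volume. */
typedef struct {
    int start;      /* first base covered by the bin */
    int end;        /* last base covered by the bin (inclusive) */
    double volume;  /* summarised value, e.g. feature density */
} WiggleBin;

/*
 * Expand [*start, *end] outward so that both ends lie on bin boundaries.
 * Bins are bin_width bases wide and the first bin begins at origin.
 * Assumes bin_width > 0 and origin <= *start <= *end.
 */
void wiggle_snap_range(int origin, int bin_width, int *start, int *end)
{
    int first_bin = (*start - origin) / bin_width;
    int last_bin = (*end - origin) / bin_width;

    *start = origin + first_bin * bin_width;
    *end = origin + (last_bin + 1) * bin_width - 1;
}
```

For 20-base bins starting at base 1, a requested range of 15 to 45 becomes 1 to 60, covering the three whole bins 1-20, 21-40 and 41-60.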
Canvas items will have a list of bins. A linked list is chosen to allow rapid implementation - a B-tree would be preferable, but GLib does not have a good implementation of a B-tree structure (it does not support sequential processing). Given that the data is static, a simple (indexed) skip-list overlay can be added to this to allow efficient lookup of data without fear of degeneracy.
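The idea of an index laid over a static linked list can be sketched as follows. Note that this is a simplified single-level index (a pointer to every stride-th node, searched by bisection) rather than a full multi-level skip list, it uses a plain C list instead of a GLib one, and all names are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical list node holding one bin; the list is sorted by start. */
typedef struct BinNode {
    int start;
    int end;
    double volume;
    struct BinNode *next;
} BinNode;

/* Overlay index: pointers to every stride-th node of the list. */
typedef struct {
    BinNode **entries;
    size_t n;
} BinIndex;

/* Build the overlay. Returns 0 on success, -1 on error. */
int bin_index_build(BinIndex *idx, BinNode *head, size_t stride)
{
    size_t count = 0, i = 0, k = 0;
    BinNode *p;

    idx->entries = NULL;
    idx->n = 0;
    if (stride == 0)
        return -1;

    for (p = head; p != NULL; p = p->next)
        count++;
    if (count == 0)
        return 0;

    idx->n = (count + stride - 1) / stride;
    idx->entries = malloc(idx->n * sizeof *idx->entries);
    if (idx->entries == NULL) {
        idx->n = 0;
        return -1;
    }

    for (p = head; p != NULL; p = p->next, i++)
        if (i % stride == 0)
            idx->entries[k++] = p;
    return 0;
}

/*
 * Find the bin containing coord, or NULL if there is none.
 * Bisect the index for the last entry starting at or before coord,
 * then walk the list forward from there.
 */
BinNode *bin_index_find(const BinIndex *idx, int coord)
{
    size_t lo = 0, hi = idx->n;
    BinNode *p;

    if (idx->n == 0 || coord < idx->entries[0]->start)
        return NULL;

    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (idx->entries[mid]->start <= coord)
            lo = mid;
        else
            hi = mid;
    }

    for (p = idx->entries[lo]; p != NULL && p->start <= coord; p = p->next)
        if (coord <= p->end)
            return p;
    return NULL;
}
```

Because the bins do not change once built, the index never needs rebalancing: a lookup costs a bisection over the index plus a walk of at most stride nodes, and the underlying list can still be processed sequentially. The caller releases the overlay with free(idx.entries).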
Partly this is intended as an experiment to see how the foo canvas can be used more efficiently, but summary plots like these do not apply directly to drawing real features. If we take the example of ungapped alignments, what we need is a B-tree/skip list of features to draw on a canvas at a given zoom and instructions on how to draw them, and then we can just display the ones that are visible. Wiggle plots differ from this in that the data itself may change at different zoom levels, rather than just its visibility.