Performance issues with Big Data

Background

Historically ZMap has requested all its data sequentially from ACEDB. More recently this has been expedited by the advent of pipe servers, which allow data from different columns to be requested in parallel and displayed piecemeal. This allows low volume columns to be displayed quickly and allows the user to interact with ZMap while high volume columns are loading (caveat: there is a noticeable speed reduction due to processor use).

Data volumes have increased and we expect these to continue to increase at an accelerating rate. To provide a usably fast response it is probably not feasible to continue with the current strategy, which is to display all the data and allow fast scrolling via X. Note that the existing implementation using the foo canvas appears not to provide this fast scrolling (via an internal static bitmap) and instead draws features via an X expose event.

One strategy is to display summary data for high volume columns. While this may be a good solution, at some point we have to display the actual features.

Problems to solve

Initial display can be very slow

This is because every feature loaded is displayed even if not visible due to being hidden by others. If there are 100k features in a column then the same picture can be generated by about 1000 features.

Zoom, RevComp and 3-Frame can be very slow

As for the initial display this is due to repainting every feature in the display.

Moving the cursor seized up

This was due to the foo canvas passing the cursor motion event to every canvas group, resulting in eg 600k function calls for each event. This has been commented out.

The Ruler and lasso can become unusably slow with high volume data

These are implemented by adding foo canvas items and in the current implementation this implies a repaint of all canvas groups overlapping the areas involved. For the ruler this means a line across the whole window for the old and new position, and subject to X behaviour could result in an expose event to all groups between the two. For the lasso there will be an expose event for the larger of the old and new rectangles.

It's obvious that we have to prevent the expose events from repainting features; using some kind of bitmap to hold the image, or persuading X to blit the ruler and lasso, will solve this.

Scroll is implemented by redrawing items in the canvas

We expect that visible scrolled items will not be re-painted but the previously hidden ones are. Using a bitmap to hold the image will cure this but at the expense of having to paint it in the first place.

Problems to solve - a technical view

What actually takes the time?

The basic problem is that it takes a long time to display a lot of features and the current strategy is to display all the features ZMap has and allow fast scrolling via bitmap blitting (which does not happen). All the problems to solve can be expressed in these terms but there are a number of aspects to consider:

Adding foo canvas data structures may be slow

Items are added to groups and the groups store these in a GList. The known performance problem with g_list_append has been addressed (see foo-canvas.c/group_add()), so other than the general cost of operating the foo canvas data structures this is not thought to be relevant.
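For reference, g_list_append walks to the end of the list on every call, so building a list of n items by repeated appends costs O(n^2) overall. The sketch below shows the usual remedy of remembering the tail so that each append is O(1). It is a plain C illustration only: ItemNode, ItemGroup and the two functions are hypothetical names and are not the foo canvas or GLib types.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a canvas group and its item list.
 * These are not the real foo canvas or GLib types. */
typedef struct ItemNode {
    int item_id;
    struct ItemNode *next;
} ItemNode;

typedef struct {
    ItemNode *head;
    ItemNode *tail;   /* remembered so that an append never walks the list */
    size_t count;
} ItemGroup;

/* Append in O(1) by linking the new node after the remembered tail.
 * Returns 0 on success, -1 if the allocation fails. */
int item_group_append(ItemGroup *group, int item_id)
{
    ItemNode *node = malloc(sizeof *node);

    if (node == NULL)
        return -1;

    node->item_id = item_id;
    node->next = NULL;

    if (group->tail == NULL)
        group->head = node;        /* first item in an empty group */
    else
        group->tail->next = node;

    group->tail = node;
    group->count++;
    return 0;
}

/* Release every node and reset the group to empty. */
void item_group_clear(ItemGroup *group)
{
    ItemNode *node = group->head;

    while (node != NULL) {
        ItemNode *next = node->next;
        free(node);
        node = next;
    }
    group->head = NULL;
    group->tail = NULL;
    group->count = 0;
}
```

With the tail remembered, adding 200k items costs 200k constant-time steps rather than a walk of the whole list for each one.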

Foo canvas use of GObjects may be slow

Painting foo canvas items may be slow

Accessing items to paint may be inefficient

This is certainly true. Due to the use of the foo canvas by ZMap (data arranged in column groups and stored unsorted), if we want to paint items in a small section of the column then these are searched for using GObject methods, and every group in the column is asked if it is in the region. As every ZMapCanvasItem is a group we search every feature, which runs in O(n) time rather than a more reasonable O(log n). We know that this is a real performance problem as it was the cause of the cursor seizing up: this used the GObject signalling mechanism, which implies a double overhead - one function call to send the signal and another to receive it for every item in the column. Note that reverting ZMapCanvasItems to be simple foo_canvas_items will not affect this much as groups have to query each item.
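To illustrate the O(log n) alternative, here is a minimal sketch that assumes the features of a column are held in an array sorted by start coordinate and that the length of the longest feature is known. Neither assumption holds in the current canvas, where data is stored unsorted, and FeatureSpan and both function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical feature extent in sequence coordinates (inclusive). */
typedef struct {
    long start;
    long end;
} FeatureSpan;

/* Binary search: index of the first feature whose start is >= coord.
 * The array must be sorted by start. Returns n if there is none. */
size_t lower_bound_start(const FeatureSpan *features, size_t n, long coord)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (features[mid].start < coord)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Count the features overlapping [region_start, region_end].
 * 'longest' is the maximum (end - start) over all features, so nothing
 * starting before region_start - longest can reach into the region. */
size_t count_overlapping(const FeatureSpan *features, size_t n, long longest,
                         long region_start, long region_end)
{
    size_t i = lower_bound_start(features, n, region_start - longest);
    size_t hits = 0;

    /* Only features starting near or inside the region are examined. */
    for (; i < n && features[i].start <= region_end; i++) {
        if (features[i].end >= region_start)
            hits++;
    }
    return hits;
}
```

The search costs O(log n) plus the number of features starting near the region, so exposing a few pixel rows would no longer require every item in the column to be queried.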

Things to measure

The general approach will be to take existing code using a high volume column and selectively remove various functions while timing operations. Attempts using vtune, for example, have proved useful but are swamped by glib internal routines and apply to a complete run of ZMap, and therefore provide imprecise data.

We will use a large data set (eg 170k trembl features) so that non-linear functions appear more significant.

Effort required for various approaches

Measurements

memory used by canvas items

Data sizes

gtkobject 16
foocanvasitem 56
foocanvasgroup 80
zmapcanvasitem 108
zmapwindowalignmentfeature 116
zmapwindowcontainergroup 156

Currently for unbumped alignments we have 116 bytes plus 56 for a single basic canvas item (172 bytes), which for 100k features requires 17.2 MB. If we change these to be zmap alignment canvas items (based on foo_canvas_item plus whatever extra data is needed) then we need:

foocanvasitem 56
zmapcanvasitem 108 - 80 - 16 = 12 (16 bytes saved due to not needing 4 lists of canvas items)

ie 68 bytes per feature (6.8 MB for 100k features), resulting in a saving of 10.4 MB.
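The arithmetic above can be checked with a few lines of C. The sizes are the ones listed under "Data sizes"; the constant and function names are made up for this sketch.

```c
#include <stddef.h>

/* Structure sizes in bytes, copied from the "Data sizes" list. */
enum {
    FOO_CANVAS_ITEM_BYTES        = 56,
    FOO_CANVAS_GROUP_BYTES       = 80,
    ZMAP_CANVAS_ITEM_BYTES       = 108,
    ZMAP_ALIGNMENT_FEATURE_BYTES = 116,
    ITEM_LIST_BYTES              = 16   /* the 4 lists of canvas items no longer needed */
};

/* Current layout: one alignment feature plus one basic canvas item. */
size_t current_bytes_per_feature(void)
{
    return ZMAP_ALIGNMENT_FEATURE_BYTES + FOO_CANVAS_ITEM_BYTES;
}

/* Proposed layout: a foo canvas item plus the ZMap specific extra data. */
size_t proposed_bytes_per_feature(void)
{
    size_t extra = ZMAP_CANVAS_ITEM_BYTES - FOO_CANVAS_GROUP_BYTES - ITEM_LIST_BYTES;

    return FOO_CANVAS_ITEM_BYTES + extra;
}

/* Total bytes saved for a given number of features. */
size_t bytes_saved(size_t n_features)
{
    return (current_bytes_per_feature() - proposed_bytes_per_feature()) * n_features;
}
```

For the 201116 Trembl features in the test set the same calculation gives a saving of roughly 21 MB.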

Data in test set (using ZMap_toby and ace_toby on ~mh17)

General
      Program: ZMap - 0.1.110

      User: mh17 (Malcolm Hinsley)

      Machine: deskpro18979

      Sequence: chr9-03_86271101-87038365

Session Statistics
chr9-03_86271101-87038365:    Context children: 0, canvas children: 0
chr9-03_86271101-87038365:    Context children: 0, canvas children: 0
chr9-03_86271101-87038365:    Context children: 0, canvas children: 0
Novel_CDS:  Context children: 2, canvas children: 2
Transcript  features:1  exons:11,   introns:10, cds:1 boxes:0     exon_boxes:11     intron_boxes:20   cds_boxes:1
Transcript: Context children: 0, canvas children: 0
Genscan:    Context children: 8, canvas children: 8
Transcript  features:4  exons:22,   introns:18, cds:4 boxes:0     exon_boxes:22     intron_boxes:36   cds_boxes:4
Halfwise:   Context children: 16, canvas children: 16
Transcript  features:8  exons:30,   introns:22, cds:8 boxes:0     exon_boxes:30     intron_boxes:44   cds_boxes:8
Genomic_canonical:      Context children: 100, canvas children: 100
Basic features:50  boxes:50
Novel_CDS:  Context children: 0, canvas children: 0
Transcript: Context children: 2, canvas children: 2
Transcript  features:1  exons:20,   introns:19, cds:0 boxes:0     exon_boxes:20     intron_boxes:38   cds_boxes:0
Genscan:    Context children: 8, canvas children: 8
Transcript  features:4  exons:30,   introns:26, cds:4 boxes:0     exon_boxes:30     intron_boxes:52   cds_boxes:4
Halfwise:   Context children: 18, canvas children: 18
Transcript  features:9  exons:44,   introns:35, cds:9 boxes:0     exon_boxes:44     intron_boxes:70   cds_boxes:9
3 Frame Translation:    Context children: 0, canvas children: 0
trf:  Context children: 2890, canvas children: 2890
Basic features:126       boxes:126
Alignment   features:144      gapped:0    not perfect gapped:0    ungapped:144      boxes:144   gapped boxes:0    ungapped boxes:144      gapped boxes not drawn:0
Alignment   features:698      gapped:0    not perfect gapped:0    ungapped:698      boxes:698   gapped boxes:0    ungapped boxes:698      gapped boxes not drawn:0
Alignment   features:477      gapped:0    not perfect gapped:0    ungapped:477      boxes:477   gapped boxes:0    ungapped boxes:477      gapped boxes not drawn:0
CpG:  Context children: 26, canvas children: 26
Basic features:5   boxes:5
Basic features:8   boxes:8
GF_coding_seg:    Context children: 0, canvas children: 0
GF_ATG:     Context children: 0, canvas children: 0
GF_splice:  Context children: 0, canvas children: 0
SwissProt:  Context children: 23884, canvas children: 23884
Alignment   features:11942    gapped:0    not perfect gapped:0    ungapped:11942    boxes:11942 gapped boxes:0    ungapped boxes:11942    gapped boxes not drawn:0
TrEMBL:     Context children: 402232, canvas children: 402232
Alignment   features:201116   gapped:0    not perfect gapped:0    ungapped:201116   boxes:201116      gapped boxes:0    ungapped boxes:201116   gapped boxes not drawn:0
EST_Human:  Context children: 2360, canvas children: 2360
Alignment   features:1180     gapped:0    not perfect gapped:0    ungapped:1180     boxes:1180  gapped boxes:0    ungapped boxes:1180     gapped boxes not drawn:0
EST_Mouse:  Context children: 1278, canvas children: 1278
Alignment   features:639      gapped:0    not perfect gapped:0    ungapped:639      boxes:639   gapped boxes:0    ungapped boxes:639      gapped boxes not drawn:0
EST_Pig:    Context children: 6920, canvas children: 6920
Alignment   features:3460     gapped:0    not perfect gapped:0    ungapped:3460     boxes:3460  gapped boxes:0    ungapped boxes:3460     gapped boxes not drawn:0
EST_Other:  Context children: 4806, canvas children: 4806
Alignment   features:2403     gapped:0    not perfect gapped:0    ungapped:2403     boxes:2403  gapped boxes:0    ungapped boxes:2403     gapped boxes not drawn:0
vertebrate_mRNA:  Context children: 5988, canvas children: 5988
Alignment   features:2994     gapped:0    not perfect gapped:0    ungapped:2994     boxes:2994  gapped boxes:0    ungapped boxes:2994     gapped boxes not drawn:0
Saturated_SwissProt:    Context children: 258, canvas children: 258
Basic features:129       boxes:129
Saturated_TrEMBL: Context children: 3292, canvas children: 3292
Basic features:1646      boxes:1646
Saturated_EST_Human:    Context children: 280, canvas children: 280
Basic features:140       boxes:140
Saturated_EST_Mouse:    Context children: 186, canvas children: 186
Basic features:93  boxes:93
Saturated_EST_Pig:      Context children: 1166, canvas children: 1166
Basic features:583       boxes:583
Saturated_EST_Other:    Context children: 796, canvas children: 796
Basic features:398       boxes:398
Saturated_vertebrate_mRNA:    Context children: 560, canvas children: 560
Basic features:280       boxes:280
DNA:  Context children: 1, canvas children: 1
Basic features:0   boxes:0
Locus:      Context children: 12, canvas children: 12
Basic features:6   boxes:6

So for Trembl (fwd strand) we have 201116 features (and ungapped boxes) and the context and canvas have 2x that number of children (not sure what's going on there). Note that to avoid complexity the styles were changed to place all Trembl features in one strand - the feature set would otherwise be displayed in two columns. Note that at a rough estimate there are 26000 other features, so Trembl accounts for approx 90% of the data.

Why are there so many features and canvas items?

There appear to be approximately 3x as many canvas groups as there are features, which is odd: these statistics are incremented when the Item factory is run, once per canvas item/feature displayed.

Timing display of the Trembl column

Base measurement - elapsed time in normal operation

Start 114.975     Merge Context
Stop  115.100     Merge Context
Start 115.102     DrawBlock
Stop  115.103     DrawBlock

...

Start 116.611     DrawFeatureSet    trembl
Start 116.611     DrawFeatureSet    ProcessFeature
Stop  134.304     DrawFeatureSet    ProcessFeature
Start 134.304     DrawFeatureSet    Bump
Stop  134.304     DrawFeatureSet    Bump
Start 134.304     DrawFeatureSet    SetState
Stop  149.438     SetVis      true
Stop  149.438     DrawFeatureSet    SetVis
Stop  149.438     DrawFeatureSet    SetState
Stop  149.446     DrawFeatureSet    trembl

...

Stop  154.382     DrawFeatureSet    SetVis
Stop  154.382     DrawFeatureSet    SetState
Stop  154.382     DrawFeatureSet    saturated_est_mouse
expose complete: 0 items picked, 681258 groups drawn

ie 115 seconds to load data from ACEDB, 39 seconds to display it all, out of which 33 was needed by Trembl. From this it is clear that displaying a column is approximately linear in terms of the number of features.

More detail - including foo canvas operations

As we can see from this the picture is more complicated for the initial canvas display. There is a process known as updating that appears to set the extents of all canvas groups and this is done via an idle callback if needed, but also triggered by the expose handler if it is pending. In this example it is done before the expose. Note that there are two calls - one for the navigator and one for the main canvas.

Start 0.132 canvas_expose     draw
Stop  0.132 canvas_expose     draw
expose complete: 0 items picked, 1 groups drawn
Start 0.132 canvas_expose     draw
Stop  0.132 canvas_expose     draw
expose complete: 0 items picked, 4 groups drawn
Start 0.231 do_update
Stop  0.231 do_update
Start 92.876      Merge Context
Stop  92.999      Merge Context
Start 93.001      DrawBlock
Stop  93.002      DrawBlock

Start 94.622      DrawFeatureSet    trembl
Start 94.622      DrawFeatureSet    ProcessFeature
Stop  112.091     DrawFeatureSet    ProcessFeature
Start 112.091     DrawFeatureSet    Bump
Stop  112.091     DrawFeatureSet    Bump
Start 112.091     DrawFeatureSet    SetState
Stop  130.262     SetVis      true
Stop  130.269     DrawFeatureSet    SetVis
Stop  130.269     DrawFeatureSet    SetState
Stop  130.276     DrawFeatureSet    trembl

Start 134.888     do_update
Stop  140.436     do_udate
Start 140.505     do_update
Stop  140.505     do_udate
Start 140.751     canvas_expose     draw
Stop  170.883     canvas_expose     draw
expose complete: 0 items picked, 681258 groups drawn
Start 170.883     canvas_expose     draw
Stop  170.914     canvas_expose     draw
expose complete: 0 items picked, 4 groups drawn

This gives us:

loading data            92 sec
display all             41 sec
display Trembl          36 sec
  (process feature)     18 sec
  (set column state)    18 sec
foo update all           6 sec
foo draw all            30 sec

Set column state takes 18 seconds - half the time needed to display the Trembl column and as long as adding the features to the foo canvas. This appears to be a result of zmapWindowContainerFeatureSetAugment() setting the column state to not visible, which then requires the column to be made visible after adding the features. Some high volume columns (eg GF_Splice) are hidden by default and perhaps ZMap could make a better guess of the intended state on column creation. It would also be possible to not draw a column at all if it is not intended to be displayed (eg like 3-Frame).

Removing the canvas item creation from the display code results in 270 ms being needed to drive everything else for the Trembl column, and we could lose 40 seconds by handling all this in the ZMap code, if we chose to draw features from the context of a column container object.

Exposing a single column: all features or just a few

This was done by hiding all columns except Trembl, placing the columns dialog over the Trembl column and then minimising it. The process was then repeated but only exposing a small number of pixel rows.

Whole column to be displayed

Start 556.541     canvas_expose     draw
Stop  579.675     canvas_expose     draw
expose complete: 0 items picked, 603359 groups drawn
Start 579.701     canvas_expose     draw
Stop  580.297     canvas_expose     draw
expose complete: 0 items picked, 12626 groups drawn

Hardly any of the column to be displayed

Start 937.394     canvas_expose     draw
Stop  937.420     canvas_expose     draw
expose complete: 0 items picked, 26 groups drawn
Start 937.454     canvas_expose     draw
Stop  937.483     canvas_expose     draw
expose complete: 0 items picked, 146 groups drawn
Start 937.485     canvas_expose     draw
Stop  937.485     canvas_expose     draw
expose complete: 0 items picked, 1 groups drawn
Start 937.506     canvas_expose     draw
Stop  937.506     canvas_expose     draw
expose complete: 0 items picked, 1 groups drawn

Slightly more groups to draw

Start 370.945     canvas_expose     draw
Stop  370.971     canvas_expose     draw
expose complete: 0 items picked, 11 groups drawn
Start 370.996     canvas_expose     draw
Stop  371.197     canvas_expose     draw
expose complete: 0 items picked, 6167 groups drawn
Start 371.200     canvas_expose     draw
Stop  371.200     canvas_expose     draw
expose complete: 0 items picked, 1 groups drawn
Start 406.691     canvas_expose     draw
Stop  406.691     canvas_expose     draw
expose complete: 0 items picked, 1 groups drawn

So this gives us:

Groups     Time        Time per group
603359     23 sec      0.038 ms
6167       0.2 sec     0.032 ms
146        0.029 sec   0.198 ms

and to search all the groups appears to take about 197 ms.

What performance improvement is possible?

In rough terms the above stats suggest that to display data from the feature context via the foo canvas takes approx 12 seconds per 100k features ((36 + 36) seconds for approx 600k canvas groups). 25% of this is through adding data to the foo canvas, 25% through setting show/hide status, 8% doing a foo-update and 42% drawing the data via GDK.
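The breakdown can be reproduced from the measured phase times (18, 18, 6 and 30 seconds) with a short calculation. This is a sketch for checking the figures only; DisplayPhases and the function names are hypothetical.

```c
/* Elapsed seconds for each display phase, taken from the timings above. */
typedef struct {
    double add_items;    /* process feature: adding items to the foo canvas */
    double set_state;    /* setting show/hide status */
    double foo_update;   /* foo canvas update */
    double foo_draw;     /* drawing via GDK in the expose handler */
} DisplayPhases;

/* Total display time in seconds. */
double phases_total(const DisplayPhases *p)
{
    return p->add_items + p->set_state + p->foo_update + p->foo_draw;
}

/* Display cost scaled to 100k canvas groups. */
double seconds_per_100k(const DisplayPhases *p, double groups)
{
    return phases_total(p) / groups * 100000.0;
}

/* Share of the total taken by one phase, as a percentage. */
double phase_percent(const DisplayPhases *p, double phase_seconds)
{
    return 100.0 * phase_seconds / phases_total(p);
}
```

With the phases set to {18, 18, 6, 30} and 600k groups this gives 72 seconds in total, 12 seconds per 100k, and shares of 25%, 25%, about 8% and about 42%.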

Setting column visibility state correctly on creation

This should save 25%. We can also optimise the Drawfeatures code to not draw columns that are not expected to be visible, which should save more time for the initial display but require extra time if columns are shown later. This should be acceptable for user controlled show/hide, but may be irritating for columns configured to be hidden at certain zoom levels. Note that it is only relevant to optimise columns with large amounts of data.

Display a column summary

Using column summarise (same picture) or specific summary styles (heatmaps/graphs) for low zoom + high volume columns in the initial display allows a maximum time needed to be determined, as for each column there is never a need to display more features than there are vertical pixels. If we assume 40 columns and a display 1000 pixels tall, that gives us a worst case of 40k features, which would take about 5 seconds, but in practice we would expect this to be much faster as most columns will not have that much data (in our sample data we have 8 columns with more than 1000 features). It may be easiest to control this via styles config.

Note that this approach has some downside:

If we assume that for the initial ZMap display we only have to display 20k features, then we require only approx 2.5 seconds using existing foo canvas technology, no matter how many features some columns have.
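A minimal sketch of the "same picture" summarise idea follows. It assumes an unbumped column in which all features are drawn with the same width and colour, and that the features are sorted by start coordinate and lie within the displayed sequence range. A feature is kept only if it paints a pixel row that no earlier feature has already covered, so at most one feature per vertical pixel survives. FeatureSpan, pixel_row and summarise_column are hypothetical names, not existing ZMap code.

```c
#include <stddef.h>

/* Hypothetical feature extent in sequence coordinates (inclusive). */
typedef struct {
    long start;
    long end;
} FeatureSpan;

/* Map a sequence coordinate to a pixel row in the range 0 .. pixels - 1. */
static long pixel_row(long coord, long seq_start, long seq_end, long pixels)
{
    return (coord - seq_start) * pixels / (seq_end - seq_start + 1);
}

/* Copy to 'out' only the features that paint at least one new pixel row.
 * 'in' must be sorted by start; 'out' needs room for min(n, pixels) entries.
 * Returns the number of features kept. */
size_t summarise_column(const FeatureSpan *in, size_t n,
                        long seq_start, long seq_end, long pixels,
                        FeatureSpan *out)
{
    long covered_to = -1;   /* highest pixel row painted so far */
    size_t kept = 0;

    for (size_t i = 0; i < n; i++) {
        long last_row = pixel_row(in[i].end, seq_start, seq_end, pixels);

        /* Because the input is sorted by start, a feature ending at or
         * before covered_to lies entirely under one that was kept earlier. */
        if (last_row > covered_to) {
            out[kept++] = in[i];
            covered_to = last_row;
        }
    }
    return kept;
}
```

Each kept feature raises covered_to by at least one row, which is why the output can never exceed the number of vertical pixels, whatever the size of the input.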

Amending the foo canvas to not require the 'update' process

Is this possible or advisable? Let's guess 5% improvement possible - further investigation is needed

Providing our own pixmaps for display/ displaying features as part of a column

If we construct our own image in a pixmap (or several) and display these via the foo canvas then there will be no noticeable performance issues from the foo canvas (assuming that pixmaps will operate efficiently) as the number of display items will be small (eg less than 100).

This would allow us to remove almost entirely the step of adding items to the foo canvas as all we need to do is to create a mapping from feature context to pixmap - this will be equivalent to the FToIhash operations already in place. The draw process would then consist of writing to the pixmap and triggering a GDK paint. Note that this implicitly treats the display as a representation of the data and the concept of searching the canvas is not relevant.

This would give 24% from not operating the foo canvas except for minimal numbers of items, and 8% from not operating the update process. Gains in drawing should also be possible: a simple test using a glyph drawing function shows that displaying 10k items to a gdk_drawable takes 150 ms for background and 120 ms for outline. If we have 600k items to display that would take approx 16 seconds (60 x 270 ms), which is about half the time spent currently. Obviously with an off-screen drawable (pixmap) there can be less interaction with X and items can be drawn as a batch. Note that if we adopt the policy of only displaying what needs to be displayed, it is unlikely that we will have to display significantly more than 10k items, and it should be possible to engineer a ZMap where all display operations take less than 1 second, even without resorting to off-screen pixmaps.

It should also be possible to lose the need for long items code.

NOTE that if we are to avoid continually redrawing features whenever another column is displayed then we must implement something like this. See here for a discussion of how to do this with minimum effort.

Getting the foo canvas to operate pixmaps

This (or ZMap supplied pixmaps) is essential to allow efficient scrolling. It is also essential to allow columns to be hidden, shown or moved without repainting half the screen: either we have one foo canvas per column or one pixmap per column.

Painting on demand

Using the well known graphics technique of displaying a current view and having adjacent views already prepared (eg like google maps) we can provide smooth scrolling using pixmaps without having to display all the data at high zoom. It would be desirable (perhaps necessary) to implement a display thread to paint to a cache of pixmaps; this would unlink control and view and provide a more comfortable user experience. Note that we move much of the drawing to idle CPU time, as we can paint adjacent regions while the user is doing something else, and the appearance will be of a much faster operation.

This strategy would allow us to paint only a subset of the data at high zoom and would therefore speed up zoom and revcomp considerably.
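As an illustration of the paint on demand idea, the sketch below works out which pixmap tiles should be kept painted for a given scroll position: the tiles that are visible plus one neighbour on each side. It assumes the canvas is divided vertically into fixed height tiles and that the view lies within the canvas. TileRange and tiles_to_prepare are hypothetical names, and the choice of a single neighbouring tile is an assumption made for the example.

```c
/* Inclusive range of tile indices, counted from the top of the canvas. */
typedef struct {
    long first;
    long last;
} TileRange;

/* All arguments are in canvas pixels: view_top is the scroll offset,
 * view_height the window height, tile_height the height of one pixmap tile
 * and canvas_height the full scrollable height. tile_height must be > 0. */
TileRange tiles_to_prepare(long view_top, long view_height,
                           long tile_height, long canvas_height)
{
    long last_tile     = (canvas_height - 1) / tile_height;
    long first_visible = view_top / tile_height;
    long last_visible  = (view_top + view_height - 1) / tile_height;
    TileRange range;

    /* One extra tile above and below, clamped to the canvas. */
    range.first = first_visible > 0 ? first_visible - 1 : 0;
    range.last  = last_visible < last_tile ? last_visible + 1 : last_tile;
    return range;
}
```

A display thread or idle handler could paint the tiles in this range that are not yet in the cache, visible ones first, and discard tiles that fall outside it.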

Resolving unanswered questions re data volumes

Investigations reveal that with approx 230k features we end up with 680k foo canvas groups and there is clearly something to explain.

Optimisation: A Summary

Initial display

Currently, to display 100k canvas items requires 12 seconds
By setting column state appropriately we can reduce this to 9 seconds

If we create and display our own pixmaps instead of displaying features directly on the foo canvas then this gains us another 3 seconds. Tests need to be made to determine if using pixmaps is a workable strategy, and what performance gains we can make in the drawing process.

By displaying summary data rather than all the features for high volume columns we can set a practical limit (related to the number of columns with more than 1000 features) of approx 2.5 seconds (see above for assumptions), and with the improvements above this would be halved (1.25 seconds). Compared with 72 seconds this is a speed-up of approx 58x. This would require little change to the existing canvas operation but would require new styles (as already being designed for heatmaps etc) and sundry changes to some ZMap code.

With data loaded via pipes the initial 2 minutes of 'Data Loading' will effectively disappear and many columns will be displayed very quickly, with Trembl still taking 2 minutes to arrive. Note that to avoid subsequent long delays on requesting other columns it is essential to prevent repainting of high volume columns, either by using summary data or by implementing pixmaps per column.

Zoomed in

Currently all or as much of the feature context as is possible is displayed whenever a zoom (or revcomp etc) is selected. We could opt to set the canvas size to the full size and only paint around the visible region, which would reduce drawing time significantly, depending on the zoom level.

At high zooms summary data is not possible or useful and we cannot achieve any gains from this. However, as we know that the number of visible features is small, we could operate a paint on demand policy and provide usable worst case performance. Tests on a smaller data set show 120k groups painted in 1.7 seconds.

Results

What                    Initial display   RevComp   Comments
Correct column state    39%               26%       Better than expected - previous code redisplayed data even if the old and new visibility state was the same.

Note: we often get a double expose, and one cause is thought to be the title bar expanding as seq coordinates are shown. It should be possible to prevent this by displaying coordinates correctly before painting features, but note that this is relevant only to ACEDB-only configurations, as with a pipe server configuration the sequence coordinates will be set by the first column loaded regardless of whether or not we can determine these from the config file.

Implementing pixmaps in the foo canvas

Foo canvas pixbufs

The foo canvas has support for pixbufs, which appear to be static images. A cursory scan of the GDK pixbuf documentation gives the impression these are not much used these days, and are authored by the same person as the foo canvas. There is a library which is 11 years old.

GDK pixmaps

These are Drawable objects and can be treated as off screen windows and look like a much better option - existing code can be used to draw on them. Note that they are already deprecated and we are advised to use Cairo instead, but given that that applies to most of the foo canvas this is hardly an issue. (NB we would not be able to upgrade to GTK 3).

One pixmap per column

Without changing the overall structure of the ZMap code it should be possible to create a pixmap per column and have existing code paint features on it. We would have to add in an extra layer to place pixmaps onto the canvas, but then scrolling should be instant and re-ordering and moving columns would not require features to be repainted, which will become important very soon when pipe servers make it out into the real world.

Work needed

Here we try to preserve existing code and data with the aim of inserting pixmaps with the minimum code written/changed.

Speeding up the lasso and ruler

These are drawn as foo canvas items and require a redraw of the whole region when changed: ie they are not blitted. Without changing code if we operate pixmaps as the display technology this will be significantly faster.

Another way to speed up the ruler would be to implement a 1-pixel deep tooltip (if possible), assuming that this would be displayed by X with bit-blitting. (if not then no performance improvement).

Resolving issues on focus code

Currently when a column is selected the canvas items are sorted and a selected feature is highlit a) by changing the colour and b) by moving it to the start of the column's list of features. When unhighlit it is moved back into place, but with multiple highlights this is a problem. Highlit items can be displayed on top of pixmaps in the window's foo canvas, and this provides a way to add or remove highlight without affecting the order of features in the canvas. These features would be flagged to not respond to mouse events and to pass them on to the underlying pixmap/foo canvas combo.