Chromosome 20

  • Finished
  • 620 Sanger Clones
  • 9 External Clones
  • Neat data set
  • 59.15Mbp of sequence with zero N's

Analysed using standard human sequence analysis pipeline.

Annotated twice, and checked by Chr. 20 group.

Gene structures imported into EnsEMBL db.

EnsEMBL analysis pipeline used to run RepeatMasker and CpG island prediction.

EnsEMBL db used to generate the statistics for the paper.

Needed to make a fold-out figure of the chromosome for the paper. The Chr. 22 figure was a big GIF. Decided to use PostScript, and to generate it with Tk::Canvas rather than writing raw PostScript.



Perl Tk

Use Tk v8

Tried using the site-installed Tk v4, but immediately found that I could not create a MainWindow via the OO interface:

    my $mw = MainWindow->new;

produces a fatal error.

Installed Tk v800.022, which does what it says.



Tk::Canvas

Drawing graphic items

Items are drawn on the canvas using the create<Type> methods.

    # Draw a red rectangle with no outline
    my $rec = $canvas->createRectangle(
        $x1, $y1, $x2, $y2,
        -fill       => 'red',
        -outline    => undef,
        -tags       => [@tags],
        );

The origin (0,0) is at the top left of the canvas, and coordinates are floating-point numbers.

The return value, $rec, is an integer. This identifier can be used to address the item and, for instance, change its size or colour.

Tags, which are strings, can be added to items, and are used to group items. In many Canvas methods, they can be used interchangeably with the integer identifiers of the items.

Canvas has lots of useful methods, e.g.:

    # Find the bounding box of all the
    # items with the tag "gene"
    my @bbox = $canvas->bbox('gene');

Image data

We have about 90,000 repeat features on Chr. 20. Didn't want to draw a separate rectangle for each feature, because the rectangles would be too small to see, and the PostScript file would be too big.

Created images with Lincoln Stein's GD module, where each pixel is coloured according to the density of repeats under it.
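
The density calculation could look something like the sketch below. This is not the original code: the repeat_density and grey_level helpers, the [start, end] feature pairs (1-based, inclusive, in bp) and the grey-level mapping are all assumptions made for illustration. Each returned value would pick the colour of one pixel column in the GD image.

```perl
use strict;
use warnings;

# Hypothetical helper: fraction of each pixel's bases covered by features.
#   $features   - array ref of [start, end] pairs (1-based, inclusive)
#   $seq_length - length of the region in bp
#   $width      - width of the image in pixels
sub repeat_density {
    my ($features, $seq_length, $width) = @_;

    my $bp_per_pixel = $seq_length / $width;
    my @covered = (0) x $width;

    foreach my $feat (@$features) {
        my ($start, $end) = @$feat;
        # Convert to 0-based, half-open coordinates
        my ($s, $e) = ($start - 1, $end);
        my $first = int($s / $bp_per_pixel);
        my $last  = int(($e - 1) / $bp_per_pixel);
        $last = $width - 1 if $last > $width - 1;
        for my $px ($first .. $last) {
            my $px_start = $px * $bp_per_pixel;
            my $px_end   = $px_start + $bp_per_pixel;
            my $lo = $s > $px_start ? $s : $px_start;
            my $hi = $e < $px_end   ? $e : $px_end;
            $covered[$px] += $hi - $lo if $hi > $lo;
        }
    }

    # Overlapping features are counted twice, so cap the density at 1
    return map { my $d = $_ / $bp_per_pixel; $d > 1 ? 1 : $d } @covered;
}

# Map a density (0..1) to a grey level: 255 = white, 0 = black
sub grey_level {
    my ($density) = @_;
    return int(255 * (1 - $density) + 0.5);
}

# Made-up features on a 400 bp region drawn 4 pixels wide
my @density = repeat_density([[1, 50], [101, 200], [151, 250]], 400, 4);
print join(' ', map { grey_level($_) } @density), "\n";
```

With around 90,000 features and an image a few thousand pixels wide, this reduces the drawing to one coloured pixel per column instead of one rectangle per feature.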

Tried to place images without using an intermediate file:

    my $image = $canvas->Photo(
        '-format'   => 'gif',
        -data       => $gd->gif,
        );
    $canvas->createImage(
        $x, $y,
        -anchor     => 'nw',
        -image      => $image,
        );

This fails. The reason, which isn't in the documentation, is that the data needs to be base64 encoded:

    use MIME::Base64 'encode_base64';
    my $image = $canvas->Photo(
        '-format'   => 'gif',
        -data       => encode_base64($gd->gif),
        );
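
To see what the encoding step does to the data, here is a small sketch that needs neither GD nor Tk. The $raw string is only a stand-in for the bytes returned by $gd->gif (an assumption for illustration; it is a GIF signature followed by arbitrary binary bytes, not a complete image).

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);

# Stand-in for the output of $gd->gif: not a real image, just
# a GIF signature plus some bytes outside the printable range
my $raw = "GIF89a" . pack('C*', 0, 1, 128, 255);

my $encoded = encode_base64($raw);

# The encoded form contains only printable ASCII and newlines
print "ASCII only\n" if $encoded !~ /[^A-Za-z0-9+\/=\n]/;

# Decoding recovers the original bytes exactly
print "round trip OK\n" if decode_base64($encoded) eq $raw;
```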

Scaling

    $mw->scaling(1);

X displays have a resolution which is set according to their size. This affects the size of the fonts relative to the other items in the PostScript output from Tk::Canvas, but setting scaling to 1 (1 pixel equals 1 point) fixes this.

You need to call scaling before anything gets drawn on the screen.
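
The arithmetic behind this can be sketched as follows. Tk's scaling factor is the number of pixels per point, where a point is 1/72 inch. The scaling_for_dpi and font_pixels helpers and the example resolutions are assumptions for illustration, not values measured on any particular display.

```perl
use strict;
use warnings;

# Tk's scaling factor is the number of pixels per point,
# where one point is 1/72 inch
sub scaling_for_dpi {
    my ($dpi) = @_;
    return $dpi / 72;
}

# Size in pixels of a font specified in points under a given scaling
sub font_pixels {
    my ($points, $scaling) = @_;
    return $points * $scaling;
}

foreach my $dpi (72, 96, 144) {    # example resolutions only
    my $scaling = scaling_for_dpi($dpi);
    printf "%3d dpi: scaling %.3f, 12pt font = %.1f pixels\n",
        $dpi, $scaling, font_pixels(12, $scaling);
}
```

Items placed at plain pixel coordinates stay the same size while the fonts grow with the display resolution, which is why the two drift apart unless scaling is fixed at 1.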

What's Missing?

The biggest missing feature is the ability to rotate Canvas items, which would be particularly useful for text.



GenomeCanvas

GenomeCanvas is a set of modules for drawing an overview of a genomic region.

Dotted arrows show inheritance, solid arrows references.

CanvasWindow::MainWindow

    my $mw = CanvasWindow::MainWindow->new;

CanvasWindow::MainWindow inherits from MainWindow.

Sets default background colour, X resources, key bindings and scaling.

GenomeCanvas

    my $gc = GenomeCanvas->new($mw);
    my $canvas = $gc->canvas;

GenomeCanvas is the container object for the other modules in the system.

The new method packs a scrolled Canvas into the Toplevel widget given as its argument. The canvas method then returns that Canvas.

It is a container for a number of GenomeCanvas::Band objects, which are used to draw rows on the Canvas.

    $gc->add_Band($band);

Objects are drawn on the Canvas by calling the render method.

    $gc->render;

GenomeCanvas::Band

There is a different GenomeCanvas::Band module for each type of row on the diagram (e.g. GenomeCanvas::DensityBand::RepeatFeature), and all of them inherit from the GenomeCanvas::Band base class.

Each Band module implements a render method, which creates the items on the Canvas. This is called by the containing GenomeCanvas object on each band in turn.

The GenomeCanvas object also gives each Band a tag, which the Band attaches to everything it draws.

After calling render on a band, the GenomeCanvas finds the bounding box of the items the band has drawn, and calculates the y-axis offset for the next band.
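
The stacking logic can be sketched without Tk as below. This is not the GenomeCanvas code: Sketch::FixedHeightBand, its fixed row width and the padding value are hypothetical stand-ins. In the real system the bounding box would come from calling bbox on the Canvas with the band's tag.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a Band. It "draws" a row of a given
# height at the y offset it is given, and returns the bounding
# box (x1, y1, x2, y2) of what it drew.
package Sketch::FixedHeightBand;

sub new {
    my ($class, %args) = @_;
    return bless { tag => $args{tag}, height => $args{height} }, $class;
}

sub tag { $_[0]{tag} }

sub render {
    my ($self, $y_offset) = @_;
    return (0, $y_offset, 500, $y_offset + $self->{height});
}

package main;

my $padding  = 10;    # assumed gap between rows
my $y_offset = 0;
my %placed;           # tag => y position of the top of the band

foreach my $band (
    Sketch::FixedHeightBand->new(tag => 'band-0', height => 30),
    Sketch::FixedHeightBand->new(tag => 'band-1', height => 12),
    Sketch::FixedHeightBand->new(tag => 'band-2', height => 50),
) {
    my @bbox = $band->render($y_offset);
    $placed{ $band->tag } = $bbox[1];
    printf "%s drawn at y = %d\n", $band->tag, $bbox[1];

    # The next band starts below the bottom edge of this one
    $y_offset = $bbox[3] + $padding;
}
```

Because the offset is derived from what was actually drawn, a band does not need to know its own height in advance.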

Bio::EnsEMBL::Virtual::Contig

Each band typically holds a reference to an EnsEMBL VirtualContig object.

This is a convenient data structure, which presents sequence features in arbitrary genomic coordinates via a number of methods.

    my @repeats = $vc->get_all_RepeatFeatures;

GenomeCanvas::State

Both GenomeCanvas and GenomeCanvas::Band objects inherit from GenomeCanvas::State.

State is used to hold information common to the whole GenomeCanvas, such as the current y-axis offset, and a reference to the Canvas object.

The State is actually held in a single anonymous hash, which everything in a GenomeCanvas has a reference to. Because everything inherits from the GenomeCanvas::State class, the same methods can be called on each object:

    # GenomeCanvas sets the y-axis offset
    $gc->y_offset(203);
    # ...
    # Which is retrieved by the Band object
    my $y_offset = $band->y_offset;
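
One way such a shared hash could be wired up is sketched below. The Sketch:: package names, the _state key and the share_state_with method are assumptions for illustration; the real GenomeCanvas::State may be organised differently.

```perl
use strict;
use warnings;

package Sketch::State;

sub new {
    my ($class) = @_;
    return bless { _state => {} }, $class;
}

# Make this object use the same state hash as $other
sub share_state_with {
    my ($self, $other) = @_;
    $self->{_state} = $other->{_state};
}

# Combined getter/setter, stored in the shared hash
sub y_offset {
    my ($self, $y) = @_;
    $self->{_state}{y_offset} = $y if defined $y;
    return $self->{_state}{y_offset};
}

package Sketch::GenomeCanvas;
our @ISA = ('Sketch::State');

# Adding a band hands it a reference to the container's state
sub add_Band {
    my ($self, $band) = @_;
    $band->share_state_with($self);
    push @{ $self->{bands} }, $band;
}

package Sketch::Band;
our @ISA = ('Sketch::State');

package main;

my $gc   = Sketch::GenomeCanvas->new;
my $band = Sketch::Band->new;
$gc->add_Band($band);

$gc->y_offset(203);
print $band->y_offset, "\n";    # prints 203
```

Only the reference is copied, so a value set through any object is immediately visible through all the others.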


EnsEMBL

GenomeCanvas can be pointed at any EnsEMBL database.

The code will be released as an EnsEMBL CVS module, probably ensembl-tk.