Building/Installing an Ensembl Website

The installation can be split into two sections, non-Ensembl and Ensembl. The first of these involves the installation of applications/modules that are not part of the Ensembl project but are necessary for the web site to work, such as Apache and MySQL. The second involves installing the Ensembl data, modules and web site code.

Non-Ensembl Applications Build/Install

Ensembl is built on the applications described in the sections below.

These applications are not version-specific for Ensembl; that is, if you upgrade your Ensembl installation to a newer version when one becomes available, you probably won't need to install new versions of these applications.

All of this software, like all of Ensembl, is Open Source software and can be downloaded and used free of charge. You should, however, check the documentation for each application to see what license it has been released under, particularly if you are installing Ensembl in a commercial environment.

The following instructions assume you have root access to the installation machine. If you do not, get your systems administrator to install this software for you.

You may have some or all of this software installed already. If you have any problems getting the site running with pre-installed software (in particular Apache with mod_perl installed from RPMs), we recommend simply installing the latest version using the following instructions.

CVS

CVS is a software version control system that we use for storing the source code to Ensembl. You will need CVS installed if you want to download the Ensembl source code. It will also help you keep up to date with any bug fixes. We also have a Web-based CVS repository.

To install CVS:

  1. Download the latest source from http://ftp.gnu.org/non-gnu/cvs/source/stable/1.11.22/ (you may be able to download binaries from here instead if you prefer). At the time of writing this is version 1.11.22 and the file to download is cvs-1.11.22.tar.gz

  2. Unpack the source in a working directory, then build and install it:

    gunzip < cvs-1.11.22.tar.gz | tar xvf -
    cd cvs-1.11.22
    ./configure
    make
    make install

Perl

You will probably already have Perl installed. You need Perl 5, version 5.8.0 or higher, to run the website. We run the site under Perl 5.8.7. To see if you have Perl installed, and/or to check its version number, type:

perl -v
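The version requirement above can also be checked from a script. The following is a minimal sketch and not part of the Ensembl distribution: version_ge is a hypothetical helper name, and the 5.8.0 threshold is the one stated above.

```shell
#!/bin/sh
# version_ge HAVE WANT: succeed if dotted version HAVE is >= WANT.
# (Hypothetical helper, not part of Ensembl.)
version_ge() {
    have=$1
    want=$2
    while [ -n "$have" ] || [ -n "$want" ]; do
        h=${have%%.*}
        w=${want%%.*}
        # A missing component counts as zero, so 5.8 equals 5.8.0
        h=${h:-0}
        w=${w:-0}
        if [ "$h" -gt "$w" ]; then return 0; fi
        if [ "$h" -lt "$w" ]; then return 1; fi
        # Drop the component that was just compared
        case $have in *.*) have=${have#*.} ;; *) have= ;; esac
        case $want in *.*) want=${want#*.} ;; *) want= ;; esac
    done
    return 0
}

# Ask Perl for its own version as a dotted string such as 5.8.7
if command -v perl >/dev/null 2>&1; then
    perl_version=$(perl -e 'printf "%vd", $^V')
    if version_ge "$perl_version" 5.8.0; then
        echo "Perl $perl_version is recent enough"
    else
        echo "Perl $perl_version is too old; 5.8.0 or higher is needed"
    fi
else
    echo "Perl is not installed"
fi
```

The comparison is done component by component so that, for example, 5.10.0 is correctly treated as newer than 5.8.0.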

If you don't have Perl installed, or need to upgrade, go to http://www.cpan.org/ and choose the 'source code' install. Follow the installation instructions on the web site.

SQLite

MySQL

MySQL is a very popular Open Source relational database system. The easiest way to install MySQL is to use the pre-compiled binaries from http://dev.mysql.com. You can also get source from http://dev.mysql.com if you wish to compile MySQL yourself.

To install MySQL:

  1. Download the appropriate standard binaries from http://dev.mysql.com/downloads/mysql. Get the current stable version - at the time of writing, this is 4.1.12.

  2. Create a directory for MySQL to be installed into. A subdirectory of this will hold the databases, so choose somewhere that has sufficient space free - at least 170 GB for the complete set. We will use /data/ as an example. Again, when following these instructions, replace /data/ with whatever path you choose.

  3. Move the binary tarball to /data/

  4. Unpack the tarball with:

    $ gunzip < mysql-WHATEVER.tar.gz | tar xvf -

    Follow the straightforward setup instructions in the INSTALL-BINARY file that comes with MySQL. It should be located in the "mysql-WHATEVER" directory you just unpacked.
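Before downloading the databases it may be worth confirming that the chosen directory really has the space mentioned in step 2. This is a rough sketch only: free_kb and has_space are hypothetical helper names, and /data and 170 GB are simply the example values used above.

```shell
#!/bin/sh
# free_kb DIR: print the free space, in kilobytes, on the filesystem holding DIR.
free_kb() {
    # -P gives one POSIX-format line per filesystem; field 4 is "Available"
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# has_space DIR GIGABYTES: succeed if DIR has at least GIGABYTES free.
has_space() {
    needed_kb=$(( $2 * 1024 * 1024 ))
    [ "$(free_kb "$1")" -ge "$needed_kb" ]
}

# /data and 170 are the example values from step 2
install_dir=${1:-/data}
if [ -d "$install_dir" ] && has_space "$install_dir" 170; then
    echo "$install_dir has room for the complete set of databases"
else
    echo "$install_dir is missing or has less than 170 GB free"
fi
```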

Apache & mod_perl

Apache is the web server that the Ensembl site runs on. mod_perl is a module for Apache that enables it to compile perl scripts once rather than each time they are requested, and so makes everything run a lot faster.

Please follow these instructions precisely, as the default version of Apache or mod_perl often does not work correctly with Ensembl.

To Install Apache with mod_perl:

  1. Download the Apache2 source tarball from http://httpd.apache.org/dist/httpd/. Get the current stable version - at the time of writing, this is 2.2.4, and the file to download is httpd-2.2.4.tar.gz.

  2. Download the mod_perl source from http://www.cpan.org/modules/by-module/Apache2/ . Again, get the latest version, currently this is 2.0.3 and the file to download is mod_perl-2.0.3.tar.gz.

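  3. Unpack all the sources in a working directory, then configure, build and install Apache. Replace APACHE_DIRECTORY with the directory Apache is to be installed into:

    tar zxf httpd-2.2.4.tar.gz
    tar zxf mod_perl-2.0.3.tar.gz
    cd httpd-2.2.4
    ./configure --enable-deflate --prefix=APACHE_DIRECTORY
    make
    make install
    cd ../mod_perl-2.0.3

    The httpd.conf files etc. assume you install this in an apache2 subdirectory of your website's server root. Apache must be installed before mod_perl is built, because the next step uses the apxs tool from the Apache installation.

  4. Build the perl makefile:

    perl Makefile.PL PREFIX=APACHE_DIRECTORY MP_APXS=APACHE_DIRECTORY/bin/apxs

  5. Run the 'make' utility:

    make

  6. ...and install:

    make install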

Perl modules

The Ensembl website needs quite a few Perl modules to be installed in order for it to run. These modules can all be downloaded from www.cpan.org, and are all installed in much the same way: Download the module tarball, unpack in a working directory, and install the module:

gunzip < module.tar.gz | tar xvf -
cd module
perl Makefile.PL
make
make test
make install

The modules that are required are listed below, along with their URLs. The file part of the URL is current at the time of writing - you should install whatever is the latest version of the module.

CGI

Enables Perl scripts to easily parse CGI requests

http://www.cpan.org/modules/by-module/CGI/CGI.pm-3.27.tar.gz
NOTE: you need to install the latest version to make sure it is compatible with mod_perl2

Compress::Zlib

A compression module for DAS.

http://www.cpan.org/modules/by-module/Compress/Compress-Zlib-1.34.tar.gz

Compress::Raw::Zlib

A compression module for Mart.

http://www.cpan.org/modules/by-module/Compress/Compress-Raw-Zlib-2.003.tar.gz

DBI

A common database interface for Perl

http://www.cpan.org/modules/by-module/DBI/DBI-1.54.tar.gz

DBD::mysql

The MySQL drivers for the DBI interface

http://www.cpan.org/modules/by-module/DBD/DBD-mysql-4.001.tar.gz

GD

A Graphics library

Note: may require additional modules. Please read install docs.

http://www.cpan.org/modules/by-module/GD/GD-2.35.tar.gz

ParallelUserAgent

Allows for parallel requests

http://www.cpan.org/modules/by-module/LWP/ParallelUserAgent-2.57.tar.gz

Bio::Das::Lite

Lightweight DAS fetcher

http://www.cpan.org/modules/by-module/Bio/Bio-Das-Lite-1.48.tar.gz

Data::UUID

Creates a unique ID

http://www.cpan.org/modules/by-module/Data/Data-UUID-0.11.tar.gz

Digest::MD5

calculates a unique checksum for a file

http://www.cpan.org/modules/by-module/Digest/Digest-MD5-2.33.tar.gz

Storable

used to store and restore data structures (part of standard Perl distribution)

http://www.cpan.org/modules/by-module/Storable/Storable-2.13.tar.gz

LWP

used by DAS to communicate with DAS sources

Note: may require additional modules. Please read install docs.

http://www.cpan.org/modules/by-module/LWP/libwww-perl-5.805.tar.gz

SOAP::Lite

used by DAS to communicate with DAS sources

Note: may require additional modules. Please read install docs.

http://www.cpan.org/modules/by-module/SOAP/SOAP-Lite-0.60a.tar.gz

XML::Parser

used by DAS to parse DAS sources

Note: module is a wrapper around the expat library, which should be installed first:

Download the expat source from http://sourceforge.net/projects/expat/. Get the latest version (currently expat-1.95.8.tar.gz). Run the following commands:

gunzip < expat-1.95.8.tar.gz | tar xvf -
cd expat-1.95.8
./configure
make
make install

http://www.cpan.org/modules/by-module/XML/XML-Parser-2.34.tar.gz

XML::Simple

used by Mart to parse configuration

http://www.cpan.org/modules/by-module/XML/XML-Simple-2.14.tar.gz

Parse-RecDescent

used by Excel exporter

http://www.cpan.org/modules/by-module/Parse/Parse-RecDescent-1.94.tar.gz

PDF::API2

used by Image exporter for exporting as PDF

http://www.cpan.org/modules/by-module/PDF/PDF-API2-0.40.86.tar.gz

Spreadsheet::WriteExcel

used for exporting Excel spreadsheets

http://www.cpan.org/modules/by-module/Spreadsheet/Spreadsheet-WriteExcel-2.12.tar.gz

OLE::Storage_Lite

Used for exporting Excel spreadsheets

http://www.cpan.org/modules/by-module/OLE/OLE-Storage_Lite-0.11.tar.gz

Time::HiRes

Used for code profiling

http://www.cpan.org/modules/by-module/Time/Time-HiRes-1.66.tar.gz

HTML::Template

Used by BlastView

http://www.cpan.org/modules/by-module/HTML/HTML-Template-2.7.tar.gz

File::Temp

Used by MartView

http://www.cpan.org/modules/by-module/File/File-Temp-0.14.tar.gz

Mail::Mailer

Used by MartView

http://www.cpan.org/modules/by-module/Mail/MailTools-1.66.tar.gz

Math::Bezier

Used by drawing code

http://www.cpan.org/modules/by-module/Math/Math-Bezier-0.01.tar.gz

IO::String

Used for sequence handling

http://www.cpan.org/modules/by-module/IO/IO-String-1.06.tar.gz

Image::Size

Used for getting size of images

http://www.cpan.org/modules/by-module/Image/Image-Size-2.992.tar.gz

PathTools

Used for architecture independent file path manipulation

http://www.cpan.org/modules/by-module/File/PathTools-3.09.tar.gz

version

Required by Class::Std

http://search.cpan.org/CPAN/authors/id/J/JP/JPEACOCK/version-0.68.tar.gz

DB_File

Used by Mart

http://www.cpan.org/modules/by-module/DB_File/DB_File-1.814.tar.gz

CGI::Ajax

Used by BioMart

http://www.cpan.org/modules/by-module/CGI/CGI-Ajax-0.697.tar.gz

CGI::Session

Used by BioMart

http://www.cpan.org/modules/by-module/CGI/CGI-Session-4.14.tar.gz

Class::Accessor

Used by BioMart

http://www.cpan.org/modules/by-module/Class/Class-Accessor-0.27.tar.gz

Class::Data::Inheritable

Used by BioMart

http://www.cpan.org/modules/by-module/Class/Class-Data-Inheritable-0.06.tar.gz

Class::Std::Utils

Used by BioMart

http://www.cpan.org/modules/by-module/Class/Class-Std-Utils-0.0.2.tar.gz

Class::Std

Used by BioMart and new session code

http://www.cpan.org/modules/by-module/Class/Class-Std-v0.0.8.tar.gz

Devel::StackTrace

Used by BioMart

http://www.cpan.org/modules/by-module/Devel/Devel-StackTrace-1.13.tar.gz

Exception::Class

Used by BioMart

http://www.cpan.org/modules/by-module/Exception/Exception-Class-1.23.tar.gz

List::MoreUtils

Used by BioMart

http://www.cpan.org/modules/by-module/List/List-MoreUtils-0.22.tar.gz

Log::Log4perl

Used by BioMart

http://www.cpan.org/modules/by-module/Log/Log-Log4perl-1.07.tar.gz

Number::Format

Used by BioMart

http://www.cpan.org/modules/by-module/Number/Number-Format-1.52.tar.gz

Readonly

Used by BioMart

http://www.cpan.org/modules/by-module/Readonly/Readonly-1.03.tar.gz

Sub::Uplevel

Used by BioMart

http://www.cpan.org/modules/by-module/Sub/Sub-Uplevel-0.14.tar.gz

Template::Plugin::Number::Format

Used by BioMart

http://www.cpan.org/modules/by-module/Template/Template-Plugin-Number-Format-1.01.tar.gz

Template::Toolkit

Used by BioMart

http://www.cpan.org/modules/by-module/Template/Template-Toolkit-2.15.tar.gz

Test::Exception

Used by BioMart

http://www.cpan.org/modules/by-module/Test/Test-Exception-0.24.tar.gz

Test::Simple

Used by BioMart

http://www.cpan.org/modules/by-module/Test/Test-Simple-0.66.tar.gz

XML::DOM

Used by BioMart

http://www.cpan.org/modules/by-module/XML/XML-DOM-1.44.tar.gz

XML::RegExp

Used by BioMart

http://www.cpan.org/modules/by-module/XML/XML-RegExp-0.03.tar.gz

libxml

Used by BioMart

http://www.cpan.org/modules/by-module/XML/libxml-perl-0.08.tar.gz

Config::IniFiles

Used by the registry

http://www.cpan.org/modules/by-module/Config/Config-IniFiles-2.38.tar.gz
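One way to see which of the modules listed above are already installed is to ask Perl to load each one. The loop below is a sketch under the assumption that a module counts as installed if it loads; module_installed is a hypothetical helper name, and only a subset of the list is shown.

```shell
#!/bin/sh
# module_installed NAME: succeed if Perl can load the named module.
module_installed() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

# A subset of the modules listed above; extend the list as needed.
for module in CGI DBI DBD::mysql GD Storable Time::HiRes XML::Parser XML::Simple; do
    if module_installed "$module"; then
        echo "ok       $module"
    else
        echo "MISSING  $module"
    fi
done
```

Note that some distributions are named differently from the module they provide, so the name given to the loop must be the Perl package name and not the tarball name.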

Dotter

The Ensembl website contains a page called DotterView which displays a graphical dotplot comparison of two sequences, using the application Dotter. If you wish to use this page on a local installation you will need a local Dotter binary:

Dotter is part of the AceDB package available at:

http://www.acedb.org/Software/Downloads/supported.shtml

Pre-built binaries are available for alpha/Tru64 unix and intel/linux. All other flavours will have to be built from sources.

The acedb code makes use of the GTK graphics and GNU readline packages. If you don't have these on your system you can either install them yourself or use the copies distributed with the acedb source.

To install them go to

www.gtk.org

for the GTK package and

ftp://ftp.cwru.edu/pub/bash/readline-4.3.tar.gz

for readline.

Follow the GTK and GNU instructions for installing these packages.

If you want to use the versions distributed with the acedb source then follow the instructions for making them in the next section.

Next, unpack the acedb package into a temporary directory:

mkdir acedb
mv ACEDB-source.4_9l.tar.gz ./acedb/
cd ./acedb
gzip -d < ACEDB-source.4_9l.tar.gz | tar xvf -

Set an environment variable to tell the build process what platform you are building on - one of:

ALPHA_4 ALPHA_4_GCC ALPHA_4_GCC_OPT ALPHA_4_LINUX ALPHA_4_OLDSTYLE ALPHA_4_OPT ALPHA_5 ALPHA_CHRONO_4
DARWIN_4
FreeBSD
FUJITSU_4
HP_4 HP_4_GCC HP_4_OPT
IBM_4_3 IBM_4
IRIX4_4
LINUX_4 LINUX_MAC_4
MACOSX_4_DEF
NEC_4 NEC_4_R10 NEC_4_R11
NEXT_4
POSIX_4 POSIX_4_GCC
SGI_4 SGI_4_GCC SGI_4_IRIX5 SGI_4_PURE SGI_5_GCC SGI_65_GCC_DEF
SOLARIS_4 SOLARIS_4_OPT SOLARIS_4_RELEASE
SUN_4_DEF
WIN32_4

For example (in csh):

setenv ACEDB_MACHINE SOLARIS_4

Consult the AceDB documentation if you need more help.
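For sh or bash users, the csh line above has a direct equivalent. The sketch below also tries to guess a platform name from uname; the mapping from operating system to platform name is an assumption made for illustration (guess_acedb_machine is a hypothetical helper), so check the result against the list above and the AceDB documentation.

```shell
#!/bin/sh
# guess_acedb_machine OS: map an operating-system name, as printed by
# `uname -s`, to one of the platform names listed above.
# The mapping is an assumption; check it against the AceDB documentation.
guess_acedb_machine() {
    case $1 in
        Linux)   echo LINUX_4 ;;
        SunOS)   echo SOLARIS_4 ;;
        Darwin)  echo DARWIN_4 ;;
        FreeBSD) echo FreeBSD ;;
        *)       return 1 ;;
    esac
}

# sh/bash equivalent of the csh "setenv" line above
if ACEDB_MACHINE=$(guess_acedb_machine "$(uname -s)"); then
    export ACEDB_MACHINE
    echo "ACEDB_MACHINE=$ACEDB_MACHINE"
else
    echo "No guess for $(uname -s); set ACEDB_MACHINE by hand"
fi
```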

If you wish to use the GTK/readline packages distributed with acedb you should build them before building any acedb code:

make gnulibs

To build the dotter binary issue the command:

make dotter

(If you just run "make" you will build the whole of AceDB - which is unnecessary).

The build will place the executable at:

./bin.${ACEDB_MACHINE}/dotter

Check that the binary runs properly and then copy it to your Ensembl shared binaries directory. DotterView should now work.

Ensembl Build/Install

This section explains how to install the Ensembl data, Perl modules, and web code. It also covers the installation of BioPerl.

Versioning

Each Ensembl release has an integer version number associated with it, and that version number is used to identify the correct versions of API, Web code and databases (see below) that make up that release.

For the API and Web code, a CVS branch (essentially a named snapshot of the code) is made for each release, named with the release version number. The current release is version [[SPECIESDEFS::ENSEMBL_VERSION]], and the CVS tag for identifying the API and Web code for this release is 'branch-ensembl-[[SPECIESDEFS::ENSEMBL_VERSION]]'.

The Ensembl database names consist of the species, the database type, the release number, and the data version. The current human 'core' database is named homo_sapiens_core_[[SPECIESDEFS::ENSEMBL_VERSION]]_[[SPECIESINFO::Homo_sapiens:SPECIES_RELEASE_VERSION]]. i.e. a human core database, release [[SPECIESDEFS::ENSEMBL_VERSION]], data version [[SPECIESINFO::Homo_sapiens:SPECIES_RELEASE_VERSION]] built on the NCBI 36 assembly. Subsequent data releases on the same assembly are suffixed with a lower case letter (a, b, etc.).

The idea is that components with the same release version should work together - i.e. any web site built with 'version 31' API and web code, and 'version 31' databases should work correctly.
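The naming scheme can be expressed as a few small shell helpers, which may be handy when scripting downloads or checkouts. This is only an illustration of the convention described above: the function names are hypothetical, and the release number 45 and data version 36g in the usage lines are example values.

```shell
#!/bin/sh
# ensembl_branch RELEASE: CVS branch name for an Ensembl release.
ensembl_branch() {
    echo "branch-ensembl-${1}"
}

# ensembl_db_name SPECIES TYPE RELEASE DATA_VERSION: species database name.
ensembl_db_name() {
    echo "${1}_${2}_${3}_${4}"
}

# ensembl_db_release NAME: release number embedded in a species database
# name (the second-to-last underscore-separated field).
ensembl_db_release() {
    without_data_version=${1%_*}
    echo "${without_data_version##*_}"
}

# Example values only
ensembl_branch 45                            # branch-ensembl-45
ensembl_db_name homo_sapiens core 45 36g     # homo_sapiens_core_45_36g
ensembl_db_release homo_sapiens_core_45_36g  # 45
```

Comparing the output of ensembl_db_release with the release number of the checked-out code is a quick way to spot a mismatch between code and databases. It applies to species databases only, since multi-species databases carry no data version suffix.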

Installing the Ensembl Data

The Ensembl data is provided on the Ensembl FTP site in the form of tab-delimited text files for importing into MySQL. ftp://ftp.ensembl.org/pub contains a directory for each release for each species. The latest versions are named current_species, e.g. current_human, current_mouse, etc. The directory structure below this is as follows (using current_human as an example):

ftp.ensembl.org/pub/current_human
    |-- data
        |-- fasta      cDNA, DNA (masked and unmasked chromosome sequence dumps), RNA and peptide dumps
        |-- flatfiles  EMBL and GenBank format dumps
        |-- mysql      Database dumps

The mysql directory contains a directory for each database. This can be used to install your own copy of the Ensembl data, e.g.:

ftp.ensembl.org/pub/current_human/data/mysql
    |-- homo_sapiens_core_[[SPECIESDEFS::ENSEMBL_VERSION]]_[[SPECIESINFO::Homo_sapiens:SPECIES_RELEASE_VERSION]]           core ensembl database
    |-- homo_sapiens_otherfeatures_[[SPECIESDEFS::ENSEMBL_VERSION]]_[[SPECIESINFO::Homo_sapiens:SPECIES_RELEASE_VERSION]]  other features database
    |-- etc...

Each database directory contains a data file for each table in that database, an SQL file that contains the SQL commands necessary to build that database's table structure, and a checksum file (produced with the UNIX "sum" utility) so you can verify that the data has downloaded correctly.
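Verifying a download against the checksum file can be scripted along the following lines. This is a sketch under assumptions: sum_matches is a hypothetical helper, and it takes the expected checksum and block count as arguments because the exact layout of the checksum file is not described here. The file name and numbers in the commented example are made up.

```shell
#!/bin/sh
# sum_matches FILE CHECKSUM BLOCKS: succeed if `sum` reports the given
# checksum and block count for FILE.
sum_matches() {
    expected_sum=$2
    expected_blocks=$3
    # `sum` prints "checksum blocks"; split that into $1 and $2
    set -- $(sum < "$1")
    [ "$1" -eq "$expected_sum" ] && [ "$2" -eq "$expected_blocks" ]
}

# Example with made-up values; take the real ones from the checksum file:
# sum_matches gene.txt.table.gz 12345 678 && echo "gene.txt.table.gz OK"
```

Different implementations of sum can use different algorithms, so a mismatch on every file may mean that the local sum differs from the one used to produce the checksum file and does not necessarily indicate a bad download.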

Multi-species data

In addition to the individual species data, there is a directory containing multi-species databases - i.e. databases which either affect the site as a whole (e.g. ensembl_web_user_db), or which describe metainformation about multiple species (e.g. ensembl_compara).

ftp.ensembl.org/pub/current_multispecies/data/mysql
    |-- ensembl_compara_[[SPECIESDEFS::ENSEMBL_VERSION]]      Comparative genomics database
    |-- ensembl_website_[[SPECIESDEFS::ENSEMBL_VERSION]]      Help database
    |-- ensembl_web_user_db_[[SPECIESDEFS::ENSEMBL_VERSION]]  Web preference database
    |-- etc...

Regardless of which species you choose to install, for a full installation you will probably want to install the multi-species databases as well, i.e. compara, help and web_user_db.

NB: The FTP site will ideally be laid out as described. If, however, for reasons of space or maintainability, files are not located as described then check the ftp site for README files which should explain where the data can be found.

BioMart data

The remaining multi-species databases can be downloaded from:

ftp.ensembl.org/pub/current_mart/data/mysql
    |-- ensembl_mart_[[SPECIESDEFS::ENSEMBL_VERSION]]
    |-- sequence_mart_[[SPECIESDEFS::ENSEMBL_VERSION]]
    |-- snp_mart_[[SPECIESDEFS::ENSEMBL_VERSION]]
    |-- vega_mart_[[SPECIESDEFS::ENSEMBL_VERSION]]

To install the Ensembl Data:

  1. Download the directories in ftp.ensembl.org/pub/current_organism/data/mysql for whatever organism you want to install. Note that the ensembl directory contains several files for the DNA and feature tables - these are very large tables, so the dump file is split into smaller chunks for easier downloading.

  2. Each table file is gzipped so unpack the data into working directories, keeping separate directories for each database.

    For each database you have downloaded, cd into the database directory and perform steps 3-5. For illustration, we will use homo_sapiens_core_[[SPECIESDEFS::ENSEMBL_VERSION]]_[[SPECIESINFO::Homo_sapiens:SPECIES_RELEASE_VERSION]] as the database - you need to change this appropriately for each database you install. Remember, you also need to download and install the multi-species databases.

  3. Start a MySQL console session (see the Installing MySQL section above if necessary) and issue the command:

    create database homo_sapiens_core_[[SPECIESDEFS::ENSEMBL_VERSION]]_[[SPECIESINFO::Homo_sapiens:SPECIES_RELEASE_VERSION]];

  4. Exit the console session, and issue the following command to run the ensembl SQL file, which should be in the directory where you unpacked the downloaded data. This creates the schema for the empty database you created in step 3.

    Note that we are using the example MySQL settings of /data/mysql as the install directory, and mysqldba as the database user. Here mysqldba is a MySQL account with file access to the database, which is not the same as a system user. See the MySQL documentation for instructions on creating/administering users.

    /data/mysql/bin/mysql -u mysqldba homo_sapiens_core_[[SPECIESDEFS::ENSEMBL_VERSION]]_[[SPECIESINFO::Homo_sapiens:SPECIES_RELEASE_VERSION]] < homo_sapiens_core_[[SPECIESDEFS::ENSEMBL_VERSION]]_[[SPECIESINFO::Homo_sapiens:SPECIES_RELEASE_VERSION]].sql

  5. Load the data into the database structure you have just built with the following command:

    /data/mysql/bin/mysqlimport -u mysqldba homo_sapiens_core_[[SPECIESDEFS::ENSEMBL_VERSION]]_[[SPECIESINFO::Homo_sapiens:SPECIES_RELEASE_VERSION]] -L *.txt.table

You have now created and loaded the core Ensembl database for human.
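Since steps 3-5 are repeated for every database, it can help to generate the commands and review them first. The function below is a sketch that only prints commands; load_commands is a hypothetical helper, its defaults are the example /data/mysql/bin directory and mysqldba user from above, and the database name in the last line is a placeholder.

```shell
#!/bin/sh
# load_commands DATABASE [MYSQL_BIN_DIR] [MYSQL_USER]
# Print the commands for steps 3-5, to be run from the directory
# holding the unpacked files for DATABASE. Nothing is executed.
load_commands() {
    db=$1
    bin=${2:-/data/mysql/bin}
    user=${3:-mysqldba}
    echo "$bin/mysql -u $user -e 'create database $db'"
    echo "$bin/mysql -u $user $db < $db.sql"
    echo "$bin/mysqlimport -u $user $db -L *.txt.table"
}

# Review the commands for one database (the name here is a placeholder)
load_commands my_species_core_NN_NN
```

Once the printed commands look right, they can be run by hand from the database directory, or the output can be piped to sh.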

Note that all the databases except the ensembl_web_user_db database only require read access for the website to work. The ensembl_web_user_db requires a MySQL user with delete/insert/update permissions. Also note that because it is the only database that the website writes data into, the ensembl_web_user_db has no .table (data) files to download.

NB: MySQL needs quite a lot of temporary space to load the databases. It is quite possible that your /tmp directory (which MySQL uses by default) is too small, in which case you might see an Error 28 (use the MySQL tool perror to see what these error numbers mean). Fortunately, you can force MySQL to write temporary files to another location. See the MySQL docs for details: http://www.mysql.com/doc/T/e/Temporary_files.html. The simplest solution is to start mysqld with the argument --tmpdir=my_spacious_tmp_location.

GO data

The Ensembl ftp site now includes a copy of the GO database as ensembl_go_[[SPECIESDEFS::ENSEMBL_VERSION]]. Install this if you want local GO information.

Installing the Ensembl, BioPerl, and BioMart modules

If you review the Site Structure part of this document, you will recall that the site is based around a single server-root directory. The Ensembl, BioPerl and BioMart modules are all installed into this directory. Choose a suitable location, and create your server-root directory. For the purposes of illustration, we will use /usr/local/ensembl. When following these instructions, replace /usr/local/ensembl with your chosen server-root.

  1. Go to the server-root directory:

    cd /usr/local/ensembl

To install the Ensembl modules

  1. Log into the Sanger CVS server (using a password of "CVSUSER"):

    $ cvs -d :pserver:cvsuser@cvs.sanger.ac.uk:/cvsroot/CVSmaster  login
    Logging in to :pserver:cvsuser@cvs.sanger.ac.uk:2401/cvsroot/CVSmaster
    CVS password: CVSUSER 
    
  2. To check out the most recent stable version of the ensembl API and web code from CVS (i.e. to download the code from the Sanger CVS server onto your local machine) you need to use the latest branch of the code. Please note the code on the CVS HEAD is under development and unstable. Use the following command making sure you use the code that matches your databases:

    $ cvs -d :pserver:cvsuser@cvs.sanger.ac.uk:/cvsroot/CVSmaster co -r branch-ensembl-[[SPECIESDEFS::ENSEMBL_VERSION]] ensembl-api ensembl-website 
    

    A listing of your server-root should now look something like:

    conf/
    ensembl/
    ensembl-compara/
    ensembl-draw/
    ensembl-external/
    ensembl-variation/
    htdocs/
    modules/
    perl/
    public-plugins/
    utils/
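The listing can be checked automatically after the checkout. The following sketch reports any expected directory that is absent; missing_dirs is a hypothetical helper, and /usr/local/ensembl is the example server-root used above.

```shell
#!/bin/sh
# missing_dirs ROOT NAME...: print each NAME that is not a directory under ROOT.
missing_dirs() {
    root=$1
    shift
    for name in "$@"; do
        [ -d "$root/$name" ] || echo "$name"
    done
}

# Directories expected after checking out the Ensembl API and web code
server_root=${1:-/usr/local/ensembl}
missing=$(missing_dirs "$server_root" conf ensembl ensembl-compara ensembl-draw \
    ensembl-external ensembl-variation htdocs modules perl public-plugins utils)

if [ -z "$missing" ]; then
    echo "All expected directories are present in $server_root"
else
    echo "Missing from $server_root:"
    echo "$missing"
fi
```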

To install the BioMart modules (BioMart 0.5)

  1. Check the BioMart code out of CVS (downloads the code from the Sanger CVS server onto your local machine):

    $ cvs -d :pserver:cvsuser@cvs.sanger.ac.uk:/cvsroot/CVSmaster co -r release-0_5 biomart-perl

To install the BioPerl modules (BioPerl 1.2.3)

  1. Log into the BioPerl CVS server (using a password of: cvs):

    cvs -d :pserver:cvs@cvs.open-bio.org:/home/repository/bioperl login

  2. Check out the BioPerl code:

    cvs -d :pserver:cvs@cvs.open-bio.org:/home/repository/bioperl \
      co -r bioperl-release-1-2-3 bioperl-live

    A listing of your server-root should now look something like:

    bioperl-live/
    biomart-plib/
    biomart-web/
    conf/
    ensembl/
    ensembl-compara/
    ensembl-draw/
    ensembl-external/
    ensembl-variation/
    htdocs/
    modules/
    perl/
    public-plugins/
    utils/

You should now have all the Ensembl website code and data installed and ready to configure.