The build instructions are split into three sections. The first, "How to install other applications required by Ensembl", covers CVS, Perl, MySQL, Apache 2 and Dotter.
The first of these involves the installation of applications/modules that are not part of the Ensembl project but are necessary for the web site to work, such as Apache and MySQL. The second involves installing the Ensembl data, modules and web site code.
PLEASE NOTE: even if you do not wish to download all the Ensembl data, you will need the following for the website to function:
If you do not wish to run MySQL (or cannot run a MySQL server locally) AND you do not wish to enable user accounts, you can use SQLite instead of a MySQL server.
Each Ensembl release has an integer version number associated with it, and that version number is used to identify the correct versions of API, Web code and databases (see below) that make up that release.
For the API and Web code, a CVS branch (essentially a named snapshot of the code) is made for each release, named with the release version number. The current release is version [[SPECIESDEFS::ENSEMBL_VERSION]], and the CVS tag for identifying the API and Web code for this release is 'branch-ensembl-[[SPECIESDEFS::ENSEMBL_VERSION]]'.
The Ensembl database names consist of the species, the database type, the release number and the data version. The current human 'core' database is named homo_sapiens_core_[[SPECIESDEFS::ENSEMBL_VERSION]]_[[SPECIESINFO::Homo_sapiens:SPECIES_RELEASE_VERSION]]: a human core database for release [[SPECIESDEFS::ENSEMBL_VERSION]], data version [[SPECIESINFO::Homo_sapiens:SPECIES_RELEASE_VERSION]], built on the NCBI 36 assembly. Subsequent data releases on the same assembly are suffixed with a lower-case letter (a, b, etc.).
Components with the same release version are intended to work together: for example, any web site built with 'version 31' API and web code and 'version 31' databases should work correctly.
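Because the naming scheme is regular, it is easy to script. The following POSIX shell sketch builds a database name and the matching CVS branch tag, and checks that the two refer to the same release. The function names and the example values (release 31, data version 35d) are hypothetical and only illustrate the pattern. The release parser assumes a species database name ending in _RELEASE_DATAVERSION, so it will not work for multi-species names such as ensembl_compara_31.

```shell
#!/bin/sh
# Hypothetical helpers for the Ensembl naming scheme described above.

# ensembl_db_name SPECIES TYPE RELEASE DATA_VERSION
#   -> SPECIES_TYPE_RELEASE_DATAVERSION
ensembl_db_name() {
    printf '%s_%s_%s_%s\n' "$1" "$2" "$3" "$4"
}

# ensembl_cvs_tag RELEASE -> branch-ensembl-RELEASE
ensembl_cvs_tag() {
    printf 'branch-ensembl-%s\n' "$1"
}

# ensembl_release_of DB_NAME -> release number, taken as the second-to-last
# underscore-separated field (only valid for names ending _RELEASE_DATAVERSION).
ensembl_release_of() {
    printf '%s\n' "$1" | awk -F_ '{ print $(NF - 1) }'
}

# releases_match DB_NAME CVS_TAG -> succeeds if both refer to the same release.
releases_match() {
    [ "$(ensembl_release_of "$1")" = "${2#branch-ensembl-}" ]
}
```

For example, `ensembl_db_name homo_sapiens core 31 35d` prints homo_sapiens_core_31_35d, and `releases_match homo_sapiens_core_31_35d branch-ensembl-31` succeeds.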
Ensembl is built on the following applications: CVS, Perl, MySQL, Apache 2 and Dotter.
These applications are not version-specific for Ensembl; that is, if you upgrade your Ensembl installation to a newer version when one becomes available, you probably won't need to install new versions of these applications.
All of this software, like all of Ensembl, is Open Source software and can be downloaded and used free of charge. You should, however, check the documentation for each application to see what license it has been released under, particularly if you are installing Ensembl in a commercial environment.
Detailed installation instructions are available for each of these packages.
This section explains how to install the Ensembl data, Perl modules, and web code. It also covers the installation of BioPerl.
The Ensembl data is provided on the Ensembl FTP site in the form of tab-delimited text files for importing into MySQL. ftp://ftp.ensembl.org/pub contains a directory for each release for each species. The latest versions are named current_species, e.g. current_human, current_mouse, etc. The directory structure below this is as follows (using current_human as an example):
ftp.ensembl.org/pub/current_human
  |-- data
        |-- fasta       cDNA, DNA (masked and unmasked chromosome sequence dumps), RNA and peptide dumps
        |-- flatfiles   EMBL and GenBank format dumps
        |-- mysql       Database dumps
The mysql directory contains a directory for each database. This can be used to install your own copy of the Ensembl data, e.g.:
ftp.ensembl.org/pub/current_human/data/mysql
  |-- homo_sapiens_core_[[SPECIESDEFS::ENSEMBL_VERSION]]_[[SPECIESINFO::Homo_sapiens:SPECIES_RELEASE_VERSION]]            core Ensembl database
  |-- homo_sapiens_otherfeatures_[[SPECIESDEFS::ENSEMBL_VERSION]]_[[SPECIESINFO::Homo_sapiens:SPECIES_RELEASE_VERSION]]   other features database
  |-- etc.
Each database directory contains a data file for each table in that database, an SQL file containing the SQL commands needed to build that database's table structure, and a checksum file (generated with the UNIX "sum" utility) so you can verify that the data has downloaded correctly.
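Checking the checksum file by hand is tedious, so here is a sketch that does it in a loop. It assumes each line of the checksum list has the form `CHECKSUM BLOCKS FILENAME`, as printed by `sum` when given several files; the actual name and layout of the checksum file on the FTP site is an assumption here, so compare it with the README. Run it on the downloaded files before unpacking them, since the checksums were presumably computed on the files as published. Note that `sum` has two algorithms (BSD, the GNU default, and System V with `-s`); both sides must use the same one.

```shell
#!/bin/sh
# verify_sums LIST_FILE DIR
# Prints "MISSING name" or "BAD name" for each problem; returns 1 on any failure.
# Assumed list format per line: CHECKSUM BLOCKS FILENAME
verify_sums() {
    list=$1
    dir=$2
    status=0
    while read -r expected_sum expected_blocks name; do
        [ -n "$name" ] || continue            # skip blank or malformed lines
        file="$dir/$name"
        if [ ! -f "$file" ]; then
            echo "MISSING $name"
            status=1
            continue
        fi
        # `sum FILE` prints "CHECKSUM BLOCKS"; adding 0 normalises leading
        # zeros so that 00123 and 123 compare equal.
        actual=$(sum "$file" | awk '{ print $1 + 0, $2 + 0 }')
        wanted=$(echo "$expected_sum $expected_blocks" | awk '{ print $1 + 0, $2 + 0 }')
        if [ "$actual" != "$wanted" ]; then
            echo "BAD $name"
            status=1
        fi
    done < "$list"
    return $status
}
```

Usage, with a hypothetical list file name: `verify_sums homo_sapiens_core_31_35d/CHECKSUMS homo_sapiens_core_31_35d`.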
In addition to the individual species data, there is a directory containing multi-species databases, i.e. databases which either affect the site as a whole (e.g. ensembl_web_user_db) or describe meta-information about multiple species (e.g. ensembl_compara).
ftp.ensembl.org/pub/current_multispecies/data/mysql
  |-- ensembl_compara_[[SPECIESDEFS::ENSEMBL_VERSION]]       Comparative genomics database
  |-- ensembl_website_[[SPECIESDEFS::ENSEMBL_VERSION]]       Help database
  |-- ensembl_web_user_db_[[SPECIESDEFS::ENSEMBL_VERSION]]   Web preference database
  |-- etc.
Regardless of which species you choose to install, for a full installation you will probably want to install the multi-species databases as well, i.e. compara, help and web_user_db.
NB: Ideally the FTP site will be laid out as described. If, however, for reasons of space or maintainability, files are not located as described, check the FTP site for README files, which should explain where the data can be found.
The remaining multi-species databases can be downloaded from:
ftp.ensembl.org/pub/current_mart/data/mysql
  |-- ensembl_mart_[[SPECIESDEFS::ENSEMBL_VERSION]]
  |-- sequence_mart_[[SPECIESDEFS::ENSEMBL_VERSION]]
  |-- snp_mart_[[SPECIESDEFS::ENSEMBL_VERSION]]
  |-- vega_mart_[[SPECIESDEFS::ENSEMBL_VERSION]]
Download the directories in ftp.ensembl.org/pub/current_organism/data/mysql for whatever organism you want to install. Note that the ensembl directory contains several files for the DNA and feature tables - these are very large tables, so the dump file is split into smaller chunks for easier downloading.
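The download itself can be scripted once the URL pattern is known. The helper below assumes the directory layout described in this section (pub/current_ORGANISM/data/mysql/DATABASE/); as the NB note about README files warns, the live FTP site may be organised differently, so treat the pattern as an assumption. The ENSEMBL_FTP variable is introduced here only so the base URL can be overridden.

```shell
#!/bin/sh
# Base of the FTP tree, overridable from the environment (hypothetical variable).
ENSEMBL_FTP=${ENSEMBL_FTP:-ftp://ftp.ensembl.org/pub}

# mysql_dir_url ORGANISM_DIR DATABASE
#   -> URL of one database directory, assuming the layout described above.
mysql_dir_url() {
    printf '%s/%s/data/mysql/%s/\n' "$ENSEMBL_FTP" "$1" "$2"
}
```

One way to fetch a directory (the database name is a made-up example) is `wget -r -np -nH --cut-dirs=4 "$(mysql_dir_url current_human homo_sapiens_core_31_35d)"`: `-nH` and `--cut-dirs=4` strip the host and the pub/current_human/data/mysql prefix, so the files land in ./homo_sapiens_core_31_35d/.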
Each table file is gzipped, so unpack the data into working directories, keeping a separate directory for each database.
For each database you have downloaded, cd into the database directory and perform the following three steps. For illustration, we will use homo_sapiens_core_[[SPECIESDEFS::ENSEMBL_VERSION]]_[[SPECIESINFO::Homo_sapiens:SPECIES_RELEASE_VERSION]] as the database; substitute the appropriate name for each database you install. Remember, you also need to download and install the multi-species databases.
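Unpacking can be done one database directory at a time with a short loop. This sketch assumes gunzip is on your PATH and decompresses every .gz file in a directory in place (gunzip replaces file.gz with file). It does not attempt anything special with the chunked dumps of the large DNA and feature tables; check the README files on the FTP site for how those are meant to be handled.

```shell
#!/bin/sh
# unpack_db_dir DIR -> gunzip every *.gz file in DIR; returns 1 on any failure.
unpack_db_dir() {
    dir=$1
    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
    for f in "$dir"/*.gz; do
        [ -e "$f" ] || continue      # the glob matched nothing
        gunzip "$f" || return 1
    done
}
```

Usage: `for d in */; do unpack_db_dir "$d"; done` run from the directory holding the downloaded database directories.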
Start a MySQL console session (see the Installing MySQL section above if necessary) and issue the command:
create database homo_sapiens_core_[[SPECIESDEFS::ENSEMBL_VERSION]]_[[SPECIESINFO::Homo_sapiens:SPECIES_RELEASE_VERSION]];
Exit the console session and issue the following command to run the Ensembl SQL file, which should be in the directory where you unpacked the downloaded data. This creates the schema for the empty database you have just created.
Note that we are using the example MySQL settings of /data/mysql as the install directory and mysqldba as the database user. Here mysqldba is a MySQL account with file access to the database, which is not the same as a system user. See the MySQL documentation for instructions on creating and administering users.
/data/mysql/bin/mysql -u mysqldba homo_sapiens_core_[[SPECIESDEFS::ENSEMBL_VERSION]]_[[SPECIESINFO::Homo_sapiens:SPECIES_RELEASE_VERSION]] < homo_sapiens_core_[[SPECIESDEFS::ENSEMBL_VERSION]]_[[SPECIESINFO::Homo_sapiens:SPECIES_RELEASE_VERSION]].sql
Load the data into the database structure you have just built with the following command:
/data/mysql/bin/mysqlimport -u mysqldba homo_sapiens_core_[[SPECIESDEFS::ENSEMBL_VERSION]]_[[SPECIESINFO::Homo_sapiens:SPECIES_RELEASE_VERSION]] -L *.txt.table
You have now created and loaded the core Ensembl database for human.
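Since the create, schema and import steps repeat for every database, they can be wrapped in one function. This is a sketch under the example settings used in this section (/data/mysql/bin and the mysqldba account), and it assumes each unpacked directory is named after its database and holds a schema file DBNAME.sql, as in the commands above. Add -p to the client commands if your account has a password. Setting DRY_RUN=1 (a variable introduced for this sketch) prints the commands instead of running them, which is a cheap way to check names before touching the server.

```shell
#!/bin/sh
MYSQL_BIN=${MYSQL_BIN:-/data/mysql/bin}
MYSQL_USER=${MYSQL_USER:-mysqldba}

# Run a command, or only print it when DRY_RUN=1.
run_or_echo() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# load_ensembl_db DBNAME  (run from the parent of the DBNAME directory)
load_ensembl_db() {
    db=$1
    (
        cd "$db" || exit 1
        # Create the empty database (same as "create database" in the console).
        run_or_echo "$MYSQL_BIN/mysql" -u "$MYSQL_USER" -e "CREATE DATABASE $db" || exit 1
        # Build the table structure from the SQL file (needs a redirect, so no run_or_echo).
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "$MYSQL_BIN/mysql -u $MYSQL_USER $db < $db.sql"
        else
            "$MYSQL_BIN/mysql" -u "$MYSQL_USER" "$db" < "$db.sql" || exit 1
        fi
        # Load the table data; -L (--local) reads the files from this client host.
        run_or_echo "$MYSQL_BIN/mysqlimport" -u "$MYSQL_USER" -L "$db" *.txt.table
    )
}
```

Usage, with a made-up database name: `DRY_RUN=1 load_ensembl_db homo_sapiens_core_31_35d` to preview, then run it again without DRY_RUN.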
Note that all the databases except ensembl_web_user_db only require read access for the website to work. The ensembl_web_user_db database requires a MySQL user with DELETE, INSERT and UPDATE permissions. Also note that because it's the only database that the website writes data into, ensembl_web_user_db has no .table (data) files to download.
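The read-only versus read/write split can be expressed as MySQL GRANT statements. This sketch only prints the statements so you can review them before piping them into the mysql client. The account names ensro and ensrw used below are hypothetical, and on recent MySQL versions GRANT no longer creates accounts, so create them first with CREATE USER.

```shell
#!/bin/sh
# grant_readonly DB USER HOST -> GRANT statement for read-only access.
grant_readonly() {
    printf "GRANT SELECT ON \`%s\`.* TO '%s'@'%s';\n" "$1" "$2" "$3"
}

# grant_readwrite DB USER HOST -> GRANT statement for the web user database,
# which needs INSERT, UPDATE and DELETE as well as SELECT.
grant_readwrite() {
    printf "GRANT SELECT, INSERT, UPDATE, DELETE ON \`%s\`.* TO '%s'@'%s';\n" "$1" "$2" "$3"
}
```

For example, `grant_readwrite ensembl_web_user_db_31 ensrw localhost | /data/mysql/bin/mysql -u root -p`, assuming an administrative account called root.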
NB: MySQL needs quite a lot of temporary space to load the databases. Your /tmp directory (which MySQL uses by default) may well be too small, in which case you might see Error 28 (use the MySQL tool perror to see what an error number means). Fortunately, you can force MySQL to write temporary files to another location; see the MySQL docs for details: http://www.mysql.com/doc/T/e/Temporary_files.html. The simplest solution is to start mysqld with the argument --tmpdir=my_spacious_tmp_location.
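Before restarting the server it helps to check which candidate directory actually has room. The sketch below reads free space in KiB from the POSIX `df -P -k` output and picks the first candidate with enough of it. The candidate directories and the size threshold in the usage line are placeholders, not recommendations.

```shell
#!/bin/sh
# free_kb DIR -> available space on DIR's filesystem, in KiB.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# pick_tmpdir MIN_KB DIR... -> first existing DIR with at least MIN_KB free.
pick_tmpdir() {
    min=$1; shift
    for d in "$@"; do
        [ -d "$d" ] || continue
        if [ "$(free_kb "$d")" -ge "$min" ]; then
            echo "$d"
            return 0
        fi
    done
    return 1
}
```

Usage: `tmp=$(pick_tmpdir 20000000 /data/tmp /scratch) && /data/mysql/bin/mysqld_safe --tmpdir="$tmp" &`. mysqld_safe passes options it does not recognise on to mysqld; alternatively, set tmpdir in the [mysqld] section of your option file.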
The Ensembl ftp site now includes a copy of the GO database as ensembl_go_[[SPECIESDEFS::ENSEMBL_VERSION]]. Install this if you want local GO information.
As described in the Site Structure part of this document, the site is based around a single server-root directory. The Ensembl, BioPerl and BioMart modules are all installed into this directory. Choose a suitable location and create your server-root directory. For the purposes of illustration we will use /usr/local/ensembl; when following these instructions, replace /usr/local/ensembl with your chosen server-root.
Go to the server-root directory:
cd /usr/local/ensembl
Log into the Sanger CVS server (using a password of "CVSUSER"):
$ cvs -d :pserver:cvsuser@cvs.sanger.ac.uk:/cvsroot/CVSmaster login
Logging in to :pserver:cvsuser@cvs.sanger.ac.uk:2401/cvsroot/CVSmaster
CVS password: CVSUSER
To check out the most recent stable version of the Ensembl API and web code from CVS (i.e. to download the code from the Sanger CVS server onto your local machine), you need to use the latest release branch of the code. Please note that the code on the CVS HEAD is under development and unstable. Use the following command, making sure that the branch matches your databases:
$ cvs -d :pserver:cvsuser@cvs.sanger.ac.uk:/cvsroot/CVSmaster co -r branch-ensembl-[[SPECIESDEFS::ENSEMBL_VERSION]] ensembl-api ensembl-website
A listing of your server-root should now look something like:
conf/            ensembl/            ensembl-compara/
ensembl-draw/    ensembl-external/   ensembl-variation/
htdocs/          modules/            perl/
public-plugins/  utils/
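Because the web code must match the database release, it is worth confirming which branch each checked-out directory is on. CVS keeps a sticky tag in each checked-out directory's CVS/Tag file, whose first character is a type code (T for a branch tag, N for a non-branch tag, D for a date). This sketch reads that file and reports directories that are not on branch-ensembl-VERSION. Directories without a CVS/Tag file are reported as HEAD, which is an approximation: they may simply not be CVS checkouts (htdocs in the listing above, for example, may or may not be).

```shell
#!/bin/sh
# cvs_sticky_tag DIR -> the sticky tag recorded in DIR/CVS/Tag, minus its type code.
cvs_sticky_tag() {
    tagfile=$1/CVS/Tag
    [ -f "$tagfile" ] || return 1    # no sticky tag: trunk (HEAD) or not a CVS checkout
    head -n 1 "$tagfile" | cut -c 2-
}

# check_branch VERSION DIR... -> prints every DIR not on branch-ensembl-VERSION;
# returns 1 if any were found.
check_branch() {
    want=branch-ensembl-$1; shift
    ok=0
    for d in "$@"; do
        got=$(cvs_sticky_tag "$d") || got=HEAD
        if [ "$got" != "$want" ]; then
            echo "$d: on $got, expected $want"
            ok=1
        fi
    done
    return $ok
}
```

Usage, with 31 as an example release: `check_branch 31 ensembl ensembl-compara ensembl-draw ensembl-external ensembl-variation`, run from the server-root.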
Check the BioMart code out of CVS (downloads the code from the Sanger CVS server onto your local machine):
$ cvs -d :pserver:cvsuser@cvs.sanger.ac.uk:/cvsroot/biomart co -r release-0_6 biomart-perl
More information on tweaking and configuring BioMart to work within the Ensembl website is available.
Log into the BioPerl CVS server (using a password of "cvs"):
cvs -d :pserver:cvs@cvs.open-bio.org:/home/repository/bioperl login
Check out the BioPerl code:
cvs -d :pserver:cvs@cvs.open-bio.org:/home/repository/bioperl \
A listing of your server-root should now look something like:
bioperl-live/       biomart-plib/      biomart-web/      conf/
ensembl/            ensembl-compara/   ensembl-draw/     ensembl-external/
ensembl-variation/  htdocs/            modules/          perl/
public-plugins/     utils/
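To compare your server-root with the listing above, the sketch below reports any expected top-level directory that is missing. The EXPECTED_DIRS list simply copies that example listing; it is an assumption about your checkout rather than a requirement, so edit it to match the modules you actually checked out (for instance, the BioMart checkout command in this section names biomart-perl rather than biomart-plib and biomart-web).

```shell
#!/bin/sh
# Copied from the example listing; adjust to your own checkout.
EXPECTED_DIRS="bioperl-live biomart-plib biomart-web conf
ensembl ensembl-compara ensembl-draw ensembl-external
ensembl-variation htdocs modules perl
public-plugins utils"

# missing_dirs ROOT -> prints each expected directory absent from ROOT;
# returns 1 if any are missing.
missing_dirs() {
    root=$1
    status=0
    for d in $EXPECTED_DIRS; do      # unquoted on purpose: split on whitespace
        if [ ! -d "$root/$d" ]; then
            echo "$d"
            status=1
        fi
    done
    return $status
}
```

Usage: `missing_dirs /usr/local/ensembl` prints nothing and succeeds when every listed directory is present.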
You should now have all the Ensembl website code and data installed and ready to configure.