VAGrENT GettingStarted.txt

INTRODUCTION

VAGrENT is a tool set for Genomic Variation Annotation. It compares genomic variants 
against a reference genome annotation and attempts to identify the biological 
consequences those variants may cause.


THE BASICS

VAGrENT is a collection of object-oriented Perl modules.  It is designed to be flexible 
to use and easy to modify, while still providing core functionality with little effort.


USAGE

An example script is included:

perl/scripts/SubstitutionExampleScript.pl

VAGrENT doesn't contain any standalone analysis scripts yet. Variation file formats are yet 
to stabilise and so there are no standard input/output file formats available.

The VAGrENT Perl modules have to be called directly from other Perl scripts; the example script
provides a simple demonstration.  The modules fall into 3 broad types, discussed below
(Input, Utility and Output).


INPUT DATA CLASSES

These classes model the basic genomic changes and are handed to the utility classes for analysis.

Sanger::CGP::Vagrent::Data::Substitution - There are examples of using Substitution in the 
                      SubstitutionExampleScript.pl (see the getExampleSubstitutions method)


UTILITY CLASSES

These are the workhorse classes that actually generate the annotation for the provided 
variation. They are split into 2 types: Transcript sources and Annotators.

Transcript sources provide lists of transcripts that overlap a specified genomic position. 
These are used by the Annotators to find transcripts that are potentially affected by a variant.  
An Ensembl-based transcript source is included (Ensembl API required), but any subclass of 
Sanger::CGP::Vagrent::TranscriptSource::AbstractTranscriptSource will work.
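
The overlap lookup described above can be sketched in a few lines of Perl.  This is a minimal
illustration and not VAGrENT code: the package Example::InMemoryTranscriptSource, its
transcripts_overlapping method and the transcript hashes are all hypothetical, and the sketch
neither subclasses AbstractTranscriptSource nor uses the Ensembl API.  It assumes 1-based,
inclusive coordinates.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a transcript source; NOT the VAGrENT API.
package Example::InMemoryTranscriptSource;

sub new {
    my ($class, @transcripts) = @_;
    # Each transcript is a hash ref: { name, chr, start, end } (1-based, inclusive).
    return bless { transcripts => [@transcripts] }, $class;
}

# Return every transcript whose span overlaps the interval [$start, $end] on $chr.
sub transcripts_overlapping {
    my ($self, $chr, $start, $end) = @_;
    return grep {
        $_->{chr} eq $chr && $_->{start} <= $end && $_->{end} >= $start
    } @{ $self->{transcripts} };
}

package main;

my $source = Example::InMemoryTranscriptSource->new(
    { name => 'tx_A', chr => '1', start => 100, end => 500 },
    { name => 'tx_B', chr => '1', start => 450, end => 900 },
    { name => 'tx_C', chr => '2', start => 100, end => 500 },
);

# A single-base variant at chr 1, position 470 lies inside tx_A and tx_B only.
my @hits = $source->transcripts_overlapping('1', 470, 470);
print join(',', map { $_->{name} } @hits), "\n";   # prints "tx_A,tx_B"
```

An Annotator would then examine each returned transcript in turn; transcripts on other
chromosomes, or outside the queried interval, are never considered.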

Annotators compare a variant to each transcript supplied by the transcript source and attempt to
describe its consequences.  They do the actual sequence and coordinate comparisons and try to 
calculate the consequences for the transcript.  Different classes of variant (Sub/Ins/Del etc) 
need slightly different handling, and each produces only a certain subset of consequences, so 
there are separate Annotators for each class.

Sanger::CGP::Vagrent::Annotators::SimpleSubstitutionAnnotator - for single base subs

For convenience there is also an Annotator that holds a list of other Annotators. It behaves like 
the class-specific Annotators, but sends each variant through every Annotator it holds in turn and 
collates the answers.

Sanger::CGP::Vagrent::Annotators::AnnotatorCollection
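
The idea of a collection that behaves like a single Annotator can be sketched as follows.  This
is a minimal illustration of the pattern only: every package name, the annotate method and the
variant hashes are hypothetical and do not reflect the real VAGrENT classes or their method
signatures.

```perl
use strict;
use warnings;

# Hypothetical annotators; names and methods are illustrative, NOT the VAGrENT API.
package Example::SubstitutionAnnotator;

sub new {
    my ($class) = @_;
    return bless {}, $class;
}

# Only responds to variants tagged as substitutions.
sub annotate {
    my ($self, $variant) = @_;
    return $variant->{type} eq 'sub' ? ("substitution at $variant->{pos}") : ();
}

package Example::DeletionAnnotator;

sub new {
    my ($class) = @_;
    return bless {}, $class;
}

# Only responds to variants tagged as deletions.
sub annotate {
    my ($self, $variant) = @_;
    return $variant->{type} eq 'del' ? ("deletion at $variant->{pos}") : ();
}

package Example::AnnotatorCollection;

sub new {
    my ($class, @annotators) = @_;
    return bless { annotators => [@annotators] }, $class;
}

# Same method as the individual annotators: pass the variant to each member
# in turn and collate whatever they return.
sub annotate {
    my ($self, $variant) = @_;
    return map { $_->annotate($variant) } @{ $self->{annotators} };
}

package main;

my $collection = Example::AnnotatorCollection->new(
    Example::SubstitutionAnnotator->new,
    Example::DeletionAnnotator->new,
);

my @variants = (
    { type => 'sub', pos => 1234 },
    { type => 'del', pos => 5678 },
);
my @results = map { $collection->annotate($_) } @variants;
print "$_\n" for @results;
```

Because the collection exposes the same method as its members, calling code does not need to
know which variant class it is handling.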


OUTPUT DATA CLASSES

The consequences of a variation within a transcript can be described in 3 different contexts (mRNA, 
CDS and Protein). This is handled by having an AnnotationGroup for each transcript/variation 
combination, each containing a number of Annotation records corresponding to individual contexts.

Sanger::CGP::Vagrent::Data::AnnotationGroup - Contains basic transcript data and Annotation objects
Sanger::CGP::Vagrent::Data::Annotation - Contains detailed info about the effects of a variant within
                                         a specific context

This allows variants in non-coding transcripts to be described at the mRNA level only, while 
protein-coding transcripts can have full mRNA, CDS and Protein descriptions.
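
The group/record relationship can be pictured with plain Perl data structures.  This is a sketch
under stated assumptions: the hashes below merely stand in for AnnotationGroup and Annotation
objects, and the field names, transcript names, descriptions and the annotation_for_context
helper are hypothetical rather than part of VAGrENT.

```perl
use strict;
use warnings;

# Plain hashes standing in for AnnotationGroup/Annotation objects.
# Field names and values are hypothetical and are NOT the VAGrENT API.
my @groups = (
    {
        transcript  => 'coding_tx',
        annotations => [
            { context => 'mRNA',    description => 'mRNA-level change' },
            { context => 'CDS',     description => 'CDS-level change' },
            { context => 'Protein', description => 'protein-level change' },
        ],
    },
    {
        transcript  => 'noncoding_tx',
        annotations => [
            { context => 'mRNA', description => 'mRNA-level change' },
        ],
    },
);

# Return the annotation for one context, or undef if the group has none.
sub annotation_for_context {
    my ($group, $context) = @_;
    my ($match) = grep { $_->{context} eq $context } @{ $group->{annotations} };
    return $match;
}

for my $group (@groups) {
    my $protein = annotation_for_context($group, 'Protein');
    printf "%s: %d context(s), protein annotation %s\n",
        $group->{transcript},
        scalar @{ $group->{annotations} },
        defined $protein ? 'present' : 'absent';
}
```

Code that consumes the results therefore needs to check whether a CDS or Protein record exists
before using it, since a non-coding transcript carries only the mRNA record.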