VAGrENT GettingStarted.txt

INTRODUCTION

VAGrENT is a tool set for Genomic Variation Annotation. It compares genomic variants against a reference genome annotation and attempts to identify the biological consequences those variants may cause.

THE BASICS

VAGrENT is a collection of object-oriented Perl modules. It is designed to be flexible to use and easy to modify, while still providing core functionality without too much effort.

USAGE

There is an example script included:

    perl/scripts/SubstitutionExampleScript.pl

VAGrENT doesn't contain any standalone analysis scripts yet. Variation file formats have yet to stabilise, so there are no standard input/output file formats available. The VAGrENT Perl modules have to be called directly from other Perl scripts; the example script provides a simple demonstration.

The modules can be divided into 3 broad types, discussed below (Input, Utility and Output).

INPUT DATA CLASSES

These classes model the basic genomic changes and are handed to the utility classes for analysis.

    Sanger::CGP::Vagrent::Data::Substitution
        - There are examples of using Substitution in SubstitutionExampleScript.pl (see the getExampleSubstitutions method)

UTILITY CLASSES

These are the workhorse classes that actually generate the annotation for the provided variation. They are split into 2 types: Transcript sources and Annotators.

Transcript sources provide lists of transcripts that overlap a specified genomic position. These are used by the Annotators to find transcripts that are potentially affected by a variant. There is an Ensembl-based transcript source included (Ensembl API required), but any subclass of Sanger::CGP::Vagrent::TranscriptSource::AbstractTranscriptSource will work.

Annotators compare a variant to each transcript supplied by the transcript source and attempt to describe its consequences. They do the actual sequence and coordinate comparisons and try to calculate the consequences for the transcript.

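
The overlap lookup that a transcript source performs can be sketched roughly as follows. This is only an illustration of the idea: the package Demo::ListTranscriptSource, its get_transcripts method and the hash-based transcript records are all hypothetical and are not taken from the VAGrENT modules. A real transcript source would subclass AbstractTranscriptSource and implement whatever methods that class requires.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a transcript source. NOT the VAGrENT API.
package Demo::ListTranscriptSource;

sub new {
    my ($class, @transcripts) = @_;
    return bless { transcripts => [@transcripts] }, $class;
}

# Return every stored transcript on the given chromosome whose span
# overlaps the requested range (coordinates are inclusive).
sub get_transcripts {
    my ($self, $chr, $start, $end) = @_;
    return grep {
        $_->{chr} eq $chr && $_->{start} <= $end && $_->{end} >= $start
    } @{ $self->{transcripts} };
}

package main;

# Made-up transcript records, for illustration only.
my $source = Demo::ListTranscriptSource->new(
    { name => 'TX_A', chr => '1', start => 100, end => 500 },
    { name => 'TX_B', chr => '1', start => 450, end => 900 },
    { name => 'TX_C', chr => '2', start => 100, end => 500 },
);

# A single-base variant at chromosome 1, position 460.
my @hits = $source->get_transcripts('1', 460, 460);
print join(',', map { $_->{name} } @hits), "\n";   # prints TX_A,TX_B
```

An annotator would then work through the returned transcripts one at a time, which is why any object offering the same lookup can stand in for the Ensembl based source.
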
Different classes of variant (Sub/Ins/Del etc.) need slightly different handling, and each produces only a certain subset of consequences, so there are separate Annotators for each class.

    Sanger::CGP::Vagrent::Annotators::SimpleSubstitutionAnnotator
        - for single base subs

For convenience there is also an Annotator that holds a list of other Annotators. It has the same behaviour as the class-specific Annotators: it sends variants through each Annotator in turn and collates the answers.

    Sanger::CGP::Vagrent::Annotators::AnnotatorCollection

OUTPUT DATA CLASSES

The consequences of a variation within a transcript can be described in 3 different contexts (mRNA, CDS and Protein). This is handled by having an AnnotationGroup for each transcript/variation combination, each containing a number of Annotation records corresponding to individual contexts.

    Sanger::CGP::Vagrent::Data::AnnotationGroup
        - Contains basic transcript data and Annotation objects

    Sanger::CGP::Vagrent::Data::Annotation
        - Contains detailed info about the effects of a variant within a specific context

This allows variants in non-coding transcripts to be described at the mRNA level only, while protein-coding transcripts can have their full mRNA, CDS and Protein descriptions.
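
The collection pattern and the group/context layout described above can be sketched in a few lines of plain Perl. Everything here is an assumption made for illustration: the Demo:: packages, the get_annotation method name and the hash-based records are hypothetical and do not reproduce the real VAGrENT classes or their accessors.

```perl
use strict;
use warnings;

# Hypothetical class-specific annotator. NOT the VAGrENT API.
package Demo::TypeAnnotator;

sub new {
    my ($class, %args) = @_;
    return bless { type => $args{type}, transcripts => $args{transcripts} }, $class;
}

# Build one group per transcript, or return nothing when the variant
# belongs to a class this annotator does not handle.
sub get_annotation {
    my ($self, $variant) = @_;
    return () unless $variant->{type} eq $self->{type};
    my @groups;
    for my $tx (@{ $self->{transcripts} }) {
        # Non-coding transcripts are described at the mRNA level only.
        my @contexts = $tx->{coding} ? qw(mRNA CDS Protein) : qw(mRNA);
        push @groups, {
            transcript  => $tx->{name},
            annotations => [ map { +{ context => $_ } } @contexts ],
        };
    }
    return @groups;
}

# Hypothetical collection: same method as a single annotator.
package Demo::AnnotatorCollection;

sub new {
    my ($class, @annotators) = @_;
    return bless { annotators => [@annotators] }, $class;
}

# Send the variant through each annotator in turn and collate the results.
sub get_annotation {
    my ($self, $variant) = @_;
    return map { $_->get_annotation($variant) } @{ $self->{annotators} };
}

package main;

# Made-up transcripts, for illustration only.
my @transcripts = (
    { name => 'TX_CODING',    coding => 1 },
    { name => 'TX_NONCODING', coding => 0 },
);

my $collection = Demo::AnnotatorCollection->new(
    Demo::TypeAnnotator->new(type => 'Sub', transcripts => \@transcripts),
    Demo::TypeAnnotator->new(type => 'Del', transcripts => \@transcripts),
);

my @groups = $collection->get_annotation({ type => 'Sub' });
for my $group (@groups) {
    my @contexts = map { $_->{context} } @{ $group->{annotations} };
    print "$group->{transcript}: @contexts\n";
}
```

Because the collection exposes the same method as the individual annotators, calling code does not need to know which variant classes are covered; an unsupported class simply yields an empty list.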