############################
#Strongyloides ratti genome#
############################

Version 2.0

This is a Wellcome Trust funded project to sequence and analyse the nuclear
genome of Strongyloides ratti ED321 with the goal of producing a reference
quality genome sequence.

Three different sequencing technologies were used for this project:
- Sanger (capillary) sequencing (~3x coverage)
- 454 (unpaired and paired 3kb and 8kb libraries) (~18x coverage)
- Illumina reads (~70x coverage)

FTP site
ftp://ftp.sanger.ac.uk/pub/pathogens/Strongyloides/ratti/

README.- This text file.
ARCHIVE.- Directory with older assemblies. They are not informative.
version_2.0.- Folder containing v.2.0 of the S. ratti assembly, contigs and scaffolds
   Sratti_v2_contigs.fasta .- Assembly in contigs
   Sratti_v2_scaffolds.fasta .- Assembly in scaffolds
   Sratti_v2_scaffolds.agp .- AGP file with the position of each contig in the scaffolds
   ratti_bin.fasta .- File with leftover contigs that were removed during the filtering process
   Sratti_v2_genemodels_beta.fna .- Nucleotide sequences of the coding regions
   Sratti_v2_genemodels_beta.faa .- Amino acid sequences of the coding regions
   Sratti_v2_genemodels_beta.gff .- GFF file of the gene models in scaffolds (Sratti_v2_scaffolds.fasta)
   Sratti_v2_genetic_markers.txt .- List with positions of genetic markers in the assembly

Version 2.0 notes:

- Manual improvement of version 1.0 that fixed many misassemblies caused by repeats.

- Nomenclature: Sratti_(scaffold or contig ID)_Chr(Chr ID)
  Example: Sratti_scf00003_Chr00
  Chromosome IDs:
     Chr00: Contig or scaffold with no chromosome information
     Chr01: Contig or scaffold in Chr1
     Chr02: Contig or scaffold in Chr2
     ChrX: Contig or scaffold in ChrX

- Gene models were generated using Augustus v2.5.5 trained with a set of ~400
  curated S. ratti gene models, ESTs and RNAseq. This is a beta version of the
  gene models, since further manual inspection, curation and annotation are needed.
  Nomenclature of the gene models: Sr321_CC#######.t1
     CC: Chromosome
     #s: Gene (internal) ID
     t1: Transcript 1 (we haven't analyzed splice variants yet)
  Example: Sr321_0X0002300.t1
     Transcript 1 of gene 0002300 on Chromosome X

- Distances between contigs within scaffolds have been recalculated from the
  read-pair information of the capillary and 454 sequencing. Gaps that had a
  negative size after recalculation were given an arbitrary gap size of
  20 bases, since the contigs do not overlap.

- Scaffolds were separated into contigs by mapping the genetic markers from:
     Nemetschke et al. (2010). A genetic map of the animal-parasitic nematode
     Strongyloides ratti.
  A map with the positions of the genetic markers is provided in the file
  Sratti_v2_genetic_markers.txt. The positions are reported in centimorgans
  according to the publication and in bases according to the distance in the
  scaffolds. Some markers map twice (possibly two copies of the gene) or
  disagree with the published order.
  The tab-delimited format of the file is:
     marker ID, relative distance (centimorgans), chromosome, scaffold ID,
     scaffold length, start, end

- Sequences were selected by GC content (<= 35% GC) and by mapping of genetic
  markers and ESTs. They were also rescreened for bacterial contamination.

- The genome went through one iteration of ICORN error correction using
  7 lanes of Illumina reads.

- ratti_bin.fasta contains contigs that had no hit against bacterial genomes
  but might contain regions with similarity to bacterial sequences. They
  could be chimeric sequences that contain both nematode and bacterial
  sequence. Alternatively, they could be cases of horizontal transfer.
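
Parsing example (not part of the release):

The naming schemes and the marker file layout described above can be read with
a short script. The following is a minimal Python sketch under stated
assumptions, not a tool shipped with the assembly. The function and class
names are made up for illustration. It assumes that scaffold/contig IDs
contain only letters and digits, that the two-character chromosome code in
gene names is made of digits and capital letters (as in "0X"), and that
Sratti_v2_genetic_markers.txt has exactly seven tab-separated columns with no
header line. Please check these assumptions against the actual files.

```python
import re
from typing import NamedTuple, Optional, Tuple

# Sequence names: Sratti_(scaffold or contig ID)_Chr(Chr ID)
# Assumption: the scaffold/contig ID holds only letters and digits.
SEQ_NAME = re.compile(r"^Sratti_([A-Za-z0-9]+)_Chr(00|01|02|X)$")

# Gene models: Sr321_CC#######.t1 (2 chromosome characters, 7 digits)
GENE_NAME = re.compile(r"^Sr321_([0-9A-Z]{2})(\d{7})\.t(\d+)$")

# Meaning of the chromosome IDs listed in the version 2.0 notes.
CHROMOSOMES = {"00": "unplaced", "01": "Chr1", "02": "Chr2", "X": "ChrX"}


def parse_sequence_name(name: str) -> Optional[Tuple[str, str]]:
    """Return (scaffold or contig ID, chromosome label), or None if no match."""
    match = SEQ_NAME.match(name)
    if match is None:
        return None
    return match.group(1), CHROMOSOMES[match.group(2)]


def parse_gene_name(name: str) -> Optional[Tuple[str, str, int]]:
    """Return (chromosome code, gene ID, transcript number), or None."""
    match = GENE_NAME.match(name)
    if match is None:
        return None
    return match.group(1), match.group(2), int(match.group(3))


class Marker(NamedTuple):
    marker_id: str
    centimorgans: float
    chromosome: str
    scaffold_id: str
    scaffold_length: int
    start: int
    end: int


def parse_marker_line(line: str) -> Marker:
    """Parse one tab-delimited line of the genetic markers file."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 7:
        raise ValueError(f"expected 7 columns, found {len(fields)}")
    marker_id, cm, chromosome, scaffold_id, length, start, end = fields
    return Marker(marker_id, float(cm), chromosome, scaffold_id,
                  int(length), int(start), int(end))
```

With the two examples given in this README,
parse_sequence_name("Sratti_scf00003_Chr00") gives ("scf00003", "unplaced")
and parse_gene_name("Sr321_0X0002300.t1") gives ("0X", "0002300", 1).
Because some markers map twice, a marker ID should not be used as a unique
key when loading the marker file.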