REFERENCE
AUTHORS De Chiara M, Hood D, Muzzi A, Pickard D, Perkins TT, Pizza M, Dougan G, Rappuoli R, Moxon R, Soriani M and Donati C
TITLE Whole genome sequencing of disease and carriage isolates of Non-Typeable Haemophilus influenzae identifies discrete population structure
JOURNAL Submitted 2014

GENOME ASSEMBLY AND ANNOTATION. The assemblies and annotations were performed at Novartis Vaccines and Diagnostics using the Novartis pipeline. The genome sequences were assembled using Celera Assembler 7 (1). Each genome was assembled seven times, each time using a different number of reads, from 400,000 up to a maximum of 1,000,000 total reads, and the assembly with the lowest number of separate contigs and the highest total number of assembled bases was chosen. The resulting coverage ranged from 30x to 80x, with an average of 50x.

The draft genomes were annotated using a hybrid approach. First, the annotation of the complete and draft genomes downloaded from the web was transferred onto the newly sequenced genomes using RATT (2). Second, to identify ORFs in regions with no close homology to already annotated strains, we performed de novo ORF prediction using Glimmer3 (3). Overlapping predictions from the two methods were merged if the ORFs shared the same stop codon, and the start site of the longer ORF was retained.

1. Myers EW, et al. (2000) A whole-genome assembly of Drosophila. Science 287(5461):2196-2204.
2. Otto TD, Dillon GP, Degrave WS, & Berriman M (2011) RATT: Rapid Annotation Transfer Tool. Nucleic Acids Res 39(9):e57.
3. Delcher AL, Bratke KA, Powers EC, & Salzberg SL (2007) Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics 23(6):673-679.
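
The two selection rules described in the methods text above (choosing one assembly out of seven, and merging transferred and de novo ORF predictions that share a stop codon) can be sketched in Python as follows. This is a minimal illustration, not the Novartis pipeline: the `Orf` class, the `merge_orfs` and `pick_assembly` functions, and the `source` label are hypothetical names introduced here. The text does not say how the two assembly criteria were combined, so the sketch assumes fewest contigs first, with total assembled bases as the tie-breaker. It also assumes that when two ORFs with the same stop codon have equal length, the transferred one is kept.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Orf:
    contig: str
    begin: int   # leftmost genomic coordinate, 1-based, inclusive
    end: int     # rightmost genomic coordinate, 1-based, inclusive
    strand: str  # "+" or "-"
    source: str  # hypothetical label, e.g. "RATT" or "Glimmer3"

    @property
    def length(self) -> int:
        return self.end - self.begin + 1

    @property
    def stop_key(self):
        # The stop codon lies at the right end on "+" and the left end on "-".
        stop = self.end if self.strand == "+" else self.begin
        return (self.contig, self.strand, stop)


def merge_orfs(transferred, predicted):
    """Keep one ORF per stop codon, retaining the longer ORF (and its start)."""
    best = {}
    # Transferred ORFs come first, so they win ties (an assumption).
    for orf in list(transferred) + list(predicted):
        current = best.get(orf.stop_key)
        if current is None or orf.length > current.length:
            best[orf.stop_key] = orf
    return sorted(best.values(), key=lambda o: (o.contig, o.begin, o.end))


def pick_assembly(assemblies):
    """Pick from (name, n_contigs, total_bases) tuples.

    Assumed ordering: fewest contigs, then most assembled bases.
    """
    return min(assemblies, key=lambda a: (a[1], -a[2]))
```

For example, given a transferred ORF at 100..400 on the plus strand and a de novo prediction at 40..400 on the same strand, `merge_orfs` keeps only the 40..400 ORF, because both end at the same stop codon and the de novo one is longer. A prediction whose stop codon matches no transferred ORF is kept unchanged.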