Linux Intro and NGS File Formats

NGS WTAC 2015

David Jackson

WTSI

Shell, Running Commands

N.b. easy and quite universal remote access

Navigation and Manipulation of Directories

External Drives

Fasta Sequence Data Files

and Text File Manipulation

Fastq Sequence Data Files

and Using the Shell

Pipes and Data Redirection

Process Control

SAM (& BAM) Sequence & Alignment Files

Samtools

obtaining and building

Samtools

using

Samtools & CRAM

Illumina Runfolder Anatomy

So...

  • Sequence file formats
    • fasta, fastq, sam/bam : you're most likely to work with today
    • cram : smaller than bam, at v2.1 now, likely the next common format
    • sff, srf, sra, ztr : there are plenty of other formats which typically contain more "raw" data
    • Illumina: (c)locs, filter, control, bcl → fastq
  • Using the shell
    • is a "universal" language
    • gives you a history
    • can get the computer to do the (boring) repetitive stuff completely consistently
    • allows you to wrap an established procedure into a script
Questions?
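
As a small illustration of the fasta and fastq formats summarised above, the sketch below counts records in each using only standard shell tools. The file names toy.fa and toy.fastq and their contents are invented for this example, and the fastq count assumes the usual four lines per read.

```shell
# Toy files with invented content, created only for this illustration.
printf '>seq1\nACGT\nACGT\n>seq2\nGGCC\n' > toy.fa
printf '@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\nIIII\n' > toy.fastq

# fasta: every record starts with a ">" header line.
fasta_records=$(grep -c '^>' toy.fa)

# fastq: assuming four lines per read (no wrapped sequence lines).
# Counting "@" lines is not safe, because a quality line may also start with "@".
fastq_lines=$(wc -l < toy.fastq)
fastq_reads=$((fastq_lines / 4))

echo "fasta records: $fasta_records"
echo "fastq reads: $fastq_reads"
```

The same two commands work unchanged on real files, and because they sit in the shell history they can be rerun or pasted into a script.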

Samtools & flagstat, stats/bamcheck

  • samtools-1.2/samtools view -u \
    ftp://ngs.sanger.ac.uk/scratch/project/WTAC/processed_data/12585_1#21.bam \
    | samtools-1.2/samtools stats - | tee 12585_1#21.bam.bamstats
    pull your data from the ftp site, push it uncompressed through samtools stats (formerly bamcheck), and write the output both to the terminal and to a file
  • samtools-1.2/misc/plot-bamstats -p 12585_1#21/ 12585_1#21.bam.bamstats
    create some plots from the stats (bamcheck) data
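
The stats output is plain tab-separated text, so the same pipe-and-tee pattern can pull out just its summary numbers. The sketch below is a stand-in only: toy.bamstats and toy.summary.txt are hypothetical names, and the values in the file are invented rather than taken from any real run. It assumes the summary lines carry the "SN" tag in the first column.

```shell
# Stand-in for a real stats file: two invented summary lines in the
# tab-separated "SN" layout, plus one line from another section.
printf 'SN\traw total sequences:\t1000\n' > toy.bamstats
printf 'SN\treads mapped:\t950\n' >> toy.bamstats
printf 'FFQ\t1\t0\t0\n' >> toy.bamstats

# Keep only the summary-number lines and drop the leading "SN" tag,
# writing the result to the terminal and to a file at the same time.
grep '^SN' toy.bamstats | cut -f 2- | tee toy.summary.txt
```

Replacing toy.bamstats with the file written by tee in the exercise should give the summary for your own data.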

Picard

  • Download and uncompress/install Picard
  • Try converting from BAM to fastq
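
To see what such a conversion involves, the sketch below turns a tiny SAM text file into fastq with awk. This is not how Picard does it and not a replacement for it: it assumes single-end reads that are not reverse-strand alignments (the toy reads are unmapped, flag 4), whereas a real converter must also reverse-complement reverse-strand reads and pair up mates. The file names and read contents are invented for the example.

```shell
# Toy SAM file with invented content: one header line and two unmapped,
# single-end reads (flag 4), so no read needs reverse-complementing.
printf '@HD\tVN:1.4\tSO:unsorted\n' > toy.sam
printf 'read1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n' >> toy.sam
printf 'read2\t4\t*\t0\t0\t*\t*\t0\t0\tGGCC\tIIII\n' >> toy.sam

# Skip header lines (starting with "@"), then print the read name (column 1),
# the bases (column 10), a "+" separator and the qualities (column 11).
awk -F '\t' '!/^@/ { printf "@%s\n%s\n+\n%s\n", $1, $10, $11 }' toy.sam > toy_from_sam.fastq

cat toy_from_sam.fastq
```

Comparing this with Picard's output on a real BAM is a quick way to spot which reads were stored reverse-complemented.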

FastQC

  • From Babraham Institute - neighbours up the road...
  • Download and uncompress/install FastQC
  • Try inspecting BAM or fastq files

samtools & pileup

  • samtools-1.2/samtools faidx phix-illumina.fa
    create an index for reference fasta file
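  • cat phix-illumina.fa.fai
    view the index; it's quite boring for this reference
  • samtools-1.2/samtools mpileup -f phix-illumina.fa s6823_1_phix.bam | less -S
    show the per-position pileup of reads against the reference; less -S stops long lines wrapping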
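
The .fai index is a small tab-separated table whose first two columns are the sequence name and its length (the remaining columns are the byte offset, bases per line and bytes per line). As a rough sketch of where the first two columns come from, the awk below recomputes them from a fasta file. The names toy_ref.fa and toy_ref.lengths and the sequences are invented for this example.

```shell
# Toy reference with invented content; the second record is wrapped over two lines.
printf '>toy1\nACGTACGT\n>toy2\nACGT\nAC\n' > toy_ref.fa

# For each record, sum the bases on its sequence lines and print "name<TAB>length".
# The name is the header text up to the first whitespace, without the ">".
awk '
  /^>/ { if (name != "") print name "\t" len
         name = substr($1, 2); len = 0; next }
       { len += length($0) }
  END  { if (name != "") print name "\t" len }
' toy_ref.fa > toy_ref.lengths

cat toy_ref.lengths
```

Running the same awk on phix-illumina.fa and comparing with the first two columns of phix-illumina.fa.fai is a simple sanity check on the index.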

IGV

  • Inspect your BAM file with IGV

Biobambam

  • Download and uncompress/install Biobambam
  • Try marking duplicates using Biobambam and compare with Picard