*** Overview ***

The VCF files contain short indel (<50 nt) calls for the low-coverage samples
from the CEU, YRI, and JPT/CHB populations. Every line whose FILTER column
reads 'PASS' should be considered a high-confidence indel call.
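
As an illustration of the FILTER rule above, the following is a minimal Python
sketch that keeps only the 'PASS' lines. It assumes the standard tab-delimited
VCF layout in which FILTER is the 7th column; the helper names open_vcf and
pass_records are hypothetical and not part of this release.

```python
import gzip
from typing import Iterable, Iterator, List

# Assumption: standard VCF column order
# CHROM POS ID REF ALT QUAL FILTER INFO ..., so FILTER has 0-based index 6.
FILTER_COLUMN = 6


def open_vcf(path: str):
    """Open a plain or gzip-compressed VCF file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


def pass_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield the tab-split fields of every data line whose FILTER is 'PASS'."""
    for line in lines:
        # Skip blank lines and header lines (both '##' meta lines and '#CHROM').
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) > FILTER_COLUMN and fields[FILTER_COLUMN] == "PASS":
            yield fields
```

A call such as pass_records(open_vcf(path)) streams the file, so the whole
VCF never has to be held in memory.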

*** Procedure ***

The calls were made using the following procedure:

1. Extract candidate indels from Illumina, 454 and SOLiD data.
The full set of candidates is available at 
http://www.well.ox.ac.uk/~gerton/1000G/LC/pilot1-indelcalls-17sept09.tgz
The total number of candidates considered across all populations
was 8,504,899. The candidate indel set was compiled by Gerton Lunter from
candidates provided by the Sanger, Broad, Oxford, Sanger/LUMC and TGEN groups.
All candidates were tested in all populations. JPT and CHB were analysed jointly.

2. Realign reads around candidate indels to candidate haplotypes 
using the indel caller Dindel (Albers et al.). 
At this stage Dindel was used both to produce indel site calls (that is,
to decide whether a candidate indel segregates in the population)
and to produce genotype likelihoods for each individual at each called site.

3. Finally, QCALL (Quang Si Le, Richard Durbin) was used to impute genotypes
from the genotype likelihoods for the sites called by Dindel, by making use
of LD structure. 
Note that QCALL filtered out a small fraction (<0.25%) of the sites 
called by Dindel; these are sites where the genotype likelihoods are
not consistent with the local LD structure. If a site is filtered out
in this way, the FILTER column in the sites VCF file will say 'NoQCALL'.
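
To see how many sites carry each FILTER value in a sites VCF file, a tally
along the following lines may be useful. This is a sketch only: it assumes
FILTER is the 7th tab-delimited column, and the function name filter_counts
is hypothetical.

```python
from collections import Counter
from typing import Iterable


def filter_counts(lines: Iterable[str]) -> Counter:
    """Count how often each FILTER value occurs among the data lines."""
    counts: Counter = Counter()
    for line in lines:
        # Header and blank lines carry no FILTER value.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 6:  # assumption: FILTER is the 7th column
            counts[fields[6]] += 1
    return counts
```

The returned Counter can be inspected directly, for example
counts['NoQCALL'], to find the number of sites that QCALL filtered out.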

*** Novel indels ***

The indels were checked against dbSNP 129, the indels from 
Mills et al. (Genome Research 2006), and the indels from the Watson 
and Venter genomes. Due to inconsistencies in indel placement in various 
databases, the criterion for 'novel' is less precise than that for SNPs, 
and is given in the header of the VCF files.

*** Imputation notes ***

Even though candidates from all technologies were used, the support for
candidate indels was evaluated only on the Illumina sequence data. 
For the following samples there was no Illumina data, and as a result
their genotypes are completely imputed from other samples (any SOLiD/454
data for these samples was *not* used to compute genotype likelihoods).

Samples without Illumina data, whose genotypes are imputed from other samples:
CEU: 	NA12814 NA11840 NA12872 NA12815 NA12812 NA12760 NA12874 NA12762 NA06985 NA12873 NA12234
YRI: 	NA19141 NA19143
JPTCHB:	NA18969 NA18970 
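
When working with the genotypes files it may be convenient to separate these
fully imputed samples from the rest. The sketch below hard-codes the list
above and splits the sample columns of a '#CHROM' header line. It assumes the
standard VCF layout in which sample names start at the 10th column; the names
IMPUTED_ONLY and split_samples are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Samples without Illumina data, copied from the list above.
IMPUTED_ONLY: Dict[str, Set[str]] = {
    "CEU": {
        "NA12814", "NA11840", "NA12872", "NA12815", "NA12812", "NA12760",
        "NA12874", "NA12762", "NA06985", "NA12873", "NA12234",
    },
    "YRI": {"NA19141", "NA19143"},
    "JPTCHB": {"NA18969", "NA18970"},
}


def split_samples(header_line: str, population: str) -> Tuple[List[str], List[str]]:
    """Split the sample names of a '#CHROM' header line into two lists:
    samples with Illumina data, and samples whose genotypes are fully imputed.
    """
    fields = header_line.rstrip("\n").split("\t")
    # Assumption: 8 fixed columns plus FORMAT precede the sample names.
    samples = fields[9:]
    imputed = IMPUTED_ONLY[population]
    with_data = [s for s in samples if s not in imputed]
    imputed_only = [s for s in samples if s in imputed]
    return with_data, imputed_only
```

The order of samples within each returned list follows the column order of
the header line, so the lists can be mapped back to genotype columns.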


*** Note ***

The 'NoQCALL' subset of calls is likely to be enriched for false calls,
but it may contain potentially interesting targets for association studies,
as one reason for these sites being filtered by QCALL could be low LD
with nearby SNPs. The 'NoQCALL' indels are only present in the 'sites' file
and not in the 'genotypes' file.
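
The relationship between the two files can be checked with a sketch like the
one below, which looks for 'NoQCALL' sites that also occur in a genotypes
file. It assumes that a site can be identified by its CHROM, POS, REF and ALT
columns and that FILTER is the 7th column; the function names are
hypothetical.

```python
from typing import Iterable, Optional, Set, Tuple

SiteKey = Tuple[str, str, str, str]


def site_keys(lines: Iterable[str], only_filter: Optional[str] = None) -> Set[SiteKey]:
    """Collect (CHROM, POS, REF, ALT) keys from VCF data lines.

    If only_filter is given, keep only lines with that FILTER value.
    """
    keys: Set[SiteKey] = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 7:
            continue  # not a complete VCF data line
        if only_filter is None or f[6] == only_filter:
            keys.add((f[0], f[1], f[3], f[4]))
    return keys


def noqcall_in_genotypes(sites_lines: Iterable[str],
                         genotype_lines: Iterable[str]) -> Set[SiteKey]:
    """Return 'NoQCALL' sites from the sites file that also appear in the
    genotypes file. Given the description above, this should be empty."""
    return site_keys(sites_lines, "NoQCALL") & site_keys(genotype_lines)
```

Both key sets are held in memory, which may be a consideration for the
larger files.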

*** Questions ***

If you have any questions, please email Kees Albers at caa (at) sanger.ac.uk

Kees Albers (caa (at) sanger.ac.uk)
Gerton Lunter 
Quang Si Le
Richard Durbin