READ ME file, Monday 15th May 2006
********************************************************************************

CDF files:
- pfsangerb520296.cdf contains all the probes present on the PFSANGER array.
- pfsanger.Thu_May_4_2006.allCdsProbes.cdf is a custom CDF file containing all the unique probes mapping to a given gene (exons + introns + probes overlapping annotated boundaries).
- pfsanger.Thu_May_4_2006.exonsets.cdf is a custom CDF file containing all the unique probes mapping only to the exon(s) of a given gene.
- pfsanger.Thu_May_4_2006.intronsets.cdf is a custom CDF file containing all the unique probes mapping only to the intron(s) of a given gene.

The custom CDF files were generated with the exonerate software, which was used to find perfect matches of the 25 bp probes on the Plasmodium falciparum genome version 2.1 (http://www.genedb.org/genedb/malaria/). During this process, each probe was annotated according to the latest available annotations, and non-unique probes were removed from the set. Probes mapping to the exons, the introns or the whole gene were then grouped into a probe set.

I suggest you open one of these custom CDF files in a text editor (but not on Windows) to see the file structure and how the probes are organised.

To see how many probes a probe set contains, open a terminal window and run:

$ grep -c 'probeset' pfsanger.Thu_May_4_2006.xxxxxx.cdf

(where probeset = MALxxP1.xxx / PFxxxxxc/w / PFxx_xxxx)

Example:
$ grep -c 'MAL13P1.113' pfsanger.Thu_May_4_2006.exonsets.cdf
35

This command returns the number of lines in the CDF file that contain 'probeset'. Because of the structure of the CDF file, the number of probes is the number returned minus 1. In the example above, the exon(s) probe set of MAL13P1.113 therefore contains 34 probes.
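The count-and-subtract step above can be wrapped in a small shell function. This is a minimal sketch, assuming GNU or BSD grep (both accept -F and -w); the name count_probes is hypothetical and not part of the distributed files. Unlike the plain command, it passes -F so the '.' in names such as MAL13P1.113 is matched literally rather than as a regular-expression wildcard, and -w so that a probe set name does not also match any longer name that contains it. Because of these two flags, its result can differ from the plain grep -c command if such longer names exist in the file.

```shell
# count_probes: hypothetical helper wrapping the grep -c recipe in this README.
# Usage: count_probes PROBESET CDF_FILE
count_probes() {
    probeset="$1"
    cdf="$2"
    if [ ! -r "$cdf" ]; then
        echo "count_probes: cannot read '$cdf'" >&2
        return 1
    fi
    # -F: fixed-string match, so '.' is not treated as a regex wildcard
    # -w: whole-word match, so MAL13P1.113 does not also match MAL13P1.1130
    # -c: print the number of matching lines (not the number of occurrences)
    n=$(grep -F -w -c -- "$probeset" "$cdf")
    # Per this README, one matching line is not a probe, so subtract 1
    if [ "$n" -gt 0 ]; then
        echo $((n - 1))
    else
        echo 0
    fi
}
```

After pasting the function into your shell (or sourcing a file that contains it), run, for example:
$ count_probes MAL13P1.113 pfsanger.Thu_May_4_2006.exonsets.cdf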
Using R/Bioconductor version 2.3, you need to generate a custom CDF package:

#from the directory where your downloaded CDF file is, open an R session
R> library(makecdfenv)   #load the package 'makecdfenv'
R> make.cdf.package("pfsanger.ThuMay42006.xxxxxx.cdf", species="Plasmodium_falciparum")

This will generate a subdirectory pfsanger.ThuMay42006.xxxxxxcdf in your working directory, which contains the package. Please refer to the make.cdf.package help page for more information.

Now open a different terminal and install the CDF package using:

$ R CMD INSTALL pfsanger.ThuMay42006.xxxxxxcdf   #this installs it into your default R library

When starting an analysis, please do:

R> library(affy)
R> my.data <- ReadAffy()
R> my.data@cdfName <- "pfsanger.ThuMay42006.xxxxxxcdf"

Then proceed with further pre-processing and/or analysis as required.

The pfsangerb520296.cdf file needs to be installed in R in the same way. However, you may have difficulty installing it, as this requires a lot of memory (a 64-bit Linux machine is necessary). As an alternative, you can use RMAExpress (http://rmaexpress.bmbolstad.com/) to pre-process the data and calculate the signal intensities, and then either transfer the results back into R/Bioconductor or work directly on the table of values it returns.

For further enquiries, please contact:
Celine Carret: ckc@sanger.ac.uk
Matloob Qureshi: mq2@sanger.ac.uk
Al Ivens: alicat@sanger.ac.uk