Latest release V4.7 03 Nov 00
This guide has been updated. The manual has not been updated since V4; until it is, use the manual together with the changes listed in the Release Notes section.
This is to be used in conjunction with the User's Manual.
Contents:
When you start using FPC for a project, you must
determine the best tolerance and cutoff
for your data.
TOLERANCE: You can have fixed or variable tolerance.
I have seen the following cases:
If using Image with agarose gels, it produces files with the
following suffixes: *.bands, *.sizes, *.gels, where * is the name of the gel.
For a new project,
create a directory using the name of the project, then create sub-directories
called /Image, /Bands and /Gels. Move the *.bands files to /Image, the
*.size files to /Sizes and *.gel files to /Gel. FPC will use migration rates for
the agarose gels, but you can use the size files. See the release notes for
more information.
CUTOFF: You can use Equation 1 or Equation 2 (see CABIOS Vol 13 No. 5).
Both take into account the tolerance, number of matching bands, and
number of bands in each clone in order to calculate the probability that
the matching bands are a coincidence. The lower the number (i.e. the
larger the magnitude of the exponent), the more stringent the test for calling
two clones overlapping.
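A minimal sketch of a Sulston-style coincidence score, to show how the tolerance, the band counts and the number of matching bands enter the calculation. It assumes Equation 1 is the binomial tail from the CABIOS paper and assumes a gel length of 3300; the function name and the gel length default are illustrative and are not taken from the FPC source.

```python
from math import comb

def sulston_score(bands_a, bands_b, matches, tolerance, gel_len=3300):
    """Probability that `matches` or more bands coincide by chance.

    gel_len=3300 is an assumed value, not read from FPC.
    """
    n_low, n_high = sorted((bands_a, bands_b))
    # Chance that one band misses all n_high bands of the larger clone,
    # where each band covers a window of +/- tolerance on the gel.
    p_miss = (1.0 - 2.0 * tolerance / gel_len) ** n_high
    p_hit = 1.0 - p_miss
    # Binomial tail: at least `matches` of the n_low bands hit by chance.
    return sum(
        comb(n_low, m) * p_hit ** m * p_miss ** (n_low - m)
        for m in range(matches, n_low + 1)
    )

score = sulston_score(22, 15, 11, tolerance=5)
print(f"{score:.1e}")
```

Under these assumptions a 22-band and a 15-band clone with 11 matches at tolerance 5 score on the order of 1e-10. With the match count held fixed, widening the tolerance in this sketch always raises the score.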
A frequent source of confusion is the fact that the
tolerance is used in the equation. For example,
using fixed tolerance and equation 1, making the tolerance more stringent
can cause the coincident score to become more stringent (Case 1) or
LESS stringent (Case 2).
It is important to determine the best tolerance and then not vary it.
Another source of confusion is that the number of bands is used in
the equation, so you can get the following:
1. View a set of clones using the fingerprint window and try different
tolerances to see the effect. The tolerance can be changed on the window,
and picking a clone will highlight all bands in other clones that
are within the tolerance of a band in the picked clone.
(Thanks to Sam Cartinhour for this suggestion.)
2. On Main Analysis window, beside "--> FPC" enter a clone name
and select "--> FPC". You will get the following type of output:
3. On Main Analysis, run Build Contig trying different cutoffs (see C).
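The highlighting described in hint 1 can be sketched as follows. This is an illustration only: the band values are made up, and treating the tolerance as inclusive (<=) is an assumption.

```python
def bands_within_tolerance(picked, other, tolerance):
    """Return the bands of `other` that lie within `tolerance`
    of at least one band in the picked clone."""
    return [b for b in other if any(abs(b - p) <= tolerance for p in picked)]

picked = [1012, 1530, 2204]        # hypothetical migration rates
other = [1008, 1541, 2210, 2950]   # hypothetical migration rates

print(bands_within_tolerance(picked, other, tolerance=7))   # [1008, 2210]
print(bands_within_tolerance(picked, other, tolerance=12))  # [1008, 1541, 2210]
```

Raising the tolerance from 7 to 12 adds one highlighted band in this example, which is the kind of effect to look for when trying different tolerances.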
The CB algorithm is a fast approximation.
It gives a good approximation of the order of the clones
regardless of the quality of data, that is, any two clones that have a 'good' overlap based
on the cutoff will (with the rare exception) overlap in the resulting map.
Obviously, if you give a cutoff that results in many false positives,
your map will be incorrect; the simulation and following
discussion suggest how you can prevent this from happening.
When the CB algorithm selects a clone to incorporate into the map, it determines
the interval in which it belongs, i.e. within the extents of the other clones to which
it is presumed to overlap based on the cutoff. If there is no good alignment within
the interval, it is called a 'Q' clone. Too many Q's can result from
(1) many clones incorrectly mapping to the same space due to similar
fingerprints, (2) clones with poor fingerprints,
or (3) a sub-optimal solution.
The CB algorithm greedily adds clones to the map based on
how well they overlap the current clones in the map, which can lead to a
sub-optimal (i.e. incorrect) solution.
To reduce the chance of a sub-optimal solution,
the CB algorithm generates N solutions (N is 10 by default) and keeps the
solution with the best score. The score is:
Score = ((P+left+right) - N)/(total-ends) where
The output from the CBmap shows two scores, the best and the worst.
The Build Contigs function uses the CB routine which clusters clones based on good
overlap scores. It also buries clones (see manual)
and gives an approximate ordering.
On Main Analysis, run Build Contig.
You will get the following style output:
If you do not like these contigs, e.g. too many contigs or too many Qs,
adjust the tolerance and/or cutoff, and
execute Build Contig, which destroys the contigs and rebuilds them with the
new parameters
(the text box beside the Kill button should contain the word 'max').
Continue to do this until you get contigs of the size you expect.
For this example, only one contig has a high number of Qs, so I would use
these parameters and fix ctg1 as described in 'Fixing poor contigs'. By fixing
this one contig, I end up with 6 perfect contigs, versus if I decrease the
cutoff to 1e-07, I get 12 contigs and still have 45 Qs.
In old versions of FPC, there was no score and only one solution was
generated, hence, the user had to inspect each contig. Now it is only necessary
to look at poor contigs: if a contig has more than 1 Q and/or a low score,
there may be some problems with it. What is a 'low' score depends on the dataset.
See section D for suggestions on looking at poor contigs.
The output of the Build Contig function lists the average
number of overlaps, e.g.
Step 1. Run Calc from the Contig Analysis window.
If you use the same cutoff that was used to build the contigs, Calc will generate
the same solution that was used to create the contig. When there are many Q clones,
there is probably at least one false positive, so lower the cutoff
(e.g. if the contigs were built with 1e-08, use 1e-09 or
1e-10). Run Calc and it should create one or more CBmaps of well ordered clones and
ignore any clones which do not have a good overlap with any other clones based on the cutoff.
Buried
Extra bands On the CB display, the number of bands not placed on
the map for each clone is given beneath the clone name.
How many extra bands are too many?
With the BAC agarose fingerprints, the number of extras ranges between 0 and 8 with
an occasional 10 to 15 (this includes the end fragments). With BAC polyacrylamide fingerprints,
they range between 10 and 15. There will generally be a few more extras and false negatives
than expected due to the inherent uncertainty in the data, which causes many situations that
cannot be resolved by a fast approximation. But as mentioned earlier, it does give a good
approximate overlap when the clones are instantiated (assuming your tolerance and cutoff are
appropriate).
Step 2. Execute Ok which brings up a menu. Execute OkAll.
A greedy algorithm is used to order the maps
based on scores. Some details:
(1) If the buried clones are not used in the CBmap, they will still be used
for potentially joining contigs. (2) If a parent or sequenced clone is 'Ignored', it cannot be
buried, so it will be put at the end of the current contig.
With Example 1, I ran the CB algorithm on Ctg1 using 1e-08, and got the following:
To test whether the joins seemed reasonable,
I did the following: I OKALL'ed with Overlap between
adjacent CBmaps set to 1. Then I set the cutoff used
for the joins, i.e. 1e-06, and used the Step
function of NoOlap (see Step 3). It shows all clones
that could have overlapped.
The merges appeared correct and the three separate contigs could not be
joined, so I set Overlap between adjacent CBmaps back to 10
and Move unconnected to new contigs to on, and got the output:
Step 3. Verification.
Running the CB algorithm for a contig and having it produce acceptable results
is in itself a verification.
If you have marker data, the Split marker test is a good verification, that is,
none of the markers should be split unless there is a false positive hit.
In addition, there are three tests for the contig:
Consistent FPC: A consistent FPC is one in which all contigs
are built with the same cutoff so that there is always the
transitively overlapping clone property described in section B,
except when there has been a Merge or Split (i.e. by doing an OKALL
which moves contigs based on a more stringent overlap to new contigs).
These changes can be automatically recorded by turning on the
Trace option on the Project menu.
To do this, it is important that you always use the same cutoff for builds.
We are currently experimenting with how rigorous we can make
consistent FPCs.
The added clones could merge contigs (a clone is added to the contig in which it has
the best score). On the Main Analysis, use Ends->Ends to determine
potential merges. Or, on the contigs that have had clones added, use Ctg->Ends.
Use the Merge to do the merges (see the manual).
When two or more clones have a sequence status other than NONE or CANCELLED,
the contig status is set to NoCB so that
the clones are not moved during an Incremental Build.
For example, Ctg1 had two clones added to it and one of the clones also overlaps
with one or more clones in Ctg2, then Ctg2 will be merged with Ctg1.
If neither contig is marked with NOCB, then the CB algorithm is run on it
to get an ordering of all the clones. If either of the contigs has a status
of NOCB, the clones are not reordered; they will be placed in the map as
they are added with a yellow bar between added clones and contigs. It is up
to the user to then determine the order; it may be easiest to run the CB
algorithm again and then to re-adjust the positions of the sequenced clones
in order to give them the correct overlap. We are remarking our sequenced
clones with the overlap as follows: a sequenced clone has a remark about the
next sequenced clone to its right along with the distance, e.g.
the remark may be $bA110 10 1000$ for 10 bands and 1000 bases. All our new FPCs
are using this syntax for the overlap. Hence, overlaps can easily be
reconstructed.
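A minimal sketch of reading such an overlap remark back out of a clone's remark text. The pattern assumes the three fields are always a clone name, a band count and a base count separated by whitespace, as in the example above; the function and key names are illustrative.

```python
import re

# $<next clone to the right> <bands> <bases>$
OVERLAP_REMARK = re.compile(r"\$(\S+)\s+(\d+)\s+(\d+)\$")

def parse_overlap_remark(remark):
    """Return the overlap recorded in a remark, or None if there is none."""
    m = OVERLAP_REMARK.search(remark)
    if m is None:
        return None
    return {
        "next_clone": m.group(1),
        "bands": int(m.group(2)),
        "bases": int(m.group(3)),
    }

print(parse_overlap_remark("$bA110 10 1000$"))
# {'next_clone': 'bA110', 'bands': 10, 'bases': 1000}
```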
The data set has no ambiguities except the end fragments. There are 4278 clones.
The bands have migration rate values within the range of [766,3289].
All the following runs used a tolerance of 7, which gives 1890 unique bands out of 80115 total with
an average bin size of 516 (e.g. there are 538 clone bands in the data set within +/- 7 of 3228).
A. Setting the tolerance and cutoff
B. CBmaps
C. Building Contigs
D. Fixing poor contigs
E. Incremental update
F. Picking a minimal tiling path
G. Simulation results
H. Auxiliary Files
I. Release V4 Notes
A. Setting the tolerance and cutoff
By default, FPC uses fixed tolerance. Change to variable tolerance on the
main Configure window.
(i) Bands measured on acrylamide gels using Image result in migration
rates, so a tolerance of around 7 is appropriate.
(ii) Bands measured on acrylamide gels using the ABI software result in
band sizes with a fixed tolerance, so a tolerance of 1 or 2 is appropriate
(i.e. in this case it refers to bases).
(iii) Sizes measured on agarose gels use a variable tolerance, so using a
tolerance of 3 will give a 0.003% uncertainty.
Case 1:
         Number of bands
  Tol   Clone 1   Clone 2   Matching Bands   Cutoff
   7      22        15           11          3e-09
   5      22        15           11          3e-11
Case 2:
         Number of bands
  Tol   Clone 1   Clone 2   Matching Bands   Cutoff
   7      22        36           14          3e-12
   5      22        36           14          3e-09

   Number of bands
  Clone 1   Clone 2   Matching Bands   Cutoff
    52        38           12          3e-02
    52        16           12          3e-06
HINTS on how to determine tolerance and cutoff
>> bk7d10 ctg0 22b --> Fpc (Tol 5, Cutoff 1e-08)
Ctg0 bK123A11 15b 11 9e-11
Ctg0 bK256D12 36b 14 2e-09
Ctg0 cB58D5 12b 9 4e-09
Ctg0 cB73B9 15b 10 3e-09
Ctg0 dJ76B20 40b 18 5e-14
If you have a 5-fold coverage, this number of matches would be appropriate.
If you get too many or too few matches, vary the tolerance and/or cutoff.
B. CBmaps
To order clones,
the CB algorithm is run on a set of clones. When run from the Main Analysis window,
the set is all the singletons (i.e. clones not in a contig). When run from the Contig
Analysis, the set is all the clones in the contig. The CB algorithm creates one or
more CBmaps, where each CBmap_i has the property that every clone
in CBmap_i has a
good overlap with at least one other clone in the map, and no clone in any
other CBmap has a good overlap with a clone in CBmap_i.
Clone_i has a 'good overlap' with Clone_j
if their coincidence score is less than or equal to the cutoff (e.g. 1e-06).
If a clone does not have a good overlap with
any other clone, it is called an 'Ignored' clone and does not belong to a CBmap.
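The grouping property above can be sketched as finding connected components in a graph whose edges are the good overlaps. This shows only how clones fall into CBmaps and Ignored clones; it does not order the clones, and it is not the CB algorithm itself. The clone names and scores are made up.

```python
from collections import defaultdict

def group_into_cbmaps(clones, scores, cutoff=1e-06):
    """scores maps a pair of clone names to its coincidence score."""
    neighbours = defaultdict(set)
    for (a, b), score in scores.items():
        if score <= cutoff:              # 'good overlap'
            neighbours[a].add(b)
            neighbours[b].add(a)

    cbmaps, ignored, seen = [], [], set()
    for clone in clones:
        if clone in seen:
            continue
        if not neighbours[clone]:        # no good overlap with any clone
            ignored.append(clone)
            seen.add(clone)
            continue
        stack, group = [clone], set()
        while stack:                     # walk everything reachable
            current = stack.pop()
            if current in group:
                continue
            group.add(current)
            stack.extend(neighbours[current] - group)
        seen |= group
        cbmaps.append(sorted(group))
    return cbmaps, ignored

clones = ["A", "B", "C", "D", "E", "F"]
scores = {("A", "B"): 1e-09, ("B", "C"): 1e-07,
          ("C", "D"): 1e-03, ("D", "E"): 1e-08}
cbmaps, ignored = group_into_cbmaps(clones, scores)
print(cbmaps)   # [['A', 'B', 'C'], ['D', 'E']]
print(ignored)  # ['F']
```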
Score and Q's
N is the number of negatives ('o'),
P is the number of positives ('+'),
Left and right are the maximum number of extra bands considering all
clones near to the left and right end, respectively,
Ends is an approximate number of end fragments,
Total is the total number of bands.
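As a worked example of the score built from these six quantities, Score = ((P + left + right) - N) / (total - ends), here is a small sketch. The counts are hypothetical and are not taken from any FPC run.

```python
def cbmap_score(positives, negatives, left, right, total, ends):
    """Score = ((P + left + right) - N) / (total - ends)."""
    if total == ends:
        raise ValueError("total and ends must differ")
    return ((positives + left + right) - negatives) / (total - ends)

# Hypothetical counts: 90 '+' bands, 2 'o' bands, 3 and 2 extra bands
# at the left and right ends, 100 bands in total, 5 end fragments.
print(round(cbmap_score(90, 2, 3, 2, 100, 5), 3))  # 0.979
```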
C. Build Contigs
Example 1.
Kill/Build/OKall:
Kill 0 contigs of size <= 0. Number of Contigs 0. Max contig 0.
Cutoff 1e-06
Clones to process: 400 (<= MinBands 0)
Begin 80000 comparisons with Equation 1:
Create ctg0 CBmap 1 Clones 226 CBs 570 Qs 69 Score 0.660 0.177
Create ctg1 CBmap 2 Clones 35 CBs 114 Qs 0 Score 0.994 0.977
Create ctg2 CBmap 3 Clones 98 CBs 361 Qs 0 Score 0.979 0.969
Create ctg3 CBmap 4 Clones 41 CBs 145 Qs 0 Score 0.995 0.983
Time User 0 min 10 sec, System 0 min 0 sec
Tol 7 Cut 1e-06 Bury~ 0.10 Best 10
Ignore 0 AvgOverlap 1.7 AvgScore 0.907 Qs 69
Create 4 contigs (from ctg1 to ctg4)
Contig sizes: Max 226, 4 (>25), 0 (25:10), 0 (9:3), 0 (=2), 0 Singles
Clones to process: 13940 (<= MinBands 271, Unbury 3574)
Begin 97161800 comparisons with Equation 1:
at clone #71, compared 1001052, overlaps 4410, average 61.250...
at clone #144, compared 2010715, overlaps 9282, average 64.014...
at clone #217, compared 3015049, overlaps 12908, average 59.211...
This change was made while determining why this dataset was running out of
memory on a Sun; the problem has been fixed. A dataset this size with a high
average overlap needs approximately 128 Mb main memory and 400 Mb swap.
D. Fixing poor contigs
Begin 25538 comparisons with Equation 1:
Calc ctg1 CBmap 1 Clones 19 CBs 83 Qs 0 Score 0.989 0.973
Calc ctg1 CBmap 2 Clones 19 CBs 81 Qs 0 Score 0.997 0.986
Calc ctg1 CBmap 3 Clones 12 CBs 48 Qs 0 Score 0.986 0.986
Calc ctg1 CBmap 4 Clones 43 CBs 166 Qs 0 Score 0.987 0.976
Calc ctg1 CBmap 5 Clones 30 CBs 113 Qs 0 Score 0.986 0.984
Calc ctg1 CBmap 6 Clones 17 CBs 65 Qs 0 Score 0.979 0.969
Calc ctg1 CBmap 7 Clones 12 CBs 45 Qs 0 Score 0.989 0.984
Calc ctg1 CBmap 8 Clones 17 CBs 65 Qs 0 Score 0.990 0.979
Calc ctg1 CBmap 9 Clones 19 CBs 74 Qs 0 Score 0.991 0.979
Calc ctg1 CBmap 10 Clones 6 CBs 21 Qs 0 Score 1.000 1.000
Calc ctg1 CBmap 11 Clones 5 CBs 20 Qs 0 Score 1.000 0.987
Calc ctg1 CBmap 12 Clones 5 CBs 22 Qs 0 Score 1.000 1.000
Calc ctg1 CBmap 13 Clones 4 CBs 10 Qs 0 Score 1.000 1.000
Calc ctg1 CBmap 14 Clones 5 CBs 19 Qs 0 Score 1.000 1.000
Calc ctg1 CBmap 15 Clones 5 CBs 17 Qs 0 Score 1.000 1.000
Calc ctg1 CBmap 16 Clones 2 CBs 10 Qs 0 Score 1.000 1.000
Time User 0 min 1 sec, System 0 min 0 sec
Tol 7 Cut 1e-08 Bury~ 0.10 Best 10 Hide Bur
Ignore 6 AvgOverlap 1.2 AvgScore 0.994 Qs 0
Executing OKALL with Cutoff for matching end clones set to 1e-06
generated the following output:
Cutoff for Ends 1e-06. Use markers. Distance from ends 15.
OKALL: CBmaps 16 (Ends 151) + 6 Ignored clones. Matched ends 40.
CBmaps 16 Merge 13 Contig 3. Ignore 6: Bridge 3 Bury 3 End 0.
Cutoff for Ends 1e-06. Use markers. Distance from ends 15.
OKALL: CBmaps 16 (Ends 151) + 6 Ignored clones. Matched ends 40.
CBmaps 16 Merge 13 Contig 3. Ignore 6: Bridge 3 Bury 3 End 0.
Move 157 clones from ctg1 to ctg5
Move 19 clones from ctg1 to ctg6
If you do not agree with the merges, you can try changing the parameters and re-executing
OKALL. If you still do not agree, you will need to manually edit the contig by using
the Edit Contig menu; the manual fully describes these operations.
NB These tests only run on the displayed clones.
If you have your cutoff set appropriately and there are no false positives in your contig,
the second test should always show zero bad clones. The third test is much more stringent and
even contigs meticulously built by hand have bad clones with this test, but the output on
the terminal can help identify the worst cases.
E. Incremental update
There are three ways you can add clones to your contigs going from fully automated to
very little automation. I would think the first way described would generally be used,
but the other ways are there if needed.
Incremental Build Contigs
The newest FPC feature is the incremental build. Use Update .cor to get
the new clones written into the database, then execute Incremental Build Contigs.
This adds clones to contigs and automatically merges contigs.
For Example 1, I added 4 clones and ran the IBC, and got:
Incremental Build Contigs:
Cutoff 1e-06
No CBmap 0. Avoid 0 contigs. MinMax(0,32767).
New 4 clones to process (<= MinBands 0).
Merge 1 Add 4 New 0
Ctg3 Clones 141 0.981 0 2 Ctg4
Ctg5 Clones 159 0.984 0 2
Time User 0 min 16 sec, System 0 min 0 sec
Tol 7 Cut 1e-06 Bury~ 0.10 Best 10 Last 19/9/99 (New 4,0)
AvgOverlap 1.8 AvgScore 0.98 Qs 0 (0)
Update 2 Merge 1 Add 4 New 0 Modified 19/9/99 16:46
Contigs 5: Max 159, 4 (>25), 1 (25:10), 0 (9:3), 0 (=2), 0 Singles
The Project window will appear showing the contigs that have clones
added to them and the contigs merged.
To see the added clones, select a contig, select Analysis,
and on the analysis window,
select ShowIncBuild, and the added clones will be highlighted.
Keyset-->FPC with Auto Add
Alternatively, you can semi-manually add and merge, as follows:
A keyset of the new clones can be created by requesting
the clones added after a given date.
From the Main Analysis window,
compare the new clones against the rest of the clones in the
database by executing Keyset-->Fpc
with the Auto Add flag on so it will
automatically add a matched clone to a contig and each clone will
be positioned under the clone that it has the highest overlap with.
The Project window will appear showing the contigs that have clones added to them.
To see the added clones, select a contig, select Analysis, and on the analysis window,
select ShowAdditions, and the added clones will be highlighted.
Keyset-->FPC without Auto Add
Create a keyset window and execute Keyset-->FPC without setting
the Auto Add flag.
The project window will be displayed listing the number of hits
for each contig.
For each contig with hits, display the contig and then
select Compare Keyset from the Contig Analysis window.
An internal list is
created of clones that match the contig. Step through the list by
selecting Next; a clone is listed and all the clones it matches are
highlighted. If you want to add the clone, select Add and refine the
position, or Add&Bury.
F. Selecting a minimal tiling path
The CBmap algorithm does not get the overlap between clones precisely.
This is due largely to missing or additional bands in which case you
need to look at the gels. If you use Image, the Gel files are part of
the output.
Set the Buried toggle on a contig so that the buried clones are not shown,
select all the rest using the Edit Map window and
then have them loaded into the Gel or Fingerprint window.
If you change the order, use FpOrder or GelOrder followed by OK to instantiate the changes.
Any further refinement of the overlaps must be done using the
Edit Map editing functions. Generally, only the clones picked for
the minimal tiling path have their overlaps defined precisely. These clones
should be marked as tiling path clones by using the Edit function
on the clone text window to set the Status to TILE.
G. Simulation results
The following data set was provided by LaDeana Hillier and Ken McDonald, GSC, St. Louis.
Using half of each of the 6 C.elegans chromosomes, the N's and gaps were removed from the sequence (hence, one
sequence for each half chromosome). Clones were generated to give a 5x coverage with an approximately 80%
overlap. A simulated double digest of HindIII and PstI was performed resulting in an average 18 bands
per clone.
The clones were not started and stopped at either of these sites, hence, they have end
fragments.
Cutoff | Contigs | F+ | F- | Mixed* | Out-of-order** |
1e-06 | 51 | 22 | 49 | 5 | 272(3) |
1e-07 | 167 | 2 | 129 | 2 | 27(0) |
1e-08 | 301 | 0 | 271 | 0 | 26(0) |
1e-08/1e-06 | 73 | - | - | 0 | 45(1) |
At a 1e-06 cutoff, clones with less than approximately 50% shared bands do not pass the overlap test. Chromosomes get split into multiple contigs where there are weak overlaps.
The contigs with clones from multiple chromosomes have areas of complete chaos where multiple contigs have mapped to the same space. The out-of-order count is atrocious in these areas. Where chaotic behavior is evident, the CB algorithm is run on each chaotic contig as described in the next paragraph. Though we were able to determine the chaotic contigs by the scoring program (see below), they can also be determined visually, e.g. 100 clones should not be mapping to the same space.
The fourth entry of the table was created as follows: The database from 1e-06 was loaded. For the 5 contigs with chaotic areas, Unbury All was executed and then the Calc algorithm was run on these contigs individually using a 1e-08 cutoff. Ok was executed to instantiate the new order, using the options: (1) merge CBmaps with end clones having 1e-06 overlap or better, (2) overlap adjacent maps by 2, and (3) move disconnected CBmaps to new contigs. The effect of this is to only allow the 1e-06 overlap on the ends, which adds an additional constraint, and hence removed all false negative joins.
Scoring: The clones were named such that each name carries the chromosome number and ordering information
(e.g. ele4_256). Ken McDonald wrote a program to determine where multiple chromosomes are in a contig and
to determine if two clones are out of order, e.g. if ele4_256 starts after ele4_257. Note that this
just means that their starting points are off by a few bands but they still overlap.
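The two checks described above can be sketched as follows. This is not the scoring program used for the table; it is a hypothetical illustration that assumes names of the form chromosome_index and a known left position for each clone in the contig.

```python
def check_contig(clones):
    """clones: list of (name, left_position), names like 'ele4_256'.

    Returns (mixed, out_of_order): whether more than one chromosome is
    present, and how many consecutive same-chromosome clones start in
    the wrong order.
    """
    parsed = []
    for name, left in clones:
        chrom, index = name.rsplit("_", 1)
        parsed.append((chrom, int(index), left))

    mixed = len({chrom for chrom, _, _ in parsed}) > 1

    # Walk the clones in their true (named) order and count each clone
    # that starts before its predecessor on the map.
    by_name = sorted(parsed, key=lambda c: (c[0], c[1]))
    out_of_order = sum(
        1
        for (chrom_a, _, left_a), (chrom_b, _, left_b) in zip(by_name, by_name[1:])
        if chrom_a == chrom_b and left_a > left_b
    )
    return mixed, out_of_order

contig = [("ele4_255", 0), ("ele4_256", 12), ("ele4_257", 9), ("ele4_258", 20)]
print(check_contig(contig))  # (False, 1)
```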
Ken McDonald at GSC has created a size calculator
(http://www.cs.clemson.edu/~cari/fpc/sizecalc.html)
that can be
used with agarose gel data that has been processed by Image such
that the band values are migration rates instead of sizes. In order
to use the sizes, use the calculator.
Ian Longden (besides developing some of the display FPC software) has provided the
option of filtering vector fragments. Information is given in the manual, but the
extended format is described in
http://www.cs.clemson.edu/~cari/fpc/vector.html.
S. Gregory, C. Soderlund and A. Coulson (1996) Contig assembly by fingerprinting.
In P. H. Dear (ed) "Genome Mapping: a Practical Approach".
C. Soderlund, I. Longden and R. Mott (1997) FPC: a system for building contigs from
restriction fingerprinted clones. CABIOS, 13: 523-535.
C. Soderlund, S. Gregory and I. Dunham (1998) Sequence ready clones. In M. J. Bishop
(ed) Guide to Human Genome Computing, Academic Press.
C. Soderlund, S. Humphray, A. Dunham and L. French (2000) Contigs built with fingerprints, markers and FPC V4.7. Genome Research, in press.
H. Auxiliary Files
I. Release V4 Notes
FPC 4.7 15 Oct 00
On the project window, the framework display: the 'Diff' field for
the positions of markers in a contig has been changed so that it is
the position. This is to make it easy to see if the positions are
similar to the framework order.
Sometimes selecting a marker caused it to jump to a different area
of the display. This 'may' be fixed.
FPC 4.7 8 Sept00
The Replace framework was buggy; it now needs the '.fw' suffix.
The Update markers was buggy; it now needs the '.ace' suffix.
The Replace markers had two bugs: (1) it would not display new markers
until you exited and restarted. (2) If a clone had all its markers
removed such that it was no longer in the .ace file, and if the markers
were still in other clones, they were not removed for this clone
that just lost all its markers.
--------------------------------------------------------
FPC 4.7 30July00
1. On Project window:
A. On Menu window, there are two new options:
a. Re-number contigs by framework order.
i. All remarks following the :: (if exists) are removed and
the old contig number is put in the contig remark field.
ii. You will be prompted as to whether you want to move the
remaining contigs to the lowest numbers; you might as well
do this as this cleans up your database.
It also puts Dead contigs on the end.
b. Set left end of each contig to zero
B. The framework window of the Project window now shows the contig
with the most clone hits first in the list of contigs for a clone.
For this contig, the difference in position within the contig with
the previous marker is shown.
NB To correctly associate this with the global position, that
difference has been changed to be relative to the previous
marker (it was relative to the next marker).
C. On the pull-down window is a "Print to file" where the contents of
any Project window will be printed to the file of choice.
D. Slight additions & adjustments to the Summary window.
2. Previously, when a file of markers was used to update fpc
using 'Merge .ace file', the 'framework' file and 'sequencenew.ace'
were also read. Additionally, if the first line of the marker file
was // SAM, then it just added the markers; otherwise, it did a
complete update (deleting markers in fpc so that it is consistent with
the input file). Under File... is now the following:
Replace markers (fw & seq) // Original Merge .ace file
// see p.37 of manual
Replace framework // Same as read from Merge .ace, but
prompt for filename
Replace sequence status // Same as read from Merge .ace, but
prompt for filename. Format is not documented
Send me mail if you want to use this.
Merge markers // previous option when 'SAM' was on the first line
of the marker file. See p. 38 of manual
This can be produced by SAM using the Misc -
Save as fpc
Merge remarks // File looking like an fpc file but with only
clones and remarks.
This originally was put together in haste, and has now been changed
in the same manner. If you have problems, let me know, I'll help you
get your markers read in.
Bug fix: on replacing markers, if all of a clone's markers were removed
such that it was not in the ace file, its markers were not removed in fpc.
3. Added the marker type "SNP".
4. Speedup on reading in FPC files with markers.
5. On the Trail..., pull down and select the 'Trail Markers', when a
marker is selected, the marker and its clones will be highlighted in
3 rotating colours. Shared clones are in a fourth colour.
6. When computing the Sulston score for the types of data traditionally used
with FPC, empirical results showed that adequate precision was reached
in 3 passes through the loop versus the full N passes, and this takes almost 1/2
the time. Michele Morgante of Dupont has recently tried using
FPC for clones with over 100 bands (using the concatenation of 3 digests),
and the shortcut does not work for these large numbers.
So I've added the choice of the 'pure sulston' on the Configure window
which goes thru the loop the full N times. Also, if 'fast sulston' is
being used and the number of bands is over 60 for either clone being
compared, it automatically executes the pure (this is being highly
conservative).
Comparing the output of running with the 'Fast Sulston' version and
'Pure Sulston' version, the results are as follows when used with the data
from Chromosome 9 which uses a mix of GSC and Sanger clones:
Pure Time User 115 min 33 sec, System 0 min 1 sec
Contig sizes: Max 500, 128 (>25), 133 (25:10), 503 (9:3), 718 (=2),
5805 Singles
Fast Time User 68 min 9 sec, System 0 min 1 sec
Contig sizes: Max 500, 128 (>25), 133 (25:10), 503 (9:3), 718 (=2),
5805 Singles
Note: if you are using multiple digests and want to change the gel length
which is used in the equation, change GEL_LEN in clam/clam.h.
7. Buried clones
a. Clones are unburied when moved to singletons
b. The Adjust flag was removed, i.e. a buried clone is always
moved under parent.
c. On Project/Menu, a SweepChildUnderParent fixes the above two,
i.e. no buried in singletons, and children must be under parents.
d. The OKALL on the CBmap ensures that exact buried clones are exactly under
parents and approximate buried clones are within the Bury fraction.
8. I left on a debugging switch which could cause some esoteric output to the screen
when running the CB map routines.
-----------------------------------
FPC 4.6.4 4jan00
1. CB routine - fix a few little bugs.
2. Gel Image will not keep popping up.
3. Search on Gel Id, if a '*' is at the end, it will
look for the substring anywhere in the gel name.
4. It will not let you use a cutoff > 1 (which
screws up the CpM table).
5. Year 2000 works, but wasn't printing so nicely
in the windows.
6. Bug fix in Ends->Ends and Ctg->Ends; for short
contigs, the wrong orientation was often given.
FPC v4.6.1
Additions, changes, and bug fixes:
1. The saving of the CpM table was sloppy. I re-did it.
a. The 'Permanent' is removed, and instead, two sets of values
are saved in the fpc file: Current CpM and Build CpM.
b. There are more checks to ensure that you run with the same values
every time. If the Current CpM is different from the Build CpM
values, it will ask you:
Use Last Build CpM table?
If you say yes, it sets the values in the Current CpM to the Build CpM
and continues.
Else, it asks:
Use Current CpM table??
If you say yes, it sets the Last Build CpM to the Current CpM and continues.
Else it quits the incremental Build.
c. The Auto Adjust works more intelligently.
2. a. When Trace was on, if the size of contig remark went over the
limit, it could cause a core dump.
b. The Trace would always turn back on when a file was reloaded.
3. a. During IBC, if a contig was set to NoCB, a merged contig could
end up far away. Now they are given a spacing of 10.
b. When recomputing the coordinates (ie. CBmap) it would unbury all
clones. This ended up changing the modified date, so now, clones
are not unburied first. Occasionally now, it finds another clone
to bury which will change the modified date.
4. Small speedup on Update .cor.
5. If you use a Framework file, it now sets all markers as default
to 'not an anchor' and then only sets the markers in the Framework
file to 'anchor'.
6. On the Project Menu window, there is now an option to search for
'Word in Remark'. Enter a word and it will list all contigs first
that have that word in the remark.
7. CB map had two bugs that rarely occur but can produce a wrong map.
8. Find Orphan: on the bottom of the Contig Analysis. It highlights
all clones that have no overlap with any other clone according
to the cutoff and CpM table (if on).
9. Check for disconnected contigs: On the bottom of the
Project Menu window. This puts a 'DIS' in the remark field for
all contigs that are disconnected. That is, the CB algorithm
ensures that a contig is a transitively overlapping set. Via
user merges based on a less stringent cutoff, a contig may become
disconnected such that it assembles into 2 or more CBmaps or
has one or more Ignored clones (aka Orphans).
Orphans and disconnected contigs can also result from a new
gel added to an existing clone, when the gel is sufficiently
different from the original as to change its cutoff with others.
------------------------------------------------------
FPC 4.6 20Sept99
>> Gel Image
Band - turn this on to select bands for the GSC calculator.
That is, when it is red, any band you select will be added to the calculator.
Clip on gel image.
To change the clipping, enter the values (and hit CR after each)
then select Clip.
In, Out: the number beside Out indicates the zoom level
When zoom is 1, you are seeing all the gel pixels I get from Image.
The whole displays it with zoom equal 6.
>> Other Variables
On the Main and Contig Analysis windows there is a new subwindow
called "Other variables", which contains the following:
1. Rates: Min and Max.
FPC will ignore values below and above these values for
everything but the FP and Gel display.
NB with the Humanmap data, GSC suggest not using above 3590.
2. Trace (this is also on the Project menu).
Adds to the contig remark (after any existing ::) information
about merges, moves, splits, undos and incremental builds.
NB This information can be removed from all remarks
via the Project menu window.
3. CpM table (as follows).
>> CpM table (Cutoff plus Markers)
Allows the marker data to be used in conjunction with the cutoff between
two clones. If you have no markers, just set "Use CpM" to off (on the
Main or Contig Analysis windows), and all will work as usual.
If your cutoff is 1e-10, the entries in the table will by default be:
Markers   Cutoff
>=1       < 1e-08
>=2       < 1e-07
>=3       < 1e-06
Two clones will be said to overlap if they obey any of these rules,
e.g. if two clones share two or more markers and have a cutoff < 1e-07, they
overlap.
Permanent: If you change the values and want them to be stored in the
fpc file on the next save, select Permanent.
Auto Adjust: if you change the cutoff, the values in the table automatically
change to keep the same relation as in the example above.
+1 and -1: add or subtract one from all exponents.
Use PCR and Use YBP (YAC/BAC/PAC) markers: these markers will be considered
only if their corresponding flag is on. Note: it does not matter what
markers are displayed (according to Configure).
When the "Other Variables" is initiated via the Contig Analysis window,
the Step functions can be used to show pairs of clones that match based
on the corresponding rule.
If "Use CpM" is on, the CpM table is used with ALL comparisons (e.g.
Ends --> Ends, etc).
If the CpM table is being used, the -->Ctg Prob is -->Ctg CpM,
i.e. the highlighted clones will be the ones matched by the cutoff or the table.
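The CpM rule can be sketched in a few lines of Python. This is only an illustration of the table above, not the FPC source: the function names are made up, the cutoff is passed as an exponent (e.g. -10 for 1e-10), and the +2/+3/+4 offsets are simply the relation seen in the default table, which is what Auto Adjust is described as preserving.

```python
def cpm_rules(cutoff_exp):
    """Default CpM rules as (minimum shared markers, exponent) pairs.

    A pair (m, e) means: at least m shared markers and a score below 10**e.
    The first pair is the plain cutoff, which needs no markers.
    """
    return [
        (0, cutoff_exp),
        (1, cutoff_exp + 2),
        (2, cutoff_exp + 3),
        (3, cutoff_exp + 4),
    ]


def overlaps(score, shared_markers, cutoff_exp=-10, use_cpm=True):
    """True if two clones overlap by the cutoff or by any CpM table rule."""
    rules = cpm_rules(cutoff_exp) if use_cpm else [(0, cutoff_exp)]
    return any(shared_markers >= m and score < 10.0 ** e for m, e in rules)


# Two shared markers and a score of 5e-08 pass the ">=2, < 1e-07" rule.
print(overlaps(5e-08, 2))
# With "Use CpM" off, only the plain cutoff is applied.
print(overlaps(5e-08, 2, use_cpm=False))
```

Changing cutoff_exp shifts every row by the same amount, which mimics Auto Adjust; the +1 and -1 buttons would correspond to shifting the three marker rows only.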
>> Merge ace file on the File... and Incremental Build
Incremental Build now takes into account new markers.
NB The Sanger Centre uses Merge ace file to enter marker data
(so this feature is supported with Merge ace file but not the Marker editor).
See p.37 of FPC V4: User's Manual for details of Merge ace file.
When "Merge ace file" is executed, each new clone marker gets marked as New and
this information is saved in the .fpc file.
You will see the 'New' status on the Clone Text window.
The Incremental Build considers all singletons with a create date after
"Last Build". If Use CpM is on, it also considers all clones that have 'new'
markers (these can be singletons or in a contig).
The New will get set to CpM on next Incremental Build,
and on the subsequent one, set to no value.
For ShowAddition, a clone that merges two contigs based on
shared markers is shown in light blue.
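The life of the 'New' flag on a clone marker can be pictured as a tiny state machine. The sketch below only restates the sentence above about New and CpM; the function name is hypothetical and None stands in for "no value".

```python
def next_marker_state(state):
    """State of a clone-marker flag after one Incremental Build.

    'New' becomes 'CpM' on the next build, 'CpM' becomes no value (None)
    on the one after that, and no value stays no value.
    """
    transitions = {"New": "CpM", "CpM": None}
    return transitions.get(state)


state = "New"
for build in (1, 2, 3):
    state = next_marker_state(state)
    print(build, state)
```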
>> Contig Analysis
If there is a /Sizes directory, an option
to use the sizes is now beside the [--> Clone1] button.
UndoAdditions undoes the additions shown by ShowAdditions.
-------------------
Main Analysis: the AutoAdd works with Singles->Ctg now.
On the pull-down for a Contig is "Legend", which describes all the
various colours I use.
ShowIncBuild is renamed ShowAddition since it works for any added clones.
Project Window:
There is a set of commands to clear modified dates, new marker status,
oldctg values, and remarks.
When Trace is on, contigs merged via IBC (Incremental Build Contigs) are
commented in the contig remark field, e.g. IBC5 (ctg5 was merged).
These are cleared on each new IBC so you only see the most recent.
Under File... is a Merge GSC file. This reads an FPC style file with
only clones with map coordinates and Fpc_remarks. The source is in
file/gsc.c and can be changed for local needs.
PLEASE READ:
The addition of Incremental Build to FPC has introduced 'state'.
Each time it is run, the following states are cleared:
1. Oldctg for each clone.
2. New for each new marker.
3. IBC traces are removed from contig remark.
Therefore, ShowAdditions and the contig remark are relevant only
to the latest IBC.
ShowAdditions and UndoAdditions also work for clones that are added via
AutoAdd, etc. and contigs merged by the user. If Incremental Build is never
used, the values are never cleared. In which case, it's important you do so
manually through the Project Menu items:
Clear clone Oldctg
Clear markers new
Clear contig remark after ::
Otherwise, when you do a ShowAdditions it will show everything added since
v4.5 to a contig, which becomes meaningless after a while.
I strongly recommend you use Incremental Build if you are incrementally adding
clones to your database!
----------------------------------
FPC V4.5 28 Jul 99
1. New Incremental Build
This is on the Main Analysis window. It automatically adds clones and merges
contigs using the clones most recently added to the database. This is much much
faster than killing all contigs and rebuilding, plus it uses very little
memory. If you split or merge contigs, they will remain as such unless
a new clone recombines two contigs.
This function takes all the clones added after the Last Build date.
For each new clone it (1) Compares it against all contigs
and joins all contigs which it hits. The list of merged contigs
is displayed on the project window, and in the log file if Log
is still turned on. (2) Compares the clone with all old singletons.
a. If the new single has been added to a contig and it matches an old single,
The old single is added to the contig.
If NoCB is set, the old single is buried in the new single.
b. Else a new contig is created.
The Last Build date and cutoff are saved with the fpc file.
These can be changed on the Main analysis window.
Since CBmaps are constantly being run now and the clones instantiated,
the Modified Date for a clone is no longer altered by a CBmap instantiation.
It is only changed when a singleton is added by Incremental Build or from
a user's edit.
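The steps of the Incremental Build can be sketched as follows. This is a toy version under stated assumptions, not the FPC code: contigs are a dict of clone-name sets, overlaps(a, b) is a caller-supplied test (cutoff and/or CpM), merged contigs keep the lowest number, the NoCB burying case is left out, and a new clone that hits nothing is kept as a singleton so later new clones can match it (that last point is my assumption).

```python
def incremental_build(contigs, singletons, new_clones, overlaps):
    """Add new clones to contigs, merging contigs they join.

    contigs:    dict of contig number -> set of clone names (changed in place)
    singletons: set of old singleton clone names (changed in place)
    new_clones: clone names added after the Last Build date
    overlaps:   function(a, b) -> bool
    Returns a list of (kept contig, [absorbed contigs]) merge records.
    """
    merged = []
    for clone in new_clones:
        # (1) Compare against all contigs and join every contig it hits.
        hits = sorted(c for c, members in contigs.items()
                      if any(overlaps(clone, m) for m in members))
        home = None
        if hits:
            home = hits[0]
            for other in hits[1:]:
                contigs[home] |= contigs.pop(other)
            if len(hits) > 1:
                merged.append((home, hits[1:]))
            contigs[home].add(clone)
        # (2) Compare against all old singletons.
        matches = {s for s in singletons if overlaps(clone, s)}
        if matches:
            if home is None:
                # (2b) No contig was hit: create a new contig.
                home = max(contigs, default=0) + 1
                contigs[home] = {clone}
            # (2a) Matching old singles join the clone's contig.
            contigs[home] |= matches
            singletons -= matches
        elif home is None:
            singletons.add(clone)
    return merged


pairs = {frozenset(p) for p in
         [("n1", "b"), ("n1", "c"), ("n1", "s1"), ("n2", "s2")]}
contigs = {1: {"a", "b"}, 2: {"c"}}
singles = {"s1", "s2"}
log = incremental_build(contigs, singles, ["n1", "n2", "n3"],
                        lambda a, b: frozenset((a, b)) in pairs)
print(log, contigs, singles)
```

In the example, n1 bridges ctg1 and ctg2 and pulls in s1, n2 forms a new contig with s2, and n3 stays a singleton.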
Additional changes have been made to support IBC, see 2, 3 and 4f.
2. Contigs Records -
Now a contig has a date, status, the number of Qs and a remark.
STATUS: The status can be changed from the Project window using the
Project Menu button.
1. NoCB - do adds and merges on Incremental but do not compute CBmaps
2. NoSeq - do not write to ACEDB files. (see File...)
3. Avoid - will be avoided in Build Contigs and Incremental
4. Dead - do not write to ACEDB files and leave out of builds
and the project summary.
Contigs with more than one sequenced clone are automatically set to NoCB;
this can be ignored by selecting 'Use seq ctgs' on the Main Analysis.
When two contigs are merged, the highest type status is used (referring to
the numbers above).
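The "highest type status" rule for merged contigs amounts to taking a maximum over the numbered list above. A small sketch, assuming an empty string stands for a contig with no status (the names here are illustrative only):

```python
# Rank follows the numbering of the status list; "" means no status set.
STATUS_RANK = {"": 0, "NoCB": 1, "NoSeq": 2, "Avoid": 3, "Dead": 4}


def merged_status(status_a, status_b):
    """Status of the contig that results from merging two contigs."""
    return max(status_a, status_b, key=STATUS_RANK.__getitem__)


print(merged_status("NoCB", "Avoid"))
```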
DATE: is changed when anything in the contig is changed.
Moving all clones in a contig from one contig number to another does not
affect the data, e.g. 'Move all clones to lower numbers'.
Qs: is set by the CB algorithm. Any changes such as merges or moving clones
set it to '-' as it is no longer known.
REMARK: On the contig window is a green line above the blue message containing
the contig remark. The remark is also shown on the project window. The maximum
size remark is 80 characters. The remarks can be changed by any of the
following:
1. On the Project menu, set 'Trace Merge, Move and Split'. Any of these
operations will get concatenated on the front of the remark.
Also, the Auto Add will add a remark.
If Trace is on, new contents are added to the front and if the remark is over
80 characters, the end is truncated.
2. Write and edit the remark on the contig window.
This may seem strange at first, in that whatever you do to the
remark in the contig window is automatically changed; you don't
Accept or anything like that.
3. If you want to add a remark that stays at the beginning
when Trace is on, follow the remark by '::'.
E.g if a contig has the remark:
Sequenced by GSC::
and you merge contig 5 with it when Trace is on, the remark will be:
Sequenced by GSC:: Merge Ctg5
4. If contigs are merged, their remarks are merged appropriately.
If a contig is split, the new contig will not have a remark.
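The way Trace entries interact with '::' and the 80 character limit can be sketched like this. It is an illustration only: the function names are hypothetical, the single-space separator is inferred from the "Sequenced by GSC:: Merge Ctg5" example, and any trace text other than "Merge Ctg5" is made up for the demonstration.

```python
MAX_REMARK = 80  # maximum remark size given above


def add_trace(remark, entry, max_len=MAX_REMARK):
    """Put a new trace entry at the front of the remark.

    Text up to and including '::' stays at the beginning; the new entry
    goes right after it, older entries follow, and the end is truncated.
    """
    head, sep, tail = remark.partition("::")
    if sep:
        fixed, old = head + sep, tail.strip()
    else:
        fixed, old = "", remark.strip()
    parts = [p for p in (fixed, entry, old) if p]
    return " ".join(parts)[:max_len]


def clear_trace(remark):
    """Clear everything after '::', or the whole remark if there is no '::'."""
    head, sep, _ = remark.partition("::")
    return head + sep if sep else ""


print(add_trace("Sequenced by GSC::", "Merge Ctg5"))
print(clear_trace("Sequenced by GSC:: Merge Ctg5"))
```

The second call corresponds to 'Clear contig remark after ::' on the Project Menu.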
3. OldCtg
With each clone, the last contig it was in is now shown on the Clone Text
window as the Oldctg. All clones get their Oldctg set to the current contig
on a Build and Incremental build.
This is mainly for the Incremental build, see 4f.
Naturally, any reference to a deleted contig (e.g. Merge) can become out of
date, i.e. the contig number can be reused and hence, references to oldctg
or in the contig remark can become out-of-date. See 5d and 5e.
4. Contig Analysis
a. --> Ctg Prob now shows the best match by the Sulston score in magenta.
b. CtgCheck is a new check which I got out of the GSC arabidopsis paper
which highlights any clones where the best match is not immediately
to the left or to the right of it.
c. The NoOlap and BadOlap have not changed, except now they are two separate
buttons.
d. Beside the CtgCheck and the NoOlap are two new buttons, each called Step;
you can step through the bad clones using these functions.
I find the Step for NoOlap very useful in determining if the OKALL has
positioned CBmaps in the best order.
e. The Marker evaluations are as before but now they are three separate
buttons.
f. The ShowIncBuild shows clones with an Oldctg of 0 in purple, and
clones with an Oldctg other than the current ctg in lightpurple. This is
good to use after the Incremental Build to see changes.
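The idea behind CtgCheck can be sketched as below. This is a simplification and not the FPC routine: it treats the contig as a plain left-to-right list of clone names, takes a caller-supplied score(a, b) where a lower value is a better match, and flags any clone whose best match is not its immediate neighbour.

```python
def ctg_check(order, score):
    """Return clones whose best-scoring match is not an adjacent clone.

    order: clone names from left to right in the contig
    score: function(a, b) -> probability of coincidence (lower is better)
    """
    flagged = []
    for i, clone in enumerate(order):
        others = [j for j in range(len(order)) if j != i]
        if not others:
            continue
        best = min(others, key=lambda j: score(clone, order[j]))
        if abs(best - i) != 1:
            flagged.append(clone)
    return flagged


table = {
    frozenset("AB"): 1e-12, frozenset("AC"): 1e-06, frozenset("AD"): 1e-15,
    frozenset("BC"): 1e-12, frozenset("BD"): 1e-05, frozenset("CD"): 1e-10,
}
print(ctg_check(["A", "B", "C", "D"], lambda a, b: table[frozenset((a, b))]))
```

Here A and D match each other best but sit at opposite ends, so both are reported.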
5. Project Menu
a. Setting the status as described in 2.
b. You can have the contigs renumbered from 1 to N where
N is the number of contigs.
c. Making the Results field in the Project window be the list of anchors,
markers, sequenced clones, or contig remarks.
d. Resetting Oldctg sets it equal to ctg.
e. Clearing the contig remark fields after the ::, or if there
is no ::, the whole remark will be cleared.
6. BUG FIX: The OKALL had quite a few serious bugs which have been fixed.
Plus, I had only tested it on good data. I have recently been refining
the heuristics based on a real set of clones, i.e. it works much
better. But sometimes, when there is so much ambiguity, it still may not
get it right. A good way to check is using the Step function of the
NoOlap with the cutoff equal to what it was using to compare ends.
7. On File..., there is a new option called 'Save FPC as ..'
in which you can save it into a new name and it copies the .cor file
to one with the new name also, e.g. if you specify the name
'new', it creates a new.fpc and new.cor just like the one you have
loaded into FPC.
8. The summary on the project window has been changed.
9. Ends -> Ends: the output has changed on the Project window.
E.g. Ctg1 RR-5 ctg10
implies the right end of 10 matches with the right end of 1 and
there are 5 hits.
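If you want to post-process these lines from the terminal output, a small parser along these lines may help. It is a sketch based only on the one example above: I am assuming the first letter refers to the first contig and the second letter to the second contig, and that L is used for left ends (only RR is shown here).

```python
import re

# Matches lines such as "Ctg1 RR-5 ctg10".
END_HIT = re.compile(r"ctg(\d+)\s+([LR])([LR])-(\d+)\s+ctg(\d+)", re.IGNORECASE)


def parse_end_hit(line):
    """Parse one Ends -> Ends result line, or return None if it does not match."""
    m = END_HIT.search(line)
    if m is None:
        return None
    ctg_a, end_a, end_b, hits, ctg_b = m.groups()
    return {
        "ctg_a": int(ctg_a), "end_a": end_a.upper(),
        "ctg_b": int(ctg_b), "end_b": end_b.upper(),
        "hits": int(hits),
    }


print(parse_end_hit("Ctg1 RR-5 ctg10"))
```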
10. CBmap:
a. Potential buries were never marked as Q clones; this
has been changed, which means there will be more Qs.
I've modified the rules for determining a Q clone:
After it is built, the following is computed: any clone
that has extras+gaps > number-of-bands/2 is a Q clone.
b. After calculating the CBmap, it then tries to add the extras to
the ends of clones as sometimes the swaying of the average can
cause some extras to qualify after its built; it will allow
one zero when extending, which overall causes a few more zeros
in the map. And this probably sounds like gibberish.
c. On CBmap display, under ?? is a ShowQs
d. Clicking with the middle button moves the clicked clone to
the middle (as it does with the contig display).
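The Q clone rule in 10a is a one-line test. In the sketch below the function name is hypothetical, and I am assuming a true (non-integer) division by 2; FPC itself may round differently for clones with an odd number of bands.

```python
def is_q_clone(extras, gaps, n_bands):
    """True if extras + gaps exceeds half the number of bands in the clone."""
    return extras + gaps > n_bands / 2


# 20 bands: 11 extras+gaps makes a Q clone, exactly 10 does not.
print(is_q_clone(6, 5, 20), is_q_clone(5, 5, 20))
```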
11. The Rule Menu has been renamed 'The Attic'; pull-down in whitespace from
either the Main Analysis or Contig Analysis window. The Single/Okall/View
buttons have been moved to the Attic.
12. The Kill function which destroys contigs automatically unburies clones.
13. Ctg->Ends is sorted by contig.
14. The Calc routine can be run when two contigs are displayed together
for a potential merge.
15. Default offset when merging contigs is 1 (else the end points are shared
hence, overlap, so not detected by NoOlap).
-----------------------------
The complete release notes are in ftp.sanger.ac.uk/pub/fpc/release-notes.
I edited the following so that it is just the new features; ie. the
features that are different from what is in the manual (which I really
need to update).
Release V4.1 - V4.3
>> Project window
A. A green scrolling bar has been added. It pages 1/2 page at a time.
B. The Results are saved during a session and can be displayed at any
time by picking the 'Results' option. (They are lost when you exit).
C. The project window is never repositioned based on a new current contig
or marker. To position on the current contig or marker (for framework and
end sequence options), use the 'Goto Current' on the pull-down menu.
D. If Tolerance is set to variable, on the summary window, you get
average band size and average clone size.
>> Using Image size or band files for agarose gels:
a. The Sanger Centre Image program produces both *.bands files and *.size files
for agarose gels. If *.size files are in the Image directory,
they will be read and moved to the Sizes directory (will create if
necessary), if the files are *.bands files, they will be moved to the
Bands directory as usual.
b. If variable tolerance (ie. size files),
when the Gel Image is displayed, the migration
rates will be read from the Bands files and displayed as they correspond
with the Gel image.
c. If fixed tolerance (ie. band files)
The Calculator provided by GSC looks in the Sizes directory for the size
files (along with all the other optional directory paths).
The Contig Analysis routine, --> Clone1, which compares Clone 1 and Clone 2,
checks for a /Sizes directory, and if it exists with the gel and clone, it
uses that data, in which case you get the size of each clone and the size of
the overlap. It uses 0.001 * tolerance, and prints this value out, therefore,
this indicates when /Sizes have been used.
NB We use the bands files, so that code is well tested.
>> Search:
I've made the search routines act more like they do in acedb.
a. The search string never clears unless you change
class and the searchtype no longer applies.
b. One click on a class makes it the current class.
If you enter a search string or click Search Commands,
it applies to the current class.
c. A second click executes the search on the
search string. A blank or a '*' matches everything in
the class.
d. Clear clears the search string text box (though it
does not remove the current keyset as does acedb).
Non-acedb features:
e. A new feature, Reset, regenerates the entire class.
f. Unlike acedb, it never changes the search string; e.g. in
acedb, if a search fails, it appends a '*'.
As always, it does the search on the current keyset.
It NO longer automatically updates a keyset.
>> Main Analysis:
Ends --> Ends
(1) FromEnd specifies how far from the end a clone can be
to qualify as an end clone.
(2) Use Markers - when this is on, two clones are said to
overlap either due to having a cutoff beneath the specified
one OR because they share one or more markers.
This can actually lead to a lot of false joins, it needs more
work.
Auto Add: when on, clones will automatically be added to contigs
when you execute Keyset->FPC.
>> Contig Analysis:
--> Ctg Prob (was -->Ctg)
a. for a high Prob and 1 or more markers - color Purple
b. for a high prob only, color Violet
--> Ctg Markers is a new function which highlights clones containing
one or more markers shared with Clone 1:
a. for markers and a high prob - color purple
b. for markers only, color violet
So a clone will have the same purple set for ->Ctg Prob and ->Ctg Markers
but a different violet set. The text output has changed a little.
Use Markers applies to the following 3 functions,
and FromEnd applies to the last 2 of them.
Clone 1 -> Fpc will include checking all clones for markers.
Ctg --> Ends will give the same results as from Ends --> Ends but
for this contig only and potential joining clones will be highlighted
Sel --> Ends uses the selected set and compares them against all
ends of other contigs.
Show Sequenced has an on/off circle beside it.
When on: if a clone is not selected or the highlighted clone,
and if it is picked for sequencing (anything but CANCELLED) it
will be shown in its sequenced colour. They remain coloured, even
a ClearAll will not remove their colour, only turning the circle
off will stop the sequence colouring.
Show Sequenced works as always.
>> CBmap
OKALL Markers that are not displayed are not used if Use Markers is on.
The Use Markers and FromEnd are the same parameters found on the
Main analysis and Contig analysis windows.
CBmap. The final statistics are now written to the log file if it
is turned on. They are always still written to stdout.
Fp_order and Gel_order, OKALL automatically uses 'Left End' for these
two options.
NB The difference between using the Left End button and the OKALL is that
the first only positions the clones in the CBmap, whereas OKALL positions
them and buries all clones not in the CBmap.
>> Marker text:
On a Marker Text window pull-down menu, there are two new options:
Add to Fp and Add to Gel, which will add the fingerprints for all
clones scoring positive for the marker to the Fp/Gel window.
>> New Shotgun Type {Full-X, Half-X, Gap-closure} has been implemented.
What was referred to as 'Sequence' I'm now referring to as
'Shotgun Status'. So a clone has a Type and a Status.
When the Type is set, the Status is automatically set to TILE
The type along with status is shown per contig on the Project
window 'By sequence'. The number of Sequenced is everything with a
Status except AVOID and CANCELLED.
The Submit for Sequencing and Save as Ace have been altered to include
the new Type.
Full_X, Half_X, and Gap are added to the Search Commands for clones.
>> Size information can be added to the header. e.g.
// Framework Chr_1 Genome 263000000 AvgBand 3000 AvgInsert 100000
The name after Framework is used for the Ace Dumps under File...
and is probably not relevant to most labs and can be left out.
The rest of the numbers result in Coverage given under Summary in
the Project window and the AvgBand is also used in the Sort by Length.
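For scripts that need these values outside FPC, the header line can be read as simple name/value pairs. A minimal sketch, assuming the line always has the form shown above (the function name is made up, and the Framework pair may be absent):

```python
def parse_size_header(line):
    """Split a '// Name value Name value ...' header line into a dict."""
    tokens = line.lstrip("/").split()
    info = {}
    for key, value in zip(tokens[0::2], tokens[1::2]):
        # Numeric values become ints; names such as Chr_1 stay as text.
        info[key] = int(value) if value.isdigit() else value
    return info


header = "// Framework Chr_1 Genome 263000000 AvgBand 3000 AvgInsert 100000"
print(parse_size_header(header))
```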
>> Fingerprint: It only shows the contig-per-clone if requested.
Or you can show the gel index (i.e. number 1, 2, etc) which is nice if
you're displaying multiple gels for a clone.
>> Main Configure:
There is a new option on the Configuration windows to not show
markers that have only scored positively with 'one' clone.
NB Your current contigs will initially come up with them not displayed;
use the Main window configuration to change them all at once.
New contigs will by default have them displayed.
==========================================================
FPC Release V4
CHANGES:
>> Year 2000 compliant. The Modified Date is set to current date (i.e. it is
never 0). Pre V4 fpc files are correctly updated.
>> Message window: has been removed. All output goes to standard output
(i.e. the terminal window). Two reasons for this: Clone names can be
grabbed with the mouse from terminal windows (but not the message
window) and dropped in yellow fpc boxes, and to reduce number of windows
and provide consistency in output.
>>Highlighting:
I tried to fix a few bugs and realized the whole coloring was flaky and so
I rewrote it all. These are the rules:
1. CYAN = highlighted marker or clone.
DARKGREEN = friend
LIGHTGREEN = parent of buried (hidden) friend
LIGHTBLUE = selected clone
LIGHTRED or RED = warning
PURPLE = analysis highlighting
LIGHTPURPLE = parent of buried (hidden) highlighted by analysis
Sequenced clone status:
TILE GREEN
SENT VIOLET (changed from LIGHTBLUE to not clash with Selected)
READY BLUE
SHOTGUN LIGHTGRAY
FINISHED RED
AVOID LIGHTRED
CANCELLED BLACK
2. ClearAll clears all highlighting and selected states.
Clear on Edit Contig clears all selected states.
When Trail is on, highlighting is never cleared until a ClearAll
3. For the Contig Analysis functions that change the colour of
clones for any reason, ClearAll will first be executed. For
some functions, the highlighted clone will remain.
*The coloring from these routines is temporary, i.e. if they are
picked such that their color changes, they go back to white when some
other clone is picked.
*Some of the functions used to set clones to be 'selected'; now the
colour PURPLE is used instead. If you want to select the set, use
the Contig Analysis 'Select Coloured'. Doing this will prevent them
from 'losing' their colour when picked.
4. Picking a clone on any of the maps, i.e contig, fingerprint, gel and cbmap
will automatically cause the clone to be highlighted on all of the
displayed maps.
*If a marker is selected in the contig window, a highlighted clone will
stay highlighted in the fingerprint or gel window (i.e. the windows become
temporarily out of sync).
5. When one or more clones are added to the fingerprint or gel map:
if there is a highlighted clone, the clone(s) is added to the left of it,
else they are added at the end. (1) It is not obvious which is the
highlighted clone if trail is on. (2) To have no current highlighted clone,
use Clearall.
----------------------
References
C. Soderlund (1999) FPC V4.0: User's Manual. Technical Report SC-01-99.
The Sanger Centre, Hinxton Hall, Cambridge UK.