[Bioc-sig-seq] Scatter Plot Matrix
Martin Morgan
mtmorgan at fhcrc.org
Thu Nov 6 19:34:36 CET 2008
Victor Ruotti wrote:
> Hi Martin
> I'm having trouble interpreting the Scatter Plot Matrix listed in the
> ShortRead pdf file.
> Can you comment on this?
Hi Victor --
Not sure that the figure made it through the mailing list; it's on p.
13, just before section 2.2, of the 'Overview' vignette.
The figure was meant to be a 'teaser' to encourage people to explore and
understand the way the data is collected. It shows the intensities
recorded at an early stage in the eland pipeline, after image
acquisition but before base calling. The axes are the intensities
reported by Firecrest. Each panel represents a pairwise comparison
between two bases. Each point in a panel represents a single cluster;
all clusters are represented in each panel. The displayed figure is at
cycle 2. The suggestion in the text is to compare this to a later cycle
(e.g., 30); it's fun to make a little animation of this, e.g., by
looping over cycles.
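
To make the panel layout concrete, here is a minimal sketch in plain
Python (not the ShortRead code used for the vignette figure). It assumes
a hypothetical in-memory layout, intensities[cluster][cycle], holding one
value per channel in A, C, G, T order; the numbers are made up, and
cycles are indexed from 0 here. Each panel pairs two channels, and every
cluster contributes one point to every panel.

```python
from itertools import combinations

CHANNELS = ("A", "C", "G", "T")


def pairwise_panels(intensities, cycle):
    """Return {(x_channel, y_channel): [(x, y), ...]} for one cycle.

    intensities[cluster][cycle] is a 4-tuple of per-channel values in
    CHANNELS order (a hypothetical layout, chosen for illustration).
    """
    panels = {}
    for i, j in combinations(range(len(CHANNELS)), 2):
        # One point per cluster in every panel.
        panels[(CHANNELS[i], CHANNELS[j])] = [
            (cluster[cycle][i], cluster[cycle][j]) for cluster in intensities
        ]
    return panels


# Three made-up clusters, two cycles each.
toy = [
    [(900.0, 120.0, 15.0, 10.0), (400.0, 150.0, 40.0, 30.0)],
    [(20.0, 10.0, 850.0, 300.0), (60.0, 50.0, 380.0, 210.0)],
    [(110.0, 780.0, 5.0, 25.0), (140.0, 350.0, 30.0, 45.0)],
]

# Looping over cycles gives one set of panels per cycle, which is the
# raw material for a frame-by-frame animation.
for cycle in range(2):
    panels = pairwise_panels(toy, cycle)
    print(cycle, panels[("A", "C")])
```

Four channels give six distinct pairs; a full scatter plot matrix draws
each pair twice, once with the axes swapped.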
As I understand it, there are two (not four, as one might think)
fluorescent nucleotides ('dATP', 'dCTP'), and each is measured at two
different wavelengths ('red', 'green'). This is from
http://dx.doi.org/10.1016/S0003-2697(03)00291-4
Roughly, there are four different intensities ('dATP, red', 'dATP,
green', 'dCTP, red', 'dCTP, green'). The mapping between intensity and
underlying base is orthogonal along one dimension (the panels comparing
A or C with G or T) but not along the other (the panels comparing A and
C, and G and T).
Ideally, one would like to see discrete groups of points, as in any
discriminant analysis. Even at an early cycle we see that this is not
clear 'by eye'. Points differ in amplitude (distance from origin,
related to number of DNA strands per cluster?) and deviate from the
horizontal, vertical, or diagonal ('phasing' or dye bleed, where each
cluster has fluorescence in all four dimensions resulting from some
DNA strands being sequenced at the wrong location, or residual
fluorescent bases from earlier reactions, for instance).
A picture from a later cycle shows that points collapse toward the
origin (lower intensity, e.g., because of depleted reagents or lost DNA
strands) and become less orthogonal (strands within a cluster
increasingly out of phase, dye build-up from previous reactions, etc.).
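
One way to put numbers on 'amplitude' and 'deviation from the axis' is
to look at each point of a panel in polar terms. The sketch below is
only an illustration: the function name and the two example points are
made up, not taken from the figure.

```python
import math


def amplitude_and_angle(x, y):
    """Polar view of one point in a panel.

    Returns the distance from the origin and the angle, in degrees,
    measured from the horizontal axis.
    """
    return math.hypot(x, y), math.degrees(math.atan2(y, x))


# Made-up points for the same hypothetical cluster in one panel.
early = (900.0, 30.0)   # bright, close to the horizontal axis
late = (250.0, 120.0)   # dimmer, and further off the axis

for label, (x, y) in (("early", early), ("late", late)):
    amp, angle = amplitude_and_angle(x, y)
    print(f"{label}: amplitude={amp:.1f} angle={angle:.1f} degrees")
```

A point sitting exactly on the horizontal axis has angle 0 and one on
the vertical axis has angle 90; collapse toward the origin shows up as
shrinking amplitude, and loss of orthogonality as angles drifting away
from those values.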
Mostly the figure shows the challenges faced by base calling algorithms,
and the challenges for the technology (making the clusters fall out more
discretely, for more cycles).
This is my understanding of the chemistry involved; perhaps others will
contribute more authoritatively.
Martin
> Thanks,
> Victor
--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M2 B169
Phone: (206) 667-2793