[Bioc-sig-seq] Scatter Plot Matrix

Thu Nov 6 19:34:36 CET 2008

Victor Ruotti wrote:
> Hi Martin
> I'm having trouble interpreting the Scatter Plot Matrix listed in the 
> ShortRead pdf file.
> Can you comment on this?

Hi Victor --

Not sure that the figure made it through the mailing list; it's on p. 
13, just before section 2.2, of the 'Overview' vignette.

The figure was meant to be a 'teaser' to encourage people to explore and 
understand the way the data is collected. It shows the intensities 
recorded at an early stage in the eland pipeline, after image 
acquisition but before base calling. The axes are the intensities 
reported by Firecrest. Each panel represents a pairwise comparison 
between two bases. Each point in a panel represents a single cluster; 
all clusters are represented in each panel. The displayed figure is at 
cycle 2. The suggestion in the text is to compare this to a later cycle 
(e.g., 30); it's fun to make a little animation of this, e.g., by 
looping over cycles.
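
A minimal sketch of how the panels are organized, in Python rather than
with ShortRead itself. The data layout (one 4-tuple of A, C, G, T
intensities per cluster per cycle), the function name pairwise_panels,
and the numbers are all made up for illustration; cycles are 0-based
here. Each of the six base pairs gets one panel, and every cluster
contributes one point to each panel.

```python
from itertools import combinations

BASES = ("A", "C", "G", "T")

def pairwise_panels(intensities, cycle):
    """Return {(base_x, base_y): [(x, y), ...]} for one cycle.

    intensities[cluster][cycle] is a 4-tuple of channel intensities
    ordered as BASES (a hypothetical layout, not a ShortRead object).
    """
    panels = {}
    for i, j in combinations(range(len(BASES)), 2):
        panels[(BASES[i], BASES[j])] = [
            (cluster[cycle][i], cluster[cycle][j]) for cluster in intensities
        ]
    return panels

# Two made-up clusters, two cycles each.
toy = [
    [(900.0, 40.0, 10.0, 5.0), (700.0, 90.0, 30.0, 20.0)],
    [(20.0, 15.0, 850.0, 300.0), (40.0, 35.0, 600.0, 280.0)],
]

# Looping over cycles gives the frames of the animation.
for cycle in range(2):
    panels = pairwise_panels(toy, cycle)
    print(cycle, len(panels), panels[("A", "C")])
```

Plotting each dictionary entry as one scatter plot, once per cycle,
would give the frames for the animation.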

As I understand it, there are two (not four, as one might think) 
fluorescent nucleotides ('dATP', 'dCTP'), and each is measured on two 
different wavelengths ('red', 'green'). This is from here

http://dx.doi.org/10.1016/S0003-2697(03)00291-4

Roughly, there are four different intensities ('dATP, red', 'dATP, 
green', 'dCTP, red', 'dCTP, green'). The mapping between intensity and 
underlying base is orthogonal along one dimension (the panels comparing 
A or C with G or T) but not along the other (the panels comparing A and 
C, and G and T).

Ideally, one would like to see discrete groups of points, like in any 
discriminant analysis. Even at an early cycle we see that this is not 
clear 'by eye'. Points differ in amplitude (distance from origin, 
related to number of DNA strands per cluster?) and deviate from the 
horizontal, vertical, or diagonal ('phasing' or dye bleed, where each 
cluster has fluorescence in all four dimensions resulting from some 
DNA strands being sequenced at the wrong location, or residual 
fluorescent bases from earlier reactions, for instance).
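
The two kinds of variation can be put into numbers for a single
cluster; a rough sketch, again in Python. Both function names are
hypothetical, and dominant_fraction is just one simple way to express
how far a point sits from a single axis, not a measure taken from the
pipeline.

```python
import math

def amplitude(channels):
    """Distance of the four-channel intensity vector from the origin."""
    return math.sqrt(sum(x * x for x in channels))

def dominant_fraction(channels):
    """Share of the summed signal carried by the strongest channel.

    1.0 means the point lies exactly on one axis; smaller values mean
    signal is spread across channels. Illustrative measure only.
    """
    total = sum(channels)
    return max(channels) / total if total > 0 else 0.0

clean = (3.0, 0.0, 0.0, 0.0)   # made-up cluster lying on one axis
mixed = (3.0, 1.0, 0.0, 0.0)   # made-up cluster pulled off the axis

print(amplitude(clean), dominant_fraction(clean))
print(amplitude(mixed), dominant_fraction(mixed))
```

A bright cluster and a dim one can share the same dominant fraction,
which is why amplitude and direction are worth looking at separately.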

A picture from a later cycle shows that points collapse toward the 
origin (lower intensity, e.g., because of depleted reagents or lost DNA 
strands) and become less orthogonal (strands within a cluster 
increasingly out of phase, dye build-up from previous reactions, etc.).
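
The collapse toward the origin can be summarized as a mean distance
from the origin per cycle. This is a sketch on invented numbers,
assuming a hypothetical layout where intensities[cluster][cycle] holds
four channel values; it shows only the bookkeeping, not real behaviour
of any run.

```python
import math
from statistics import mean

def mean_amplitude_by_cycle(intensities):
    """Mean distance from the origin over all clusters, one value per cycle."""
    n_cycles = len(intensities[0])
    return [
        mean(
            math.sqrt(sum(x * x for x in cluster[c]))
            for cluster in intensities
        )
        for c in range(n_cycles)
    ]

# Two made-up clusters whose signal drops between an early and a late cycle.
toy = [
    [(30.0, 40.0, 0.0, 0.0), (6.0, 8.0, 0.0, 0.0)],
    [(0.0, 0.0, 60.0, 80.0), (0.0, 0.0, 12.0, 16.0)],
]

trend = mean_amplitude_by_cycle(toy)
print(trend)
```

A falling trend across cycles is the numerical counterpart of points
drifting toward the origin in the panels.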

Mostly the figure shows the challenges faced by base calling algorithms, 
and the challenges for the technology (making the clusters fall out more 
discretely, for more cycles).

This is my understanding of the chemistry involved; perhaps others will 
contribute more authoritatively.

Martin

> Thanks,
> Victor

-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793