[BioC] flowCore: inverse logicle transformation of flow cytometry data

Thu Oct 8 01:59:20 CEST 2009

Hi Nishant, Chao-Jen, Pyne, et al.

I thought I would add a few notes on the Logicle transformation and its 
implementation in BioConductor/flowCore. Please do not take this email 
the wrong way; I am definitely not trying to complain about flowCore's 
implementation, just trying to shed some light on the Logicle issues. 
The implementation of Logicle is tricky, both because the transformation 
is quite complicated and because the original documentation is targeted 
at readers with a substantial mathematical background. 
Nishant, Florian, Byron and others did a great job when they opened the 
can of worms and implemented the transformation in flowCore.

1) I remember that there were minor issues with the Logicle 
transformation, especially when applied to very low and negative values. 
I believe flowCore's implementation was not actually a monotone function 
around 0, which probably relates to a typo in the Parks et al. Logicle 
manuscript. Nishant, we had some email discussion on this topic around 
January 27, 2009. Recently, I talked to Dave Parks and Wayne Moore and 
they told me that flowCore's implementation of Logicle is still broken 
(no further details; I am not sure about the validity of this statement 
or whether it is related to the original issue). Having said that, please 
note that these issues are minor and would not really affect typical 
users using flowCore/Logicle as part of their analysis pipeline.
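
A simple way to look for this kind of problem is to evaluate a 
transformation on a fine grid around 0 and check that the values strictly 
increase. The Python sketch below is only an illustration: the function 
name is hypothetical, math.asinh stands in for a well-behaved 
transformation, and a grid check can reveal non-monotone spots but 
cannot prove monotonicity.

```python
import math


def is_monotone_increasing(f, lo=-10.0, hi=10.0, n=2001):
    """Evaluate f on an even grid over [lo, hi]; True if the values strictly increase.

    A failed check pinpoints a violation; a passed check only covers the grid.
    """
    xs = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    ys = [f(x) for x in xs]
    return all(y1 < y2 for y1, y2 in zip(ys, ys[1:]))


print(is_monotone_increasing(math.asinh))            # a monotone stand-in
print(is_monotone_increasing(lambda x: x ** 3 - x))  # dips near 0
```

In practice, f would be the transformation under test, with the grid 
concentrated on the low and negative values where the problem was seen.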

2) Quoting Chao-Jen Wong:
> Since the logicle transformation is an one-to-one and onto function, 
> it is possible to implement an inverse function. It is, however, not 
> straightforward...
I believe Nishant has solved this by now, but note that the inverse 
should actually be easier than the Logicle itself. This is because the 
Logicle transformation is defined as logicle(x) = root(S(y) - x), i.e., 
logicle(x) is the value y that solves S(y) = x. In other words, Logicle 
is defined as the inverse of S, where S is a “simple” function. 
Consequently, you can simply use S as the inverse of Logicle.
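
To make the asymmetry concrete, here is a minimal Python sketch (not 
flowCore's code). It assumes a generic bi-exponential 
S(y) = a*e^(b*y) - c*e^(-d*y) + f with positive a, b, c, d, which is 
strictly increasing and therefore invertible. Logicle then needs a 
root finder (bisection here), while its inverse is a direct evaluation 
of S. The coefficient values, the bracket [-10, 10] and the function 
names are hypothetical, chosen only for illustration.

```python
import math


def S(y, a=1.0, b=2.0, c=1.0, d=2.0, f=0.0):
    """Bi-exponential S(y) = a*e^(b*y) - c*e^(-d*y) + f.

    With a, b, c, d > 0, S is strictly increasing, so it has an inverse.
    The default coefficients are hypothetical, for illustration only.
    """
    return a * math.exp(b * y) - c * math.exp(-d * y) + f


def logicle(x, lo=-10.0, hi=10.0):
    """Logicle(x) = root of S(y) - x, found by bisection on [lo, hi]."""
    if not S(lo) <= x <= S(hi):
        raise ValueError("x is outside [S(lo), S(hi)]")
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if S(mid) < x:  # S is increasing, so the root lies above mid
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def inverse_logicle(y):
    """The inverse of Logicle needs no root finding: it is S itself."""
    return S(y)
```

For example, logicle(inverse_logicle(0.5)) returns approximately 0.5. 
Each inverse call costs two exponentials, whereas each forward call runs 
a full bisection.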

3) Recently, there have been some developments related to Logicle that 
flowCore may want to consider/support/implement:

- Since 2005, Stanford has held a patent on the Logicle implementation. 
However, early this year, Stanford decided to stop collecting royalties 
on it, and it is now free for anyone to use.

- As a result, Logicle has been incorporated into the latest version of 
Gating-ML (an ISAC standard for describing gates and data 
transformations in XML). This was done in collaboration with the 
authors of the transformation, who decided to tweak the transformation 
slightly:

a) In the original manuscript, Parks et al. show two different 
parameterizations of the Logicle function, based on the natural 
logarithm (base e) and the decadic logarithm (base 10), respectively. 
The recent conclusion is that the decadic (base 10) version is better 
(i.e., easier) for the end user and should be used. Essentially, the 
transformation function is the same as long as you adjust the parameters 
accordingly. In the manuscript, the parameters are lower case for the 
natural logarithm parameterization and upper case for the base 10 
logarithm. The 'm' and 'w' are the two affected parameters, where 
m=M*ln(10) and w=W*ln(10).** The "upper case" (i.e., decadic) version is 
better for the end user since M and W are expressed in normal decades, 
i.e., base 10 log units; M is the total plot width and W is the 
linearization width. For example, if the user wants the result to be a 
4.5-decade plot, they use M=4.5 rather than having to use m=10.36.
The implementation in flowCore seems to be based on the natural 
logarithm, but its parameterization is mixed ('w' seems to be in natural 
logarithm units, while 'd' seems to be in decades, i.e., decadic 
logarithm units). Possibly, flowCore could switch to the decadic 
logarithm implementation, harmonize the parameterization, and perhaps 
use the same constants as in the paper.
** I believe that flowCore calls 'd' what is called 'M' in the 
manuscript and 'r' what is called 'T' in the manuscript (Parks et al., 
Cytometry, 69A: 541-551; 2006).
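
The conversion between the two parameterizations is a single factor of 
ln(10). A minimal Python sketch (the function names are hypothetical):

```python
import math

LN10 = math.log(10.0)


def decadic_to_natural(value_in_decades):
    """Convert M or W (decades) to m or w (natural-log units): m = M*ln(10)."""
    return value_in_decades * LN10


def natural_to_decadic(value_in_natural_units):
    """Inverse conversion: M = m/ln(10)."""
    return value_in_natural_units / LN10


# A 4.5-decade plot: M = 4.5 corresponds to m of about 10.36.
m = decadic_to_natural(4.5)
print(f"m = {m:.2f}")  # prints: m = 10.36
```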

b) The authors of Logicle added one parameter to the Logicle 
function: 'A', the additional negative display range in asymptotic 
decades (usually 0 or a negative value). Setting it to 0 produces the 
“original” Logicle. In cases where low data values are dominated by 
statistical variation but the values are constrained to be non-negative 
(as seen in peak-detected flow cytometry data), a Logicle plot with 
A = -W would include data zero and be near-linear at low data values, 
thereby avoiding problems associated with log scales at the low end.
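
One way to see what A does is to compute where data zero lands on the 
display (0 = bottom edge, 1 = top edge). The Python sketch below assumes 
the display-location formulas w = W/(M+A), x2 = A/(M+A), x1 = x2 + w, 
which is a reading of the A-extended decadic parameterization that 
should be verified against the Gating-ML specification; the function 
name is hypothetical.

```python
def data_zero_position(M, W, A):
    """Display fraction (0 = bottom, 1 = top) at which data value 0 sits.

    Assumed formulas: w = W/(M+A), x2 = A/(M+A), x1 = x2 + w.
    """
    if M + A <= 0:
        raise ValueError("M + A must be positive")
    w = W / (M + A)   # linearization width as a display fraction
    x2 = A / (M + A)
    return x2 + w     # x1, the display location of data zero


print(data_zero_position(4.5, 0.5, 0.0))   # original Logicle: W/M of the display
print(data_zero_position(4.5, 0.5, -0.5))  # A = -W: data zero at the bottom edge
```

Under these assumed formulas, A = 0 puts data zero at W/M of the 
display (about 0.111 for M = 4.5, W = 0.5), while A = -W moves it to 
exactly 0, so the display starts at data zero as described above.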

4) If you decide to adjust the implementation of Logicle in flowCore, a 
consistent description with (hopefully) all necessary details is 
included in the latest Gating-ML specification, which can be downloaded 
from http://flowcyt.sourceforge.net/gating/latest.full.zip.

5) The latest Gating-ML specification also includes compliance tests, 
which cover the Logicle transformation. These may help you adjust and 
debug the implementation of Logicle (as well as its inverse function) 
in flowCore.

6) I have some Java code that implements the updated Logicle. 
Specifically, I have a Logicle(T, M, W, A) class that allows you to 
create and apply the Logicle transformation, and I also have some code 
that calculates default values for T, M, W, and A based on the contents 
of an FCS file. Please let me know if you would like me to share these. 
I am not suggesting that you reuse the implementation directly, since 
it is quite naive and relatively slow (it runs a simple bisection method 
as the root-finding algorithm every time you call it); however, it may 
help clarify potential ambiguities related to that function. The crucial 
part is the updated S function, which now includes the parameter 'A' and 
works with the decadic parameterization (see Gating-ML for details). 
However, since flowCore's internal implementation seems to be based on a 
“bi-exponential”-like parameterization, i.e., 
a*e^(b*x) - c*e^(-d*x) + f, it may take some effort to convert this 
correctly.
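
For comparison with the bi-exponential form, here is a minimal Python 
sketch of a Logicle(T, M, W, A) class (not the Java code mentioned 
above). It evaluates S explicitly as a*e^(b*y) - c*e^(-d*y) + f, 
reflected about data zero for negative values, and inverts it by 
bisection, which is exactly what makes this approach slow. The formulas 
that map T, M, W, A to a, b, c, d, f and the parameter ranges checked in 
the constructor are assumptions here and should be checked against the 
Gating-ML specification; the method names S() and logicle() are 
hypothetical.

```python
import math

LN10 = math.log(10.0)


class Logicle:
    """Sketch of Logicle(T, M, W, A); the coefficient formulas are assumed."""

    def __init__(self, T, M, W, A=0.0):
        # Assumed valid ranges: T > 0, M > 0, 0 <= W <= M/2, -W <= A <= M - 2W.
        if not (T > 0 and M > 0 and 0 <= W <= M / 2 and -W <= A <= M - 2 * W):
            raise ValueError("invalid Logicle parameters")
        self.T, self.M, self.W, self.A = T, M, W, A
        w = W / (M + A)          # linearization width as a display fraction
        x2 = A / (M + A)
        self.x1 = x2 + w         # display location of data zero
        x0 = x2 + 2 * w
        b = (M + A) * LN10
        d = self._solve_d(b, w)
        c_a = math.exp(x0 * (b + d))
        mf_a = math.exp(b * self.x1) - c_a / math.exp(d * self.x1)
        a = T / ((math.exp(b) - mf_a) - c_a / math.exp(d))
        # Coefficients of S(y) = a*e^(b*y) - c*e^(-d*y) + f
        self.a, self.b, self.c, self.d, self.f = a, b, c_a * a, d, -mf_a * a

    @staticmethod
    def _solve_d(b, w):
        """Bisection for d in (0, b] with 2*(ln d - ln b) + w*(b + d) = 0."""
        if w == 0:
            return b
        lo, hi = 0.0, b
        for _ in range(200):
            mid = 0.5 * (lo + hi)
            if 2 * (math.log(mid) - math.log(b)) + w * (b + mid) < 0:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    def S(self, y):
        """Inverse Logicle (display value -> data value), odd-symmetric about x1."""
        negative = y < self.x1
        if negative:
            y = 2 * self.x1 - y  # reflect into the positive branch
        value = self.a * math.exp(self.b * y) - self.c * math.exp(-self.d * y) + self.f
        return -value if negative else value

    def logicle(self, x):
        """Logicle (data value -> display value): root of S(y) - x by bisection."""
        lo, hi = self.x1 - 1.0, 1.0
        while self.S(lo) > x:    # widen the bracket until it contains x
            lo -= 1.0
        while self.S(hi) < x:
            hi += 1.0
        for _ in range(200):
            mid = 0.5 * (lo + hi)
            if self.S(mid) < x:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)
```

For example, with L = Logicle(262144, 4.5, 0.5, 0.0), L.logicle(262144) 
is approximately 1.0 (the top of the display) and L.logicle(0) is 
approximately 0.111 (data zero). A production version would avoid the 
per-call bisection and handle round-off near data zero more carefully.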

7) A few minor notes on flowCore's defaults for the logicle transformation:

r = 262144; This works for data from BD's newer instruments since their 
range is 2^18; however, there are many other FCS files with a different 
maximum (e.g., 10^4), for which 262144 is not a good choice. An option 
would be to have r=NULL as the default and set it based on the data that 
the transformation is to be applied to.
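
A minimal Python sketch of what r=NULL could resolve to, assuming the 
simplest rule of taking the largest observed data value (a choice made 
here for illustration; the function name is hypothetical):

```python
def default_T(values):
    """Hypothetical data-driven default for r (the paper's T): the largest data value."""
    values = list(values)
    if not values:
        raise ValueError("no data")
    top = max(values)
    if top <= 0:
        raise ValueError("T must be positive; no positive data values found")
    return top
```

For data from an instrument with a 10^4 range, this yields a value near 
10^4 instead of the fixed 262144.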

d = 5; Parks et al. suggest using 4.5, but this does not really make a 
difference. More importantly, there seems to be a typo in the 
documentation of the function saying that d is the breadth of the 
display in natural logarithm units. The code includes d <- d * log(10), 
and therefore the documentation should probably say that d is the 
breadth of the display in decades (i.e., decadic logarithm units). Also, 
Nishant, shouldn't the if (w > d) stop(...) check in the logicleTransform 
function go after the d <- d * log(10)? Otherwise, w (in natural 
logarithm units) is compared with d while d is still in decades.
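
A minimal Python sketch (not flowCore's R code) of the suggested 
ordering, assuming, as noted above, that w is in natural logarithm units 
and d is given in decades; the function name is hypothetical. With w = 6 
and d = 5, comparing before the conversion would reject the pair, 
whereas after the conversion d is about 11.51 and the check passes.

```python
import math


def check_w_against_d(w, d_decades):
    """Hypothetical sketch: convert d (decades) to natural-log units before comparing.

    w is assumed to already be in natural-log units.
    """
    d = d_decades * math.log(10.0)  # same conversion as d <- d * log(10)
    if w > d:
        raise ValueError("w must not exceed d (both in natural-log units)")
    return d
```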

w = 0; This does not perform very well if you have low and negative 
values to look at. Alternatively, you could have w=NULL as the default 
and compute the actual value from the data set. A recommended way to 
specify W to match particular data is to select a value 'z' 
approximating the most negative data value that must be included and 
calculate W as W = (M - log10(T/abs(z)))/2. Setting 'z' at the fifth 
percentile of the events that are below zero will yield an appropriate 
display in most cases.
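
The recipe above translates into a few lines of Python. This is a 
sketch under stated assumptions: log is taken as base 10 (matching the 
decadic M), the percentile uses the nearest-rank method, W falls back to 
0 when there are no negative values, and the result is clamped to 
0 <= W <= M/2. The function names are hypothetical.

```python
import math


def nearest_rank_percentile(sorted_values, p):
    """p-th percentile (0 < p <= 100) of ascending values, nearest-rank method."""
    k = max(1, math.ceil(p * len(sorted_values) / 100))
    return sorted_values[k - 1]


def default_W(values, T, M, p=5.0):
    """W = (M - log10(T/|z|))/2, with z the p-th percentile of the negative values."""
    negatives = sorted(v for v in values if v < 0)
    if not negatives:
        return 0.0                   # no negative data: fall back to W = 0 (assumption)
    z = nearest_rank_percentile(negatives, p)
    W = (M - math.log10(T / abs(z))) / 2
    return min(max(W, 0.0), M / 2)   # clamp to 0 <= W <= M/2 (assumption)
```

For example, with T = 262144, M = 4.5 and z = -100, this gives 
W = (4.5 - log10(2621.44))/2, roughly 0.54.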

Please let me know if I could do anything to help or clarify things further.

Cheers,
Josef

-- 
Josef Spidlen, Ph.D.
Research Associate, Terry Fox Laboratory, BC Cancer Agency
675 West 10th Avenue, V5Z 1L3 Vancouver, BC, Canada
Tel: +1 (604) 675-8000 x 7755