[BioC] flowCore: inverse logicle transformation of flow cytometry data
Josef Spidlen
jspidlen at bccrc.ca
Thu Oct 8 01:59:20 CEST 2009
Hi Nishant, Chao-Jen, Pyne, et al.
I thought I would add a few notes on the Logicle transformation and its
implementation in BioConductor/flowCore. Please do not take the email
the wrong way, I am definitely not trying to complain about flowCore's
implementation; just trying to bring some light into the Logicle issues.
The implementation of Logicle is tricky, both because the
transformation is quite complicated and because the original
documentation is targeted at readers with a substantial mathematical
background.
Nishant, Florian, Byron and others did a great job when they opened the
can of worms and implemented the transformation in flowCore.
1) I remember that there were minor issues with the Logicle
transformation, especially when applied on very low and negative values.
I believe flowCore's implementation wasn't actually a monotone function
around 0, which probably relates to a typo in the Parks et al. Logicle
manuscript. Nishant, we had some email discussion on this topic around
January 27, 2009. Recently, I talked to Dave Parks and Wayne Moore and
they told me that flowCore's implementation of Logicle is still broken
(no further details, not sure about validity of this statement and not
sure if this is related to the original issue). Having said that, please
note that these issues are minor and would not really affect typical
users using flowCore/Logicle as part of their analysis pipeline.
2) Quoting Chao-Jen Wong:
> Since the logicle transformation is an one-to-one and onto function,
> it is possible to implement an inverse function. It is, however, not
> straightforward...
I believe Nishant has solved this by now, but note that the inverse
should actually be easier than the Logicle itself. This is because the
Logicle transformation is defined as logicle(x)=root(S(y)-x), i.e., the
value y that solves S(y)=x. In other words, Logicle is defined as the
inverse of S, where S is a “simple” function, so you can just use S
itself as the inverse of Logicle.
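To illustrate the point, here is a minimal Python sketch of the idea. It uses a generic bi-exponential S of the form a*e^(b*y) - c*e^(-d*y) + f with made-up coefficients (they are NOT the Logicle constants from the manuscript or from Gating-ML), and the function names are hypothetical. The forward direction needs a root finder (plain bisection here), while the inverse is a single evaluation of S.

```python
import math

# Illustrative coefficients only; these are NOT the Logicle constants.
A_COEF, B_COEF, C_COEF, D_COEF, F_COEF = 1.0, 1.0, 1.0, 1.0, 0.0


def s(y):
    """Bi-exponential 'simple' function a*e^(b*y) - c*e^(-d*y) + f.

    With positive a, b, c, d it is strictly increasing in y.
    """
    return (A_COEF * math.exp(B_COEF * y)
            - C_COEF * math.exp(-D_COEF * y)
            + F_COEF)


def forward(x, lo=-50.0, hi=50.0, tol=1e-12):
    """Solve s(y) = x for y by bisection on the bracket [lo, hi]."""
    if not (s(lo) <= x <= s(hi)):
        raise ValueError("x is outside the bracketed range")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if s(mid) < x:
            lo = mid   # root lies in the upper half
        else:
            hi = mid   # root lies in the lower half
    return 0.5 * (lo + hi)


def inverse(y):
    """The inverse of the forward transform is just s itself."""
    return s(y)
```

A round trip such as inverse(forward(262144.0)) should return the input up to the bisection tolerance. A real implementation would replace the placeholder coefficients with the ones derived from T, M, W (and A).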
3) Recently, there have been some developments related to Logicle that
flowCore may want to consider/support/implement:
- Since 2005, there is a patent on the Logicle implementation owned by
Stanford. However, early this year, Stanford decided not to collect
royalties on it anymore and it became free to be used by anyone.
- As a result, Logicle has been incorporated in the latest version of
Gating-ML (an ISAC standard for describing gates and data
transformations in XML). This has been done in collaboration with the
authors of the transformation, who decided to tweak the transformation a
little bit:
a) In the original manuscript, Parks et al. show two different
parameterizations of the Logicle function based on natural logarithm
(base e) and decadic logarithm (base 10) respectively. The recent
conclusion is that the decadic (base 10) version is better (i.e.,
easier) for the end user and should be used. Essentially, the
transformation function is the same as long as you adjust the parameters
accordingly. In the manuscript, the parameters are lower case for
natural logarithm parameterization and upper case for the base 10
logarithm. The 'm' and 'w' are the two affected parameters where
m=M*ln(10) and w=W*ln(10).** The "upper case" (i.e., decadic version) is
better for the end user since M and W are expressed in normal decades,
i.e., base 10 log units; M is the total plot width and W is the
linearization width. Therefore, let's say the user wants the result to
be a 4.5 decades plot, so they use M=4.5 rather than having to use m=10.36.
The implementation in flowCore seems to be based on the natural
logarithm but its parameterization is mixed ('w' seems to be in natural
logarithm decades, while 'd' seems to be decadic logarithm decades).
Eventually, flowCore could switch to the decadic logarithm
implementation and harmonize the parameterization (and maybe use the
same constants as in the paper?).
** I believe that flowCore calls 'd' what is called 'M' in the
manuscript and 'r' what is called 'T' in the manuscript (Parks et al.,
Cytometry, 69A: 541-551; 2006).
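The conversion between the two parameterizations described in a) is a single multiplication by ln(10). A small Python sketch follows; the function names are hypothetical, and W=0.5 is only an example value:

```python
import math

LN10 = math.log(10.0)


def decades_to_natural(M, W):
    """Convert decadic parameters (M, W) to natural-log ones (m, w)."""
    return M * LN10, W * LN10


def natural_to_decades(m, w):
    """Convert natural-log parameters (m, w) back to decades (M, W)."""
    return m / LN10, w / LN10


# A 4.5-decade plot; W=0.5 is just an example linearization width.
m, w = decades_to_natural(4.5, 0.5)
print(round(m, 2))  # 10.36, matching the m=10.36 quoted above
```

Any comparison between a width and a display breadth only makes sense once both values are expressed in the same units.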
b) The authors of Logicle added one additional parameter to the Logicle
function: 'A' - the additional negative display range in asymptotic
decades (usually 0 or a negative value). Setting it to 0 produces the
“original” Logicle. In cases where low data values are dominated by
statistical variation but the values are constrained to be non-negative
(as seen in peak-detected flow cytometry data), a Logicle plot with A =
-W would include data zero and be near-linear at low data values thereby
avoiding problems associated with log scales at the low end.
4) If you decide to adjust the implementation of Logicle in flowCore, a
consistent description with (hopefully) all necessary details is
included in the latest Gating-ML specification, which can be downloaded
from http://flowcyt.sourceforge.net/gating/latest.full.zip.
5) The latest Gating-ML specification also includes compliance tests,
which include the Logicle transformation. This may eventually help you
adjust/debug the implementation of Logicle (as well as its inverse
function) in flowCore.
6) I have some Java code that implements the updated Logicle.
Specifically, I have the Logicle(T, M, W, A) class that allows you to
create and apply the Logicle transformation; and I also have some code
that calculates default values for T, M, W, and A based on the contents
of an FCS file. Please let me know if you would like me to share these.
I am not suggesting that you would reuse the implementation directly
since it is quite naive and relatively slow (using a simple bisection
method as a root-finding algorithm every time you call it); however, it
may have some value in clarifying potential ambiguities related to that
function. The crucial part is the updated S function that now includes
the parameter 'A' and works with decadic parameterization (see Gating-ML
for details). However, since flowCore's internal implementation seems to
be based on the “bi-exponential” like parameterization, i.e., a*e^(b*x)
- c*e^(-d*x) + f, it may involve some effort to convert this correctly.
7) A minor note on flowCore's defaults for the logicle transformation:
r = 262144; This works for data from BD's newer instruments since their
range is 2^18; however, there are a lot of other FCS files with a
different max (e.g., 10^4), where 262144 is not a good choice. An option
would be to have r=NULL as the default and adjust it based on the data
that the transformation is supposed to be applied to.
d = 5; Parks et al. suggest using 4.5, but this does not really
make a difference. More importantly, there seems to be a typo in the
documentation of the function saying that d is the breadth of the display
in natural logarithm units. The code includes d <- d * log(10) and
therefore, the documentation should probably say that d is the breadth of
the display in decades (i.e., decadic logarithm units). Also, Nishant,
shouldn't the if (w > d) stop(...) in the logicleTransform function go
after the d <- d * log(10)?
w = 0; This does not perform very well if you have low and negative
values to look at. Alternatively, you could have w=NULL as default and
create the real value based on the data set. A recommended way to
specify W to match particular data is to select a value 'z'
approximating the most negative data value that must be included and
calculate W as: W = (M - log10(T/abs(z)))/2. Setting 'z' at the fifth
percentile of events that are below zero will yield an appropriate
display in most cases.
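As a rough sketch of that recipe in Python: the helper names are hypothetical, the percentile uses simple linear interpolation, and two choices are my own assumptions rather than part of the recommendation: returning W=0 when the data contain no negative events, and clamping a negative result to 0.

```python
import math


def percentile(sorted_vals, q):
    """Linear-interpolation percentile of an ascending list, q in [0, 100]."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1.0 - frac) + sorted_vals[hi] * frac


def default_w(values, T, M=4.5, q=5.0):
    """Estimate W (in decades) from the negative events in `values`."""
    negatives = sorted(v for v in values if v < 0)
    if not negatives:
        return 0.0  # assumption: no negative events, no linearization needed
    z = percentile(negatives, q)  # near the most negative values
    W = (M - math.log10(T / abs(z))) / 2.0
    return max(W, 0.0)  # assumption: never return a negative width


# One negative event at -100 with T=10^4 and M=4: W = (4 - 2) / 2 = 1.0
print(default_w([-100.0, 5.0, 10.0], T=10000.0, M=4.0))
```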
Please let me know if I could do anything to help or clarify things further.
Cheers,
Josef
--
Josef Spidlen, Ph.D.
Research Associate, Terry Fox Laboratory, BC Cancer Agency
675 West 10th Avenue, V5Z 1L3 Vancouver, BC, Canada
Tel: +1 (604) 675-8000 x 7755