[Rd] [RFC] A case for freezing CRAN
Martin Maechler
maechler at stat.math.ethz.ch
Mon Mar 24 11:28:54 CET 2014
>>>>> Hervé Pagès <hpages at fhcrc.org>
>>>>> on Thu, 20 Mar 2014 15:23:57 -0700 writes:
> On 03/20/2014 01:28 PM, Ted Byers wrote:
>> On Thu, Mar 20, 2014 at 3:14 PM, Hervé Pagès
>> <hpages at fhcrc.org> wrote:
>>
>> On 03/20/2014 03:52 AM, Duncan Murdoch wrote:
>>
>> On 14-03-20 2:15 AM, Dan Tenenbaum wrote:
>>
>> ----- Original Message -----
>>
>> From: "David Winsemius" <dwinsemius at comcast.net>
>> To: "Jeroen Ooms" <jeroen.ooms at stat.ucla.edu>
>> Cc: "r-devel" <r-devel at r-project.org>
>> Sent: Wednesday, March 19, 2014 11:03:32 PM
>> Subject: Re: [Rd] [RFC] A case for freezing CRAN
>>
>> On Mar 19, 2014, at 7:45 PM, Jeroen Ooms wrote:
>>
>> On Wed, Mar 19, 2014 at 6:55 PM, Michael Weylandt
>> <michael.weylandt at gmail.com> wrote:
>>
>> Reading this thread again, is it a fair summary of your
>> position to say "reproducibility by default is more
>> important than giving users access to the newest bug
>> fixes and features by default?" It's certainly arguable,
>> but I'm not sure I'm convinced: I'd imagine that the
>> ratio of new work being done vs reproductions is rather
>> high and the current setup optimizes for that already.
>>
>> I think that separating development from released
>> branches can give us both reliability/reproducibility
>> (stable branch) as well as new features (unstable
>> branch). The user gets to pick (and you can pick
>> both!). The same is true for r-base: when using a
>> 'released' version you get 'stable' base packages that
>> are up to 12 months old. If you want the latest
>> stuff, you download a nightly build of r-devel. For
>> regular users and reproducible research, it is recommended
>> to use the stable branch. However if you are a developer
>> (e.g. package author) you might want to
>> develop/test/check your work with the latest r-devel.
>>
>> I think that extending the R release cycle to CRAN would
>> result both in more stable released versions of R and in
>> more freedom for package authors to implement
>> rigorous changes in the unstable branch. When writing a
>> script that is part of a production pipeline, or a Sweave
>> paper that should be reproducible 10 years from now, or a
>> book on using R, you use a stable version of R, which is
>> guaranteed to behave the same over time. However when
>> developing packages that should be compatible with the
>> upcoming release of R, you use r-devel which has the
>> latest versions of other CRAN and base packages.
>>
>> As I remember ... The example demonstrating the need for
>> this was the XML package, which caused an extract from a
>> website to have its headers misinterpreted as data in
>> one version of pkg:XML and not in another. That seems
>> fairly unconvincing. Data cleaning and validation is a
>> basic task of data analysis. It also seems excessive to
>> assert that it is the responsibility of CRAN to maintain
>> a synced binary archive that will be available in ten
>> years.
>>
>> CRAN already does this, the bin/windows/contrib directory
>> has subdirectories going back to 1.7, with packages dated
>> October 2004. I don't see why it is burdensome to
>> continue to archive these. It would be nice if source
>> versions had a similar archive.
>>
>> The bin/windows/contrib directories are updated every day
>> for active R versions. It's only when Uwe decides that a
>> version is no longer worth active support that he stops
>> doing updates, and it "freezes". A consequence of this
>> is that the snapshots preserved in those older
>> directories are unlikely to match what someone who keeps
>> up to date with R releases is using. Their purpose is to
>> make sure that those older versions aren't completely
>> useless, but they aren't what Jeroen was asking for.
>>
>> But it is almost completely useless from a
>> reproducibility point of view to get random package
>> versions. For example if some people try to use R-2.13.2
>> today to reproduce an analysis that was published 2 years
>> ago, they'll get Matrix 1.0-4 on Windows, Matrix 1.0-3 on
>> Mac, and Matrix 1.1-2-2 on Unix. And none of them of
>> course is what was used by the authors of the paper (they
>> used Matrix 1.0-1, which is what was current when they
>> ran their analysis).
>>
>> Initially this discussion brought back nightmares of DLL
>> hell on Windows. Those as ancient as I will remember
>> that well. Now, however, the focus seems to be on
>> reproducibility, with what strikes me as a seriously
>> flawed notion of what reproducibility means.
>>
>> Hervé Pagès mentions the risk of irreproducibility across
>> three minor revisions of version 1.0 of Matrix.
> If you use R-2.13.2, you get Matrix 1.1-2-2 on
> Linux.
No way! Matrix 1.1-2-2 has Depends: R (>= 2.15.2)
> AFAIK this is the most recent version of Matrix,
> aimed to be compatible with the most current version of R
> (i.e. R 3.0.3). However, it has never been tested with R-2.13.2.
Exactly. And for this reason, I have adopted the practice of keeping
Depends: R (>= ...)
in Matrix and, partly, in other packages I maintain.
Doing so does prevent users of old versions of R from getting new
features and, even more importantly, from getting the latest (few, of
course! ;-) bug fixes for Matrix.
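To make the Depends point concrete, here is a minimal sketch, in Python purely for illustration (it is not R's or CRAN's actual code), of the check that a Depends: R (>= ...) entry makes possible. The helper names are hypothetical, and version parsing assumes R's convention of treating '.' and '-' as equivalent separators.

```python
import re

# Comparison operators allowed in an R requirement such as "R (>= 2.15.2)".
_OPS = {
    ">=": lambda a, b: a >= b,
    ">": lambda a, b: a > b,
    "<=": lambda a, b: a <= b,
    "<": lambda a, b: a < b,
    "==": lambda a, b: a == b,
}

# Matches the R entry of a Depends field, e.g. "R (>= 2.15.2)".
_R_REQ = re.compile(r"\bR\s*\(\s*(>=|<=|==|>|<)\s*(\d+(?:[.-]\d+)*)\s*\)")


def parse_version(v: str) -> tuple:
    """Split an R-style version such as '1.1-2-2' into integers.

    Assumption: '.' and '-' are equivalent separators, as in R's
    package_version().
    """
    return tuple(int(part) for part in re.split(r"[.-]", v))


def r_requirement_met(depends: str, r_version: str) -> bool:
    """True if r_version satisfies the R entry of a Depends field.

    Hypothetical helper; a Depends field without an R entry is always met.
    """
    m = _R_REQ.search(depends)
    if m is None:
        return True
    op, required = m.groups()
    return _OPS[op](parse_version(r_version), parse_version(required))


# The numbers from this thread: Matrix 1.1-2-2 declares Depends: R (>= 2.15.2).
print(r_requirement_met("R (>= 2.15.2)", "2.13.2"))  # False: not installable
print(r_requirement_met("R (>= 2.15.2)", "3.0.3"))   # True
```

Comparing integer tuples rather than strings matters here: as strings, "2.13.2" and "2.15.2" happen to order correctly, but "2.9" vs "2.13" would not.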
Apart from this short note: I'm very sympathetic to optionally
providing easier (not "easy") ways of setting up old versions of R
and packages, where users can fairly quickly use the printed
(unfortunately, for now) output of sessionInfo() to reinstall
1) the version of R, and
2) the packages, via an install.packages() call which tries (!) to
get the corresponding packages (in their correct versions) from
CRAN (including ./Archive/ !),
similar to what Duncan Murdoch has agreed to.
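Step 2 above could be approximated by scraping the printed sessionInfo() output. Below is a minimal Python sketch under stated assumptions: the section headers follow the usual printed format, the archive layout is CRAN's src/contrib/Archive/<pkg>/<pkg>_<version>.tar.gz, and the function names and sample printout are hypothetical. It only collects package versions; reinstalling R itself (step 1) is out of scope.

```python
import re

CRAN_ARCHIVE = "https://cran.r-project.org/src/contrib/Archive"

# Packages shipped with R itself; they are not on CRAN, so they are skipped.
BASE_PACKAGES = {
    "base", "compiler", "datasets", "grDevices", "graphics", "grid",
    "methods", "parallel", "splines", "stats", "stats4", "tcltk",
    "tools", "utils",
}

# Sections of printed sessionInfo() output whose entries look like "pkg_version".
_VERSIONED_HEADERS = ("other attached packages:", "loaded via a namespace")
_PKG_VER = re.compile(r"\b([A-Za-z][A-Za-z0-9.]*)_(\d+(?:[.-]\d+)+)\b")


def packages_from_session_info(text: str) -> dict:
    """Return {package: version} parsed from printed sessionInfo() output."""
    pkgs = {}
    in_versioned = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith(_VERSIONED_HEADERS):
            in_versioned = True
        elif stripped.endswith(":"):
            in_versioned = False  # some other section, e.g. "locale:"
        elif in_versioned:
            for name, version in _PKG_VER.findall(stripped):
                if name not in BASE_PACKAGES:
                    pkgs[name] = version
    return pkgs


def archive_urls(pkgs: dict) -> list:
    """Build CRAN ./Archive/ source tarball URLs for each package version.

    Assumption: every version has been archived. A package's current
    release lives in src/contrib/ instead, so a real tool needs a fallback.
    """
    return [f"{CRAN_ARCHIVE}/{name}/{name}_{ver}.tar.gz"
            for name, ver in pkgs.items()]


# Abridged, illustrative sessionInfo() printout (not from a real session).
SAMPLE = """\
other attached packages:
[1] Matrix_1.0-1 ape_2.7-3

loaded via a namespace (and not attached):
[1] grid_2.13.2
"""

for url in archive_urls(packages_from_session_info(SAMPLE)):
    print(url)
```

Each URL could then be handed to install.packages() in R with repos = NULL and type = "source"; installing that way does not resolve dependencies, so the order of installation would still need care.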
> I'm not saying that it should, that would be a
> big waste of resources of course. All I'm saying is that
> it doesn't make sense to serve by default a version that
> is known to be incompatible with the version of R being
> used. It's very likely to not even install properly.
[..............]
> Also note that back in October 2011, people using R-2.13.2
> would get e.g. ape 2.7-3 on Linux, Windows and
> Mac. Wouldn't it make sense that people using R-2.13.2
> today get the same? Why would anybody use R-2.13.2 today
> if not to re-run some code that was written and
> used two years ago to obtain some important results?
I also tend to agree that it would be great if someone (Karl
Millar -> Google?) would set up a good time-stamping system for
CRAN {and Bioconductor and Omegahat and ..?} packages.
Ideally that system would work by *using* the CRAN (and ..)
infrastructure.
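The time-stamping idea reduces to a lookup: if the publication date of each package version were recorded, the set of versions that were "current" on any past date (say, the date a paper's analysis was run) could be reconstructed. A minimal Python sketch, assuming such a record exists; the record format, the function names and the example data are all hypothetical, not CRAN data.

```python
from bisect import bisect_right
from datetime import date
from typing import Dict, List, Optional, Tuple

# One package's release history: (publication date, version) pairs.
History = List[Tuple[date, str]]


def version_as_of(history: History, when: date) -> Optional[str]:
    """Return the version that was current on `when`, or None.

    Assumption: a release is current from its publication date until the
    next release of the same package.
    """
    ordered = sorted(history)
    dates = [d for d, _ in ordered]
    i = bisect_right(dates, when)  # releases published on or before `when`
    return ordered[i - 1][1] if i > 0 else None


def snapshot(index: Dict[str, History], when: date) -> Dict[str, str]:
    """Reconstruct the {package: version} set that was current on `when`."""
    result = {}
    for pkg, history in index.items():
        version = version_as_of(history, when)
        if version is not None:  # skip packages not yet released on that date
            result[pkg] = version
    return result


# Hypothetical timestamp record; package names, versions and dates are made up.
INDEX = {
    "pkgA": [
        (date(2011, 3, 1), "0.9-1"),
        (date(2011, 9, 15), "1.0-0"),
        (date(2012, 2, 1), "1.1-0"),
    ],
    "pkgB": [(date(2012, 5, 10), "2.0")],
}

print(snapshot(INDEX, date(2011, 10, 20)))  # {'pkgA': '1.0-0'}
```

Only dates need to be stored; the "frozen" view for any day is derived on demand, which is one way such a system could sit on top of the existing CRAN infrastructure rather than duplicate it.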
> Cheers, H.
I'm still unsure whether I should agree with you (Hervé) that some
freezing / "data base of package timestamps" should also
happen on CRAN itself.
Martin