[Rd] [RFC] A case for freezing CRAN

Rainer M Krug Rainer at krugs.de
Fri Mar 21 11:08:46 CET 2014


Jari Oksanen <jari.oksanen at oulu.fi> writes:

> On 21/03/2014, at 10:40 AM, Rainer M Krug wrote:
>
>> 
>> 
>> This is a long and (mainly) interesting discussion, which is fanning out
>> in many different directions, many of which I think are not that
>> relevant to the OP's suggestion.
>> 
>> I see the advantages of having such a dynamic CRAN, but also of having a
>> more stable CRAN. I prefer CRAN as it is now, but in many cases a more
>> stable CRAN might be an advantage. So having releases of CRAN might make
>> sense. But then there is the archiving issue of CRAN.
>> 
>> The suggestion was made to move the responsibility away from CRAN and
>> the R infrastructure to the user / researcher to guarantee that the
>> results can be re-run years later. It would be nice to have this built
>> into CRAN, but let's stick with the scenario that the user should care
>> for reproducibility.
>
> There are two different problems that alternate in the discussion:
> reproducibility and breakage of CRAN dependencies. A frozen CRAN could
> make *approximate* reproducibility easier to achieve, but real
> reproducibility needs stricter solutions. The actual sessionInfo() is
> minimal information, and re-building a spitting image of the old
> environment may still be demanding (but in many cases this does not
> matter).
>
> Another problem is that CRAN is so volatile that new versions of
> packages break other packages or old scripts. Here the main problem is
> how package developers work. Freezing CRAN would not change that: if
> package maintainers release breaking code, that breaking code would be
> frozen. I think that most packages do not make a distinction between
> development and release branches, and CRAN policy won't change that.
>
> I can sympathize with package maintainers having 150 reverse
> dependencies. My main package only has ~50, and I certainly won't test
> them all with a new release. I sometimes tried, but I could not even
> get all of those built, because they had other dependencies on
> packages that failed. Even those that I could test failed to detect
> problems (in one case all examples were \dontrun and so passed the
> tests nicely). I only wish that people who *really* depend on my
> package would test it against the R-Forge version and alert me before
> CRAN releases, but that is not very likely (I guess many dependencies
> are not *really* necessary, but only concern marginal features of the
> package, yet CRAN forces maintainers to declare those).

Breakage of CRAN packages is a problem on which I cannot comment
much. I have no idea how this could be solved unless one introduces
more checks, which nobody wants. CRAN is a (more or less) open
repository for packages written by engineers / programmers but also by
scientists from other fields - and that is the strength of CRAN: a
central repository for packages which conform to a minimal standard and
format.
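
As an aside to the reverse-dependency point above: a maintainer can at
least enumerate the reverse dependencies among locally installed
packages with tools that ship with R. A minimal sketch ("mypackage" is
a placeholder, and this only sees what is installed on the local
machine, not all of CRAN):

    library(tools)
    ## direct reverse dependencies among installed packages
    revdeps <- dependsOnPkgs("mypackage",
                             dependencies = c("Depends", "Imports",
                                              "Suggests"),
                             recursive = FALSE)
    print(revdeps)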

>
> Still a few words about the reproducibility of scripts: this can
> hardly be achieved with good coverage, because many scripts are so
> very ad hoc. When I edit and review manuscripts for journals, I very
> often get Sweave or knitr scripts that "just work", where "just" means
> "just so and so". Often they do not work at all, because they rely on
> some undeclared private functionality or stray files in the author's
> workspace that did not travel with the Sweave document.

One reason why I *always* start my R sessions with --vanilla and have a
local initialization script which I call manually.
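
In practice that looks something like this (a minimal sketch; the path
to the initialization script is of course mine and machine-specific):

    R --vanilla              # shell: no .Rprofile, no saved workspace
    source("~/R/init.R")     # in R: load the initialization explicitly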

> I think these
> -- published scientific papers -- are the main field where the code
> really should be reproducible, but they often are the hardest to
> reproduce. 

And this is completely out of the hands of R / CRAN / ... and in the
hands of journals and authors. But R could provide a framework to make
this easier, in the form of a package which provides functions to make
this a one-command approach.
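
Just to make that idea concrete, a minimal sketch of what such a
one-command function could look like (the function name and layout are
purely hypothetical; it only captures attached packages and leaves the
compilation question aside):

    snapshotAnalysis <- function(dir = "analysis-archive") {
        dir.create(dir, showWarnings = FALSE)
        ## record the exact session state
        writeLines(capture.output(sessionInfo()),
                   file.path(dir, "sessionInfo.txt"))
        ## archive the sources of the attached packages; note that this
        ## fetches the *current* CRAN version, not necessarily the
        ## version that is loaded
        pkgs <- names(sessionInfo()$otherPkgs)
        if (length(pkgs) > 0)
            download.packages(pkgs, destdir = dir, type = "source")
    }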

> Nothing CRAN people do can help with sloppy code scientists
> write for publications. You know, they are scientists -- not
> engineers.

Absolutely - and I am also a sloppy scientist - I put my code online,
but hope that not many people ask me about it later.

Cheers,

Rainer

>
> Cheers, Jari Oksanen
>> 
>> Leaving the issue of compilation aside: a package which creates a
>> custom archive of the R installation - including the source of the R
>> version used and the sources of the installed packages, in a format
>> compilable on Linux, given that the relevant dependencies are
>> installed - would be a huge step forward.
>> 
>> I know - compilation on Windows (and sometimes Mac) is a serious
>> problem - but to archive *all* binaries and to re-compile all older
>> versions of R and all packages would be an impossible task.
>> 
>> Apart from that, doing your analysis in a Virtual Machine and then
>> simply archiving this Virtual Machine would also be an option, but
>> only for the more tech-savvy users.
>> 
>> In a nutshell: I think a package could provide the solution for local
>> archiving, making it possible to re-run the simulation with the same
>> tools at a later stage - although guarantees would not be possible.
>> 
>> Cheers,
>> 
>> Rainer
>> -- 
>> Rainer M. Krug
>> email: Rainer<at>krugs<dot>de
>> PGP: 0x0F52F982
>> 

-- 
Rainer M. Krug
email: Rainer<at>krugs<dot>de
PGP: 0x0F52F982